Life science researchers across all disciplines work with ever larger and more complex datasets. They rely on increasingly sophisticated data analysis pipelines and workflows, constructed from numerous individual software tools. Creating optimal workflows for specific data analysis problems is a challenge: it requires both exploring the latest relevant tool combinations and benchmarking selected workflow candidates against reference data to determine the best-performing ones. Because adequate tooling is lacking, this is rarely done systematically at present. As a result, many workflows compromise on scientific quality.
To tackle this problem, we will develop a novel software system facilitating a “Great Bake Off” of computational workflows in bioinformatics. Its key contribution will be a new integration of bioinformatics tools and metadata with technologies for automated workflow exploration and benchmarking. The system will provide a much-needed platform for systematic workflow generation and evaluation that complements, and can be interfaced with, existing state-of-the-art workflow systems. It will leverage ongoing technological developments at the European level, in particular existing initiatives of the ELIXIR Tools Platform.
We have selected use cases from the thriving bioinformatics discipline of proteomics to drive the system's development. They are representative of many modern workflow applications: they deal with highly complex data, are composed of large collections of individual software tools, and typically require high-performance computing resources. The project will enable a new, systematic and rigorous approach to the development of cutting-edge proteomics workflows. This will increase their scientific quality and robustness, and improve their reproducibility, FAIRness and maintainability.