Facilitating the "Great Bake Off" of Bioinformatics Workflows

Workflomics: A platform for automated workflow generation and benchmarking in bioinformatics

Life science researchers across all disciplines work with ever larger and increasingly complex datasets. They analyse these with increasingly sophisticated pipelines and workflows, constructed from numerous individual software tools. Creating optimal workflows for specific data analysis problems is a challenge. It requires both exploring the latest relevant tool combinations and benchmarking selected workflow candidates against reference data to determine the best-performing ones. For lack of adequate tooling, this is rarely done systematically today. As a result, many workflows compromise on scientific quality.

To tackle this problem, we will develop a novel software system facilitating a “Great Bake Off” of computational workflows in bioinformatics. Its key contribution will be a new and unique integration of bioinformatics tools and metadata with technologies for automated workflow exploration and benchmarking. The system will provide a much-needed platform for systematic workflow generation and evaluation that complements and can be interfaced with existing state-of-the-art workflow systems. It will leverage ongoing technological developments at the European level, in particular existing initiatives of the ELIXIR Tools Platform.

We have selected use cases from the thriving bioinformatics discipline of proteomics to drive its development. They are representative of many modern workflow applications: they deal with highly complex data, are composed of large collections of individual software tools, and typically require high-performance computing resources. The project will enable a new, systematic and rigorous approach to the development of cutting-edge proteomics workflows. This will increase their scientific quality and robustness, and improve their reproducibility, FAIRness and maintainability.

Participating organisations

Leiden University Medical Center
Netherlands eScience Center
University of Potsdam
Utrecht University
Life Sciences

Team

Peter Kok
Research Software Engineer
Netherlands eScience Center
Nauman Ahmed
Research Software Engineer
Netherlands eScience Center
Magnus Palmblad
Principal investigator
Leiden University Medical Center
Anna-Lena Lamprecht
Principal investigator
University of Potsdam
Rob Marissen
Scientific Software Developer
Leiden University Medical Center
Pablo Lopez-Tarifa
eScience Coordinator
Netherlands eScience Center

Related projects

Common Workflow Language

The reference CWL runner and other software from the Common Workflow Language open standards community.

Updated 13 months ago

Related software

APE

A CLI and Java API for the automated generation of computational pipelines (scientific workflows) from large collections of computational tools.

Updated 2 weeks ago

RESTful APE

RESTful API for the APE (Automated Pipeline Explorer) library.

Updated 3 weeks ago

Workflomics

A workflow exploration and benchmarking platform for the bioinformatics domain.

Updated 3 weeks ago