Facilitating the "Great Bake Off" of Bioinformatics Workflows

Workflomics: A platform for automated generation and workflow benchmarking in bioinformatics

Photo credit – Shutterstock

Workflomics

Life science researchers across all disciplines work with ever larger and increasingly complex datasets. They use increasingly sophisticated data analysis pipelines and workflows, constructed from numerous individual software tools. Creating optimal workflows for specific data analysis problems is a challenge. It requires an interplay of exploring the latest relevant tool combinations and benchmarking selected workflow candidates with reference data to determine the best-performing ones. Due to a lack of adequate tooling, this is rarely done systematically. As a result, many workflows compromise on scientific quality.

To tackle this problem, we will develop Workflomics, a novel software system facilitating a “Great Bake Off” of computational workflows in bioinformatics. Its key contribution will be a new and unique integration of bioinformatics tools and metadata with technologies for automated workflow exploration and benchmarking. Workflomics will provide a much-needed platform for systematic workflow generation and evaluation that complements, and can be interfaced with, existing state-of-the-art workflow systems. It will leverage ongoing technological developments at the European level, in particular existing initiatives of the ELIXIR Tools Platform.

We have selected use cases from the thriving bioinformatics discipline of proteomics to drive the development of Workflomics. They are representative of many modern workflow applications as they deal with highly complex data, are composed of large collections of individual software tools, and typically require high-performance computing resources. The project will enable a new, systematic and rigorous approach to the development of cutting-edge proteomics workflows. This will increase their scientific quality and robustness, and furthermore improve their reproducibility, FAIRness and maintainability.

RESTful APE - Software Sustainability project

RESTful APE (a RESTful API for the APE library) was initially developed for the Great Bake Off project, enabling streamlined interaction with APE's automated pipeline exploration through HTTP requests. Originally focused on bioinformatics, the project has, after receiving a software sustainability budget, expanded its usability to other domains such as geosciences. This expansion has been supported by enhancements to documentation, functionality, and demonstrations.

Participating organisations

Leiden University Medical Center
Netherlands eScience Center
University of Potsdam
Utrecht University
Life Sciences

Team

Peter Kok
Research Software Engineer
Netherlands eScience Center
Nauman Ahmed
Research Software Engineer
Netherlands eScience Center
Magnus Palmblad
Principal investigator
Leiden University Medical Center
Anna-Lena Lamprecht
Principal investigator
University of Potsdam
Rob Marissen
Scientific Software Developer
Leiden University Medical Center
Pablo Lopez-Tarifa
eScience Coordinator
Netherlands eScience Center

Related projects

Common Workflow Language

The reference CWL runner and other software from the Common Workflow Language open standards community.

Updated 17 months ago

Related software

APE

A CLI, Java API and RESTful API for the automated generation of computational pipelines (scientific workflows) from large collections of computational tools.

Updated 1 month ago

RESTful APE

RESTful API for the APE (Automated Pipeline Explorer) library.

Updated 4 months ago

Workflomics

Workflow exploration and benchmarking platform for the bioinformatics domain.

Updated 2 months ago