Chemical Informatics for Metabolite Identification and Biochemical Network Reconstruction

Image: Tadpoles by Geoff Gallice (CC License)

In the 17th century, Santorio Santorio conducted an experiment in which he weighed himself before and after eating, sleeping, working, fasting, and drinking. He found that most of the food he took in was lost through what he called “insensible perspiration”. What he was in fact observing were the effects of metabolic processes. Metabolism (from Greek: “change”) is the set of life-sustaining chemical transformations within the cells of living organisms; these transformations allow organisms to grow and reproduce, maintain their structures, and respond to their environments.

Metabolomics, the technology to comprehensively measure (changes in) the metabolites in a biological sample, has great potential to improve our understanding of biological systems and processes at a chemical level. Full exploitation of metabolomics data is currently limited by the complexity of the datasets generated on current platforms, which are difficult for human experts to manage alone. eScience technology therefore has a crucial role to play in mining and interpreting complex metabolomics data.

In this project a computational workflow will be developed to improve and accelerate metabolite identification and biochemical pathway reconstruction in metabolomics studies. A key step in the workflow is generating an in silico metabolite network on the basis of empirically derived reaction rules that delivers candidate structures for unknown metabolites in a metabolomics experiment. This will allow more systematic and automated structure elucidation on the basis of the bioanalytical data (e.g. LC-MS) and at the same time provide hypotheses for the biochemical pathways leading towards the newly identified metabolites.
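The idea of expanding a known metabolite into an in silico network by repeatedly applying reaction rules can be illustrated with a minimal sketch. It makes strong simplifying assumptions: molecules are reduced to elemental compositions rather than structures, and the three rules, their names, and the `build_network` helper are hypothetical illustrations, not the empirically derived rules or code of this project.

```python
from collections import Counter, deque

# Hypothetical reaction rules, written as changes in elemental composition.
# Real reaction rules act on chemical structures; formula deltas are a
# deliberate simplification for illustration only.
RULES = {
    "hydroxylation": {"O": 1},
    "methylation": {"C": 1, "H": 2},
    "glucuronidation": {"C": 6, "H": 8, "O": 6},
}


def apply_rule(formula, delta):
    """Return a new formula (element -> count) after applying one rule."""
    product = Counter(formula)
    product.update(delta)
    return dict(product)


def formula_key(formula):
    """Canonical, hashable text form such as 'C8H10N4O2'."""
    return "".join(f"{el}{n}" for el, n in sorted(formula.items()))


def build_network(start, max_steps=2):
    """Breadth-first expansion of a starting formula by the rules above.

    Returns (nodes, edges): nodes maps key -> formula, and edges is a list
    of (substrate_key, rule_name, product_key) tuples.
    """
    start_key = formula_key(start)
    nodes = {start_key: dict(start)}
    edges = []
    queue = deque([(start_key, 0)])
    while queue:
        key, depth = queue.popleft()
        if depth == max_steps:
            continue  # do not expand beyond the requested number of steps
        for name, delta in RULES.items():
            product = apply_rule(nodes[key], delta)
            pkey = formula_key(product)
            edges.append((key, name, pkey))
            if pkey not in nodes:  # each distinct product is stored once
                nodes[pkey] = product
                queue.append((pkey, depth + 1))
    return nodes, edges


caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
nodes, edges = build_network(caffeine, max_steps=2)
print(len(nodes), "candidate compositions,", len(edges), "rule applications")
```

Each edge records which rule links a substrate to a product, so the same structure that supplies candidates for an unknown also suggests a possible pathway leading to it.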

Measuring the metabolites in a biological sample yields a snapshot of the physiology of the cell. Integrating metabolomics data with other –omics data will provide insight into the machinery present in cells and how it is used to metabolize compounds, and will therefore give a more complete picture of the functioning of organisms.

Due to the chemical diversity of metabolites, automation and throughput of the identification process are currently less advanced in metabolomics than in proteomics and transcriptomics. A computational workflow that improves and accelerates metabolite identification and biochemical pathway reconstruction is required for metabolomics to increase its impact on systems biology.

Currently, the developed technology allows users to upload mass spectral data and retrieve candidate molecules from several public sources. The candidate molecules for each measured mass are presented and ranked by their likelihood of being the measured compound, based on matching calculated fragments against the measured fragmentation patterns. This allows metabolomics experts to focus on the most relevant candidates and obtain a quick indication of the fragmentation pathways that occur. To extend the technology to the identification of unknown metabolites that are not yet present in chemical databases, reaction rules are applied to complement the libraries of candidate molecules with potential metabolic products.
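The retrieve-then-rank step described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring used by the project's software: the candidate library, its names, masses, and predicted fragments are invented placeholder values, the tolerances are arbitrary, and the score is simply the fraction of measured fragment peaks that a candidate explains.

```python
def ppm_error(measured, theoretical):
    """Relative mass difference in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def retrieve_candidates(library, measured_mass, tol_ppm=5.0):
    """Keep library entries whose mass lies within tol_ppm of the measurement."""
    return [
        c for c in library
        if abs(ppm_error(measured_mass, c["mass"])) <= tol_ppm
    ]


def fragment_score(predicted_fragments, measured_peaks, tol_da=0.005):
    """Fraction of measured fragment peaks matched by a predicted fragment."""
    if not measured_peaks:
        return 0.0
    matched = sum(
        1 for peak in measured_peaks
        if any(abs(peak - frag) <= tol_da for frag in predicted_fragments)
    )
    return matched / len(measured_peaks)


def rank_candidates(library, measured_mass, measured_peaks):
    """Return (name, score) pairs for matching candidates, best first."""
    hits = retrieve_candidates(library, measured_mass)
    scored = [
        (c["name"], fragment_score(c["fragments"], measured_peaks))
        for c in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical library: two isomeric candidates and one with a different mass.
library = [
    {"name": "candidate_A", "mass": 194.0804,
     "fragments": [138.066, 110.071, 82.040]},
    {"name": "candidate_B", "mass": 194.0804,
     "fragments": [138.066, 95.050]},
    {"name": "candidate_C", "mass": 194.1000,
     "fragments": [138.066, 110.071]},
]

ranked = rank_candidates(
    library,
    measured_mass=194.0805,
    measured_peaks=[138.0662, 110.0713, 69.0450],
)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Because isomers share the same mass, the precursor mass alone cannot separate candidate_A from candidate_B in this toy example; only the fragment comparison does, which is why the ranking relies on fragmentation patterns.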

Successful implementation of this new concept will be accomplished by means of a flexible data infrastructure, efficient and parallelized computational algorithms and visualization of complex data. The result will be a practical toolbox that will be integrated with existing workflows for metabolomics data analysis.

Participating organisations

Wageningen University & Research
Netherlands eScience Center
Netherlands Metabolomics Institute
SURFsara
Unilever
Life Sciences

Team

Susan Branchett
eScience Coordinator
Netherlands eScience Center
Lars Ridder
Principal Investigator
Wageningen University & Research
Stefan Verhoeven
Senior eScience Research Engineer
Netherlands eScience Center

Related projects

FEDMix

Fusible evolutionary deep neural network mixture learning from distributed data for robust medical...

Updated 22 months ago
Finished

candYgene

Prediction of candidate genes for traits using interoperable genome annotations

Updated 21 months ago
Finished

Biomarker Boosting

Better biomarkers through datasharing

Updated 22 months ago
Finished

TraIT

A sustainable infrastructure for translational medical research

Updated 22 months ago
Finished

Related software

MAGMa

MAGMa is an online application for the automatic chemical annotation of mass spectrometry data.

Updated 16 months ago

Osmium

Start, stop and monitor applications remotely via an HTTP interface.

Updated 31 months ago

Xenon

If you are using remote machines to do your computations, and don’t feel like learning and implementing many different APIs, Xenon is the tool for you.

Updated 16 months ago