NPLinker
Microbial natural products data mining by integrating genomics and metabolomics data
A community-supported workflow connecting microbial genes and organisms to their molecular products
Microbes produce bioactive molecules with high-value properties, such as antibiotics. Microbial extracts from soil and gut samples contain many unknown molecules from unidentified producer organisms. Such extracts may well contain bioactive molecules with novel modes of action that could serve as new antibiotics. Genomics and metabolomics profiles are excellent sources for mapping microbial chemistry and assessing its novelty by comparison with known biosynthesis genes and molecules. This novelty and possible bioactivity can also be assessed through evolutionary signals in producers’ genomes. Integrating genomic profiling with metabolomics can link as-yet-unknown biosynthesis genes to novel chemistry; however, integrative omics profiling remains difficult because the number of hypothetical gene-molecule links becomes huge for hundreds of paired genomics-metabolomics samples. Moreover, targeting the most promising molecules and fully characterizing them is still a major challenge. Effectively connecting existing omics workflows to analyze large datasets is the next hurdle toward automatically linking microbial biosynthetic machinery to its characterized molecular end-products and producers, and toward evolutionary analyses that facilitate functional predictions. Here, we will build a community-supported framework that will allow researchers to perform combined genome-metabolome mining at large scale, create effective visualizations linking unique chemistry to genes, and integrate evolutionary analysis. We will develop novel algorithms based on chemical compound class and substructure annotations to establish gene-molecule and molecule-producer links that help prioritize novel chemistry and solve molecular structures. We expect that our framework will accelerate antibiotic discovery and impact neighboring fields like synthetic biology that also increasingly rely on omics profiling.
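To give a feel for why the number of hypothetical gene-molecule links grows so quickly, and how links might be prioritized, the following minimal Python sketch may help. It is an illustration only and not the project's actual algorithm: the function names, the toy strain data, and the use of a simple Jaccard overlap of producing strains as a score are all assumptions made for this example.

```python
from itertools import product


def candidate_link_count(n_gene_clusters: int, n_molecules: int) -> int:
    """Number of hypothetical gene-molecule links if every pairing is considered."""
    return n_gene_clusters * n_molecules


def cooccurrence_score(gene_strains: set, molecule_strains: set) -> float:
    """Jaccard overlap of the strains carrying a gene cluster and the strains
    in which a molecule was detected (an assumed, simplistic score)."""
    union = gene_strains | molecule_strains
    if not union:
        return 0.0
    return len(gene_strains & molecule_strains) / len(union)


def rank_links(gene_presence: dict, molecule_presence: dict) -> list:
    """Score every gene-molecule pair and sort from highest to lowest score."""
    links = [
        (gene, molecule, cooccurrence_score(gene_strains, molecule_strains))
        for (gene, gene_strains), (molecule, molecule_strains) in product(
            gene_presence.items(), molecule_presence.items()
        )
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy data: which strains carry each gene cluster,
# and in which strains' extracts each molecule was observed.
gene_presence = {"bgc_A": {"s1", "s2"}, "bgc_B": {"s3"}}
molecule_presence = {"mol_X": {"s1", "s2"}, "mol_Y": {"s2", "s3"}}

ranked = rank_links(gene_presence, molecule_presence)
for gene, molecule, score in ranked:
    print(f"{gene} -> {molecule}: {score:.2f}")

# Illustrative counts only: 1,000 gene clusters and 5,000 molecules
# already give five million candidate links.
print(candidate_link_count(1000, 5000))
```

In this toy case, the pair whose gene cluster and molecule occur in exactly the same strains ranks first. Real datasets would call for more robust scoring, for example one that also uses the compound class and substructure annotations mentioned above.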
Relevant Software Sustainability Project:
Integrating data publishing principles in scientific workflows
Advancing our understanding of molecular mechanisms of health and disease