Chemical Analytics Platform

Managing and exploiting growing data resources in chemical design

Image: Scott Lusher

Chemical design, like most scientific disciplines, is becoming increasingly data-intensive and dependent on our capacity to manage and exploit growing data resources. In particular, there is an increasing need for drug-discovery organizations to enable decision making that is informed by the growth of their internally generated data and its integration with external data.

Data-driven chemistry (for drug design, materials science, catalysis, polymers) depends on researchers dealing with the growth in data and finding ways to convert these resources into better decisions. Increasing the capacity of chemists to undertake data-driven research has the potential to improve decision making in drug discovery and ensure the most benefit can be derived from the growth in data. The rapid increase in available data in the so-called Big Data era makes harnessing these resources and optimizing our research processes a prerequisite for future success.

At the core of chemical design is the “design, synthesis, testing and evaluation” cycle. Traditionally, all components of the cycle have been undertaken in the same laboratory under the control of a small team of synthetic chemists and a computational chemist as part of a multidisciplinary team. The most important task of the chemistry team is to evaluate new biological test results in the context of known chemistry rules, general and project-specific models, and any other available information such as protein structures. The key to successful design chemistry is the ability to balance an array of often conflicting properties as each round of design and synthesis improves the overall properties of the compound series (or at least facilitates future improvement). Design chemistry is therefore a data-driven task, requiring immediate access to all available data if the results of new testing are to truly influence the next rounds of synthesis.
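As a rough illustration of what balancing conflicting properties can look like in the evaluation step, the sketch below combines three properties into a single score and ranks a small compound series. Everything in it is an assumption made for illustration: the `Compound` fields, the value ranges, the equal weighting and the compound data are hypothetical and do not come from the project.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    potency_pic50: float   # higher is better (hypothetical assay value)
    logp: float            # lipophilicity; a mid-range value is assumed preferable
    solubility_um: float   # higher is better (hypothetical assay value)


def clamp01(x: float) -> float:
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def desirability(c: Compound) -> float:
    """Combine conflicting properties into a single score between 0 and 1.

    The ranges and the equal weighting are illustrative assumptions.
    """
    potency = clamp01((c.potency_pic50 - 5.0) / 4.0)         # 5 maps to 0, 9 maps to 1
    lipophilicity = clamp01(1.0 - abs(c.logp - 2.5) / 2.5)   # best at logP 2.5
    solubility = clamp01(c.solubility_um / 100.0)            # 100 uM or more maps to 1
    # Geometric mean: one very poor property drags the whole score down.
    return (potency * lipophilicity * solubility) ** (1.0 / 3.0)


def rank(compounds):
    """Return compounds ordered from most to least desirable."""
    return sorted(compounds, key=desirability, reverse=True)


# Hypothetical compound series after one round of testing.
compounds = [
    Compound("cmpd-A", 8.5, 5.0, 5.0),    # potent, but lipophilic and poorly soluble
    Compound("cmpd-B", 7.0, 2.5, 80.0),   # balanced profile
    Compound("cmpd-C", 5.5, 1.0, 200.0),  # soluble, but weak
]
ranked = rank(compounds)

for c in ranked:
    print(f"{c.name}: {desirability(c):.2f}")
```

With these made-up numbers the balanced compound ranks first, even though it is not the most potent one. A geometric mean is only one possible choice; a real project would tune the properties, ranges and weights to its own models.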

Chemical data analysis workflow tools, such as KNIME, TAVERNA and PIPELINE PILOT, have been implemented in most pharmaceutical companies, providing user-friendly workbenches for experts and non-experts to undertake complex data analysis tasks including machine learning, analytics and visualization. TAVERNA and KNIME are open source workflow tools with large communities that develop and share new functionality, which helps disseminate methods and provides rigorous community testing. Even the largest commercial software providers, including Schrödinger, Tripos and CCG, now provide tools (nodes and extensions) to the KNIME community.

Despite the user-friendly nature of these workflow tools, they are not trivial to manage, especially when seeking to connect with database tools or other extensions. For this reason the Dutch academic community benefits from the Netherlands eScience Center implementing an eScience platform around a workflow tool on their behalf.

This project delivers a local version of such an eScience platform for chemistry. It is supported by open source databases (MySQL and PostgreSQL) and connected to chemistry-specific applications such as RDKit and CDK, as well as to the analytics and visualization capabilities of R, building on previously described infrastructures.

Such an approach has the potential to support many aspects of data-driven chemistry. Because the central workflow tool, KNIME (like TAVERNA and PIPELINE PILOT), is domain independent, the platform could also support projects in other disciplines in the future.

Participating organisations

Radboud University Nijmegen
Vrije Universiteit Amsterdam
Netherlands eScience Center
Life Sciences
Natural Sciences & Engineering

Team

Stefan Verhoeven
Senior eScience Research Engineer
Netherlands eScience Center
Scott Lusher
eScience Coordinator
Netherlands eScience Center
Gert Vriend
Principal investigator
Radboud University Nijmegen

Related projects

FAIR is as FAIR does

Integrating data publishing principles in scientific workflows

Updated 21 months ago
Finished

Enhancing Protein-Drug Binding Prediction

Combining molecular simulation and eScience technologies

Updated 1 month ago
Finished

Massive Biological Data Clustering, Reporting and Visualization Tools

Sequence validation in the DNA barcoding project

Updated 20 months ago
Finished

3D-e-Chem

Efficient exploitation of the massive amount of modern-day life science data

Updated 21 months ago
Finished

Computational Chemistry Made Easy

Bringing concepts from distributed computing and bioinformatics to the field of computational...

Updated 21 months ago
Finished

ODEX4all

Open discovery and exchange for all

Updated 21 months ago
Finished

Creation of Food Specific Ontologies for Food Focused Text Mining

Capitalizing on the growth of scientific knowledge on food

Updated 21 months ago
Finished

VLPB

The Virtual Laboratory for Plant Breeding

Updated 3 months ago
Finished

Related software

Chemical Analytics Platform

Chemical Analytics Platform is a freely available Virtual Machine encompassing tools, databases, and KNIME workflows.

Updated 29 months ago

KLIFS KNIME nodes

If you are working in the KNIME workflow platform and need data about your favorite kinase receptor ligand interaction, then these nodes are for you.

Updated 29 months ago

KNIME GPCRdb nodes

A node for the KNIME workflow system that allows you to retrieve data about your favorite G protein-coupled receptors from gpcrdb.org.

Updated 29 months ago

KNIME node archetype

Want to write your own KNIME node? Then use the KNIME node archetype to generate a node skeleton repository with sample code.

Updated 29 months ago

KNIME node for Kripo

A node for the KNIME workflow system that allows you to compare different binding sites in proteins with each other.

Updated 29 months ago

KNIME Python node archetype

Want to write your own KNIME node wrapping a Python library? Then use the KNIME Python node archetype to generate a node skeleton repository with sample code.

Updated 29 months ago

KNIME Silicos-it nodes

A node for the KNIME workflow system that allows you to use the Silicos-it software to filter or align molecules.

Updated 29 months ago