Scaling up pangenomics for plant breeding

Delivering a pangenome approach that drastically improves the analytical power on plant data

Modern plant research, like other disciplines in biology, is gradually being transformed into a data-driven endeavor. One of the main drivers of this development is the continuous reduction in DNA sequencing costs. Reconstructing the complete genome of a plant from short DNA sequences and finding genetic variants with respect to a reference genome are applications that generate large amounts of sequencing data, which are used to study plants and to accelerate and improve breeding.

Traditional approaches to comparing genomes, centered on a single reference, no longer suffice, and the field of genomics is therefore switching to so-called pangenome approaches. Several novel graph-based data structures and algorithms are under development, but none of these can handle the numbers of large plant genomes required in modern research and in applications in plant breeding.

In this project, we will improve the scalability of a promising pangenome approach, called PanTools, using eScience technologies. We will specifically address bottlenecks in pangenome construction and analytics, based on a number of predefined use cases in plant genomics. Major performance improvements are expected from the integration of Spark technology and our sophisticated graph-based pangenome. This project will deliver the first pangenome approach that can handle the big data in plant genomics and will drastically improve the analytical power on plant data.

Participating organisations

SURF
Life Sciences
Netherlands eScience Center
Wageningen University & Research

Team

Sandra Smit
Principal investigator
Wageningen University and Research
Pablo Lopez-Tarifa
Programme Manager
Netherlands eScience Center
Thijs van Lankveld
Lead RSE
Netherlands eScience Center
Nauman Ahmed
RSE
Netherlands eScience Center

Related projects

A new perspective on global vegetation water dynamics from radar satellite data

Updated 2 months ago
Running

PADRE - The PetaFLOP AARTFAAC Data-Reduction Engine

Improving the AARTFAAC processing pipeline

Updated 6 months ago
Finished

DarkGenerators

Interpretable large scale deep generative models for Dark Matter searches

Updated 10 months ago
Finished

RETURN - Monitoring tropical forest recovery capacity using RADAR Sentinel satellite data

Demonstrating the potential of European Sentinel satellite data

Updated 10 months ago
Finished

eEcoLiDAR

eScience infrastructure for ecological applications of LiDAR point clouds

Updated 10 months ago
Finished

Blue-Action

Arctic impact on weather and climate

Updated 7 months ago
Finished

MAGIC

Metrics and Access to Global Indices for Climate Projections

Updated 10 months ago
Finished

Towards a species-by-species approach to global biodiversity modelling

The current decline of global biodiversity

Updated 6 months ago
Finished

PRIMAVERA

Process-based climate simulation: advances in high-resolution modelling and European climate risk...

Updated 5 months ago
Finished

Improving Open-Source Photogrammetric Workflows for Processing Big Datasets

Processing large datasets on consumer-grade computers

Updated 6 months ago
Finished

ERA-URBAN

Environmental re-analysis of urban areas: quantifying high-resolution energy and water budgets of...

Updated 6 months ago
Finished