AID

Artificial Intelligence for DNA barcode identification

Monitoring microbial biodiversity is crucial for understanding environmental change, but identifying species in large-scale microbial samples presents significant computational challenges. Traditional methods such as BLAST are accurate, but they scale poorly when processing millions of sequences. Recent advances in deep learning offer promising ways around these limitations. In this work, we introduce TaxoTagger, an open-source Python library for DNA barcode identification that integrates deep learning and semantic search. TaxoTagger efficiently generates taxonomic profiles from microbial DNA sequences, offering a faster, more scalable alternative to conventional tools. It supports building vector databases from FASTA files and can be extended to different embedding models, which makes it adaptable across research contexts. With TaxoTagger, we aim to improve both the speed and accuracy of species identification, enabling more efficient monitoring of microbial biodiversity in the Netherlands and around the world.
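The general idea of identification by semantic search is to embed every reference sequence as a vector, store those vectors in a database built from a FASTA file, and label a query by finding its most similar reference vectors. The sketch below illustrates only that idea, under loudly stated assumptions: `embed` is a hypothetical k-mer frequency stand-in for a deep learning embedding model, the "vector database" is an in-memory list searched by brute-force cosine similarity, each FASTA header is assumed to hold the taxonomic label, and the names `parse_fasta`, `embed`, `build_database` and `search` are illustrative, not TaxoTagger's API.

```python
import itertools
import math
from typing import List, Tuple

K = 3  # hypothetical k-mer size for the stand-in embedding
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def embed(seq: str) -> List[float]:
    """Assumed stand-in for a learned model: L2-normalised k-mer counts."""
    vec = [0.0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:  # skip k-mers containing ambiguous bases such as N
            vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_database(fasta_text: str) -> List[Tuple[str, List[float]]]:
    """Embed each reference sequence, keeping its header as the label."""
    return [(header, embed(seq)) for header, seq in parse_fasta(fasta_text)]


def search(db: List[Tuple[str, List[float]]], query: str,
           top_k: int = 1) -> List[Tuple[str, float]]:
    """Rank references by cosine similarity (dot product of unit vectors)."""
    q = embed(query.upper())
    scored = [(label, sum(a * b for a, b in zip(q, vec))) for label, vec in db]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy, made-up reference sequences with hypothetical labels.
    reference = """>taxon_A
ACGTACGTTAGCCGATAGCTAGGCTA
>taxon_B
GGGCCCGGGTTTAAACCCGGGTTTAA
"""
    db = build_database(reference)
    print(search(db, "ACGTACGTTAGCCGATAGCTAG", top_k=2))
```

The brute-force scan costs one comparison per reference for every query, so at the scale of millions of sequences a dedicated vector index and a trained embedding model would typically replace the list and the k-mer counts used here.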

This project is significant because it directly supports ARISE, a national initiative for biodiversity monitoring in the Netherlands, by providing infrastructure for species identification. It also lays the groundwork for broader applications in global biodiversity research. By advancing microbial metabarcoding, the project helps researchers worldwide classify large DNA datasets more efficiently and uncover rare or underrepresented species.

The project's objectives have been met, and its goals have since expanded to include integrating transformer models to improve accuracy. The target audience includes biodiversity researchers, computational biologists, and organizations such as ARISE and BiodiversityXL.

After the project, we plan to publish our findings, showcase case studies, and grow the tool's user base through workshops, symposia, and social media outreach. We encourage researchers to adopt TaxoTagger for scalable biodiversity studies and invite collaboration to advance microbial monitoring technologies globally.

Participating organisations

CBS-KNAW Fungal Biodiversity Centre
Netherlands eScience Center
Environment & Sustainability
Life Sciences

Team

Duong Vu
Lead Applicant
Westerdijk Fungal Biodiversity Institute
Niels Drost
Programme Manager
Netherlands eScience Center
Nauman Ahmed
Lead RSE
Netherlands eScience Center
Patrick Bos
Technology Lead
Netherlands eScience Center

Related software

TaxoTagger

DNA barcode identification, powered by semantic searching.

TaxoTagger Webapp

Webapp for DNA barcode identification, powered by semantic searching.