TaxoTagger
DNA barcode identification, powered by semantic searching.
Artificial Intelligence for DNA barcode identification
Monitoring microbial biodiversity is crucial for understanding environmental change, but species identification from large-scale microbial samples presents significant computational challenges. Traditional methods such as BLAST are accurate, but they scale poorly when processing millions of sequences. Recent advances in deep learning offer promising ways around these limitations. In this work, we introduce TaxoTagger, an open-source Python library for DNA barcode identification that integrates deep learning and semantic search. TaxoTagger efficiently generates taxonomic profiles from microbial DNA sequences, offering a faster and more scalable alternative to conventional tools. It supports building vector databases from FASTA files and is easily extensible to different embedding models, which makes it adaptable to a range of research contexts. With TaxoTagger, we aim to improve both the speed and accuracy of species identification, enabling more efficient monitoring of microbial biodiversity in the Netherlands and around the world.
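To make the idea of embedding-based semantic search concrete, here is a minimal sketch in plain Python. It is not TaxoTagger's API: the function names (`parse_fasta`, `embed`, `build_index`, `search`), the reference records, and the use of normalised k-mer counts as a stand-in for a learned embedding model are all assumptions made for illustration. A real pipeline would likely swap `embed` for a deep learning model and the dictionary index for a vector database.

```python
import math
from collections import Counter
from itertools import product

K = 3  # k-mer length; an illustrative choice, not a TaxoTagger setting
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def embed(seq):
    """Toy embedding: L2-normalised k-mer count vector.

    This stands in for a learned embedding model (assumption).
    """
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [counts.get(kmer, 0) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(fasta_text):
    """Embed every reference sequence; a dict plays the role of a vector database."""
    return {name: embed(seq) for name, seq in parse_fasta(fasta_text).items()}


def search(index, query_seq, top_k=1):
    """Rank references by cosine similarity to the query (vectors are unit length)."""
    q = embed(query_seq)
    scored = [(sum(a * b for a, b in zip(q, vec)), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical reference barcodes, invented for this example.
REFERENCE_FASTA = """\
>ref_A hypothetical taxon A
ACGTACGTACGTACGTACGT
>ref_B hypothetical taxon B
GGGCCCGGGCCCGGGCCCGG
"""

index = build_index(REFERENCE_FASTA)
hits = search(index, "ACGTACGTACGAACGT", top_k=1)
for score, name in hits:
    print(f"{name}\t{score:.3f}")
```

Because the search reduces to nearest-neighbour lookups over fixed-length vectors rather than pairwise alignment, the same structure can be backed by an approximate nearest-neighbour index when the reference set grows large.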
The project directly supports ARISE, a national initiative for biodiversity monitoring in the Netherlands, by providing infrastructure for species identification. It also lays the groundwork for broader applications in global biodiversity research. By advancing microbial metabarcoding, it helps researchers worldwide classify very large DNA datasets more efficiently and uncover rare or underrepresented species.
The project's original objectives have been met, and its goals have since expanded to include integrating transformer models for improved accuracy. The target audience includes biodiversity researchers, computational biologists, and organizations such as ARISE and BiodiversityXL.
After the project ends, we plan to publish our findings, showcase case studies, and grow the tool's user base through workshops, symposia, and social media outreach. We encourage researchers to adopt TaxoTagger for scalable biodiversity studies and invite collaboration to advance microbial monitoring technologies globally.
Webapp for DNA barcode identification, powered by semantic searching.