CollAIte

An Artificial Intelligence Approach to Comparing Text Versions


Literary works are dynamic entities: they go through different stages of development before publication, and often continue to change even after their first publication. The early versions of a work, such as notes, draft manuscripts and typescripts, still show the traces of this dynamic development in the form of deletions, additions or substitutions. Today, these documents are carefully transcribed, annotated and encoded in a machine-readable language. Using text comparison tools, scholars can automatically compare the encoded text versions and examine the different stages in the work’s development. So far, however, these tools cannot include the annotations in the comparison process, which means that relevant scholarly information is lost.

The project employs machine learning technologies to develop a comparison tool that can take into account text as well as annotations. As a result, it will allow scholars to analyze the textual development at unprecedented levels of detail.

Participating organisations

Social Sciences & Humanities
Huygens Instituut
Netherlands eScience Center

Testimonials

I think it’s safe to say that this is one of the prettiest visualizations the field has seen.
Elli Bleeker, Project Lead Applicant

Team

Elli Bleeker
Lead Applicant
Huygens Institute for the History of the Netherlands
Ronald Haentjens Dekker
Co-Applicant
Huygens Institute for the History of the Netherlands
Niels Drost
Programme Manager
Netherlands eScience Center
Jisk Attema
Programme Manager
Netherlands eScience Center
Elena Ranguelova
Tech Lead
Netherlands eScience Center

Related projects

REL 2.0

Multilingual and Multipurpose Entity Linking Toolkit

Updated 6 months ago
Finished

Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.

Updated 9 months ago