An Artificial Intelligence Approach to Comparing Text Versions


Literary works are dynamic entities: they go through different stages of development before publication, and often continue to change even after their first publication. The early versions of a work, such as notes, draft manuscripts and typescripts, still show the traces of this dynamic development in the form of deletions, additions or substitutions. Today, these documents are carefully transcribed, annotated and encoded in a machine-readable language. Using text comparison tools, scholars can automatically compare the encoded text versions and examine the different stages in the work’s development. So far, however, it is not possible to include the annotations in the comparison process. This means that relevant scholarly information is lost.
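To make the limitation concrete, here is a minimal sketch of a conventional word-level comparison. It uses Python's standard `difflib` module, not the project's tool. The two sample sentences, the TEI-like `<del>` and `<add>` tags, and the helper names `strip_markup` and `compare_versions` are all assumptions made for illustration. The sketch discards the markup before comparing, which is roughly the point at which the annotations drop out of the result.

```python
import difflib
import re


def strip_markup(encoded: str) -> list[str]:
    """Drop XML-style tags and split the remaining text into word tokens."""
    plain = re.sub(r"<[^>]+>", "", encoded)
    return plain.split()


def compare_versions(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for every changed span."""
    a, b = strip_markup(old), strip_markup(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical transcriptions; the tags stand in for editorial annotations.
draft_1 = "The <del>old</del> house stood on a hill"
draft_2 = "The house stood <add>alone</add> on a ridge"

changes = compare_versions(draft_1, draft_2)
for change in changes:
    print(change)
```

The output lists one deletion ("old"), one addition ("alone") and one substitution ("hill" to "ridge"). It no longer records that "old" was already struck out in the first draft or that "alone" was marked as an addition in the second, because that information was in the tags that were stripped.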

The project employs machine learning technologies to develop a comparison tool that can take into account text as well as annotations. As a result, it will allow scholars to analyze the textual development at unprecedented levels of detail.

Participating organisations

Social Sciences & Humanities
Huygens Instituut
Netherlands eScience Center


Elli Bleeker
Lead Applicant
Huygens Institute for the History of the Netherlands
Ronald Haentjens Dekker
Huygens Institute for the History of the Netherlands
Niels Drost
Programme Manager
Netherlands eScience Center
Jisk Attema
Programme Manager
Netherlands eScience Center
Elena Ranguelova
Tech Lead
Netherlands eScience Center

Related projects

REL 2.0

Multilingual and Multipurpose Entity Linking Toolkit


Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.
