Literary works are dynamic entities: they go through different stages of development before publication, and often continue to change even after their first publication. The early versions of a work, such as notes, draft manuscripts and typescripts, still show the traces of this development in the form of deletions, additions and substitutions. Today, these documents are carefully transcribed, annotated and encoded in a machine-readable format. Using text comparison tools, scholars can automatically compare the encoded versions and examine the different stages in the work’s development. So far, however, these tools cannot include the annotations in the comparison, which means that relevant scholarly information is lost.
The project uses machine learning to develop a comparison tool that takes into account both the text and its annotations. The resulting tool will allow scholars to analyze textual development at an unprecedented level of detail.