LAHTeR

Leveraging AI for HTR post-correction

Image by Dave Straatmeyer

The GLOBALISE project is dedicated to automatically transcribing and analysing the ‘Overgekomen Brieven en Papieren’, a series of Dutch East India Company (VOC) documents sent in the 17th and 18th centuries from Batavia (Jakarta) to the Dutch Republic. Handwritten text recognition (HTR) has made tremendous advances in recent years and now reaches impressive accuracy rates, but the high degree of structural and orthographic variety in these documents remains a significant challenge, particularly for subsequent NLP tasks. Applications such as named entity recognition and event detection are very sensitive to even small fluctuations in the error rate of the transcripts, especially once that rate rises above 5%. The aim of this project is to create a pipeline for post-HTR error correction of historical Dutch texts.
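
The description above does not say which error metric the 5% figure refers to. One common choice for HTR output is the character error rate (CER): the edit distance between a reference transcription and the HTR output, divided by the length of the reference. The following is a minimal sketch under that assumption. The function names, the example strings, and the use of 0.05 as a cut-off are illustrative and are not taken from the project's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example strings, invented for illustration only.
reference = "aen de Heeren Seventien"
hypothesis = "aen de Heeren Seuentien"  # one substituted character

cer = character_error_rate(reference, hypothesis)
ERROR_THRESHOLD = 0.05  # assumed cut-off, mirroring the 5% mentioned above
needs_correction = cer > ERROR_THRESHOLD
print(f"CER = {cer:.3f}, above threshold: {needs_correction}")
```

A page-level CER of this kind could be used to decide which transcripts to send through a post-correction step, although the project may rely on a different metric or a learned quality estimate instead.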

Participating organisations

Huygens Instituut
Netherlands eScience Center
Social Sciences & Humanities

Team

Lodewijk Petram
Lead Applicant
Royal Netherlands Academy of Arts & Sciences (KNAW)
Jisk Attema
Programme Manager
Netherlands eScience Center
Carsten Schnober
Lead RSE
Netherlands eScience Center

Related projects

REE-HDSC

Recognizing Extracted Entities for the Historical Database Suriname Curacao

Updated 4 months ago
In progress

Related software

htr-quality-classifier

A package to determine the quality of a digitized text, from a handwritten script or scanned print (HTR/OCR output).

Updated 9 months ago