htr-quality-classifier
A package to determine the quality of a digitized text produced from a handwritten script or scanned print (HTR/OCR output).
Leveraging AI for HTR post-correction
The GLOBALISE project is dedicated to automatically transcribing and analysing the ‘Overgekomen Brieven en Papieren’, a series of VOC documents sent in the 17th and 18th centuries from Batavia (Jakarta) to the Dutch Republic. Handwritten text recognition (HTR) has made tremendous advances in recent years and now reaches impressive accuracy rates, but the high degree of structural and orthographic variety in these documents continues to pose a significant challenge, in particular for subsequent NLP tasks. Applications such as named entity recognition and event detection are very sensitive to even small fluctuations in transcript error rates, especially once these rise above 5%. The aim of this project is to create a pipeline for post-HTR error correction of historical Dutch texts.
Recognizing Extracted Entities for the Historical Database Suriname Curacao
A package to determine the quality of a digitized text produced from a handwritten script or scanned print (HTR/OCR output).