REE-HDSC
Software developed in the project REE-HDSC (Recognizing Extracted Entities for the Historical Database Suriname Curacao, 2023-2024) of the Radboud University Nijmegen and the Netherlands eScience Center.
Recognizing Extracted Entities for the Historical Database Suriname Curacao
The Historical Database Suriname and Caribbean (HDSC) creates a data infrastructure of free and enslaved inhabitants of Suriname and the Dutch Antilles (1828-1950). Until recently, we relied on citizen scientists to extract information from digital scans of historical registers and civil certificates. The project REE-HDSC explored how handwritten text recognition (HTR) with Transkribus, combined with entity recognition using ChatGPT and regular expressions, can automate and accelerate the transcription of historical civil certificates. For our case study, we used the Curacao death certificates for the period 1831-1950.
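To illustrate the regular-expression side of entity recognition, here is a minimal sketch of date extraction from HTR output. It is not the project's actual code: the month table, the pattern, and the function name `extract_dates` are assumptions made for this example, and it only handles dates written as a numeric day, a Dutch month name, and a four-digit year. Dates spelled out in words, which are common in historical certificates, would need additional rules.

```python
import re
from datetime import date

# Hypothetical lookup table; real certificates may use older or variant spellings.
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}

# Matches e.g. "5 Maart 1875": day, Dutch month name, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(DUTCH_MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_dates(text):
    """Return all dates of the form '<day> <Dutch month> <year>' found in text."""
    found = []
    for day, month, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), DUTCH_MONTHS[month.lower()], int(day)))
        except ValueError:
            # Skip impossible dates (e.g. 31 februari), which HTR errors can produce.
            continue
    return found
```

Calling `extract_dates` on a transcribed line returns a list of `datetime.date` objects, which can then be assigned to fields such as certificate date or death date by position or by surrounding keywords.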
The automatic transcription process developed in the project extracts certificate and death dates with high precision (90%), but the precision of person name extraction is much lower (33%). As the correct spelling of names is crucial for life course reconstruction, the current quality of handwritten text recognition technology does not meet the quality needs of the HDSC and does not allow us to fully replace human transcribers. Nevertheless, following the insights derived from REE-HDSC, we integrate automatic transcription into our current crowdsourcing workflow by showing the human transcribers pre-filled entities that they need to check (dates, names, occupations, etc.). This pilot indicates that model integration accelerates the transcription process.
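For readers unfamiliar with the metric, the sketch below shows one common way to compute precision for a single extracted field: the share of extracted values that exactly match a manually checked reference. How the project scored its own results is not described here, so the exact-match rule, the dictionary layout, and the function name `precision` are assumptions for illustration only.

```python
def precision(extracted, gold):
    """Fraction of extracted values that exactly match the gold value.

    Both arguments map a certificate id to a field value (hypothetical layout).
    Certificates where nothing was extracted (value None) are not counted.
    """
    scored = [cid for cid, value in extracted.items() if value is not None]
    if not scored:
        return 0.0
    correct = sum(1 for cid in scored if extracted[cid] == gold.get(cid))
    return correct / len(scored)
```

Under exact matching, a single wrong character in a name counts as an error, which is one reason name extraction can score much lower than date extraction.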
Leveraging AI for HTR post-correction