REE-HDSC

Recognizing Extracted Entities for the Historical Database Suriname Curacao

The Historical Database Suriname and Caribbean (HDSC) builds a data infrastructure covering the free and enslaved inhabitants of Suriname and the Dutch Antilles (1828-1950). Until recently, we relied on citizen scientists to extract information from digital scans of historical registers and civil certificates. The REE-HDSC project explored how handwritten text recognition (HTR) with Transkribus, combined with entity recognition using ChatGPT and regular expressions, can automate and accelerate the transcription of historical civil certificates. As a case study, we used the Curacao death certificates for the period 1831-1950.

The automatic transcription process developed in the project extracts certificate and death dates with high precision (90%), but the precision of person name extraction is much lower (33%). Because the correct spelling of names is crucial for life course reconstruction, the current quality of handwritten text recognition technology does not meet the quality needs of the HDSC and does not allow us to fully replace human transcribers. Nevertheless, following the insights derived from REE-HDSC, we integrate automatic transcription into our current crowdsourcing workflow by showing the human transcribers pre-filled entities that they need to check (dates, names, occupations, etc.). This pilot indicates that model integration accelerates the transcription process.
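
To make the precision figures above concrete, the following is a minimal Python sketch of regular-expression date extraction and a precision calculation. It is not the project's code: the pattern, the month table, the function names and the sample sentence are all assumptions made for illustration. Real certificates may write dates out in words, which this simple pattern would not match.

```python
import re

# Hypothetical lookup of Dutch month names (an assumption, not project data).
MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}

# Illustrative pattern: day number, month name, four-digit year.
DATE_RE = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def extract_date(text):
    """Return the first date found as YYYY-MM-DD, or None if no match."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    return f"{int(year):04d}-{MONTHS[month.lower()]:02d}-{int(day):02d}"


def precision(predicted, gold):
    """Share of extracted (non-None) values that equal the gold value."""
    pairs = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    if not pairs:
        return 0.0
    return sum(p == g for p, g in pairs) / len(pairs)
```

In this sketch, precision counts only the certificates for which a value was extracted, so a missing extraction lowers coverage but not precision.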

Participating organisations

Radboud University Nijmegen
Nationaal Archief Curaçao
Netherlands eScience Center
Social Sciences & Humanities

Team

Matthias Rosenbaum-Feldbrügge
Lead Applicant
Radboud University Nijmegen
Björn Quanjer
Researcher
Radboud University Nijmegen
Thunnis van Oort
Researcher
Radboud University Nijmegen
Coen van Galen
Project Manager
Radboud University Nijmegen
Lisa Hoek
Data Scientist
Radboud University Nijmegen
Jisk Attema
Programme Manager
Netherlands eScience Center
Erik Tjong Kim Sang
Research Software Engineer
Netherlands eScience Center
Niels Drost
Programme Manager
Netherlands eScience Center

Related projects

LAHTeR

Leveraging AI for HTR post-correction

In progress

Related software

REE-HDSC

Software developed in the project REE-HDSC (Recognizing Extracted Entities for the Historical Database Suriname Curacao, 2023-2024) of the Radboud University Nijmegen and the Netherlands eScience Center.