REE-HDSC

Recognizing Extracted Entities for the Historical Database Suriname Curacao

The Historical Database Suriname and Caribbean (HDSC) creates a data infrastructure of free and enslaved inhabitants of Suriname and the Dutch Antilles (1828-1950). Until recently, we relied on citizen scientists for information extraction from digital scans of historical registers and civil certificates. The project REE-HDSC explored how handwritten text recognition (HTR) using Transkribus and entity recognition using ChatGPT and regular expressions can be used to automate and accelerate the transcription of historical civil certificates. For our case study, we used the Curacao death certificates for the period 1831-1950.

The automatic transcription process developed in the project extracts certificate and death dates with high precision (90%), but the precision of person name extraction is much lower (33%). As the correct spelling of names is crucial for life course reconstruction, the current quality of handwritten text recognition technology does not meet the quality needs of the HDSC and does not allow us to fully replace human transcribers. Nevertheless, following the insights derived from REE-HDSC, we integrate automatic transcription into our current crowdsourcing workflow by showing the human transcribers pre-filled entities that they need to check (dates, names, occupations, etc.). This pilot indicates that model integration accelerates the transcription process.
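As an illustration of how precision figures like those above can be computed, here is a minimal sketch. It assumes precision means the fraction of automatically extracted values that exactly match a human-checked reference value; the project's actual evaluation method is not described here, and the function name, certificate identifiers, and example names below are all hypothetical.

```python
def precision(extracted, gold):
    """Fraction of non-empty extracted values that exactly match the gold value.

    Assumption: exact string match per certificate. The project may have
    used a different matching rule.
    """
    attempted = [(doc_id, value) for doc_id, value in extracted.items() if value]
    if not attempted:
        return 0.0
    correct = sum(1 for doc_id, value in attempted if gold.get(doc_id) == value)
    return correct / len(attempted)


# Made-up illustration data, not project results.
gold_names = {
    "cert1": "Maria Martina",
    "cert2": "Johannes Pieter",
    "cert3": "Anna Elisabeth",
}
extracted_names = {
    "cert1": "Maria Martina",   # exact match
    "cert2": "Johannes Peter",  # one letter missing, counts as wrong
    "cert3": "Ana Elisabeth",   # one letter missing, counts as wrong
}

name_precision = precision(extracted_names, gold_names)
print(f"name precision: {name_precision:.2f}")
```

Under exact matching, a single misread character makes a whole name count as incorrect, which is one plausible reason name precision falls well below date precision.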

Participating organisations

Radboud University Nijmegen
Nationaal Archief Curaçao
Netherlands eScience Center
Social Sciences & Humanities

Team

Contact person

Erik Tjong Kim Sang
Research Software Engineer
Netherlands eScience Center
0000-0002-8431-081X
Matthias Rosenbaum-Feldbrügge
Lead Applicant
Radboud University Nijmegen
0000-0002-5082-6850
Björn Quanjer
Researcher
Radboud University Nijmegen
0000-0003-2492-7380
Thunnis van Oort
Researcher
Radboud University Nijmegen
0000-0001-8912-0508
Coen van Galen
Project Manager
Radboud University Nijmegen
0000-0003-1423-0686
Lisa Hoek
Data Scientist
Radboud University Nijmegen
0000-0002-6741-3585
Jisk Attema
Programme Manager
Netherlands eScience Center
0000-0002-0948-1176
Erik Tjong Kim Sang
Research Software Engineer
Netherlands eScience Center
0000-0002-8431-081X
Niels Drost
Programme Manager
Netherlands eScience Center
0000-0001-9795-7981

Related projects

Rags2Riches

From rags to riches: A pipeline for processing semi-structured handwritten texts

Updated 1 month ago
In progress

LAHTeR

Leveraging AI for HTR post-correction

Updated 17 months ago
Finished

Related software

REE-HDSC

Software developed in the project REE-HDSC (Recognizing Extracted Entities for the Historical Database Suriname Curacao, 2023-2024) of the Radboud University Nijmegen and the Netherlands eScience Center.

Updated 7 months ago