Rags2Riches

From rags to riches: A pipeline for processing semi-structured handwritten texts

Wealth inequality is a major socio-economic challenge. Yet long-term empirical evidence, particularly for the Netherlands, remains limited. This project unlocks a unique historical source: Dutch inheritance tax records known as the Memories van Successie. Hundreds of thousands of these semi-structured handwritten documents contain detailed accounts of deceased individuals’ assets, offering rich material for analyzing wealth patterns.

Manually processing these records is infeasible, and existing Handwritten Text Recognition (HTR) tools struggle with their semi-structured layout. The Rags2Riches project leverages recent advances in Document AI to fine-tune models on a curated sample of inheritance records containing detailed transcriptions and classified assets. The resulting pipeline will automatically read, interpret, and structure these scans into research-ready data.

The project will generate the most extensive historical dataset on wealth holdings to date and deliver a reusable framework for extracting structured data from complex handwritten sources, an increasingly vital capability given the rapid growth of digitized historical documents.

Team

Angel Daza
Lead Engineer
Netherlands eScience Center
ASS
Research Software Engineer
Netherlands eScience Center