Wealth inequality is a major socio-economic challenge. Yet long-term empirical evidence, particularly for the Netherlands, remains limited. This project unlocks a unique historical source: Dutch inheritance tax records known as the Memories van Successie. Hundreds of thousands of these semi-structured handwritten documents contain detailed accounts of deceased individuals’ assets, offering rich material for analyzing wealth patterns.
Manually processing these records is infeasible, and existing Handwritten Text Recognition (HTR) tools struggle with their semi-structured layout. The Rags2Riches project leverages recent advances in Document AI to fine-tune models on a curated sample of inheritance records for which detailed transcriptions and asset classifications are available. The resulting pipeline will automatically read, interpret, and structure these scans into research-ready data.
The project will generate the most extensive historical dataset on wealth holdings to date and deliver a reusable framework for extracting structured data from complex handwritten sources, an increasingly vital process given the rapid growth of digitized historical documents.