The Semantics of Sustainability

Historicizing language models to study the conceptual history of sustainability in the Netherlands

‘Sustainability’ has many different meanings. Most entrepreneurs think about it differently than, for example, environmentalists do. Likewise, sustainability did not have the same connotations in 1987 as it does now. Historians are interested in nuances like these because they reveal a great deal about changing attitudes, beliefs, and concerns. Language models based on the transformer architecture hold great potential for this type of historical research because they can capture language use at a very fine-grained level. The use of language models in historical research is hampered, however, by the fact that state-of-the-art models are mostly trained on present-day data. This project therefore focused on the prerequisites for using transformer-based language models in a sound and critical manner in historical research. It developed a pipeline for fine-tuning a Dutch RoBERTa-based language model on historical data and for visualizing changes in meaning over time and between different actors or domains. In doing so, the project produced guidelines for historical (conceptual) research with language models, which are crucial for a field that still lacks broad experience with such models.

The project's initial ambition was to train a transformer from scratch on historical (20th-century) textual data. Because of limits in computing power and staffing, and because the added value over fine-tuning existing models was uncertain, the project instead turned to an existing sentence transformer, which still allowed contextualized information from the historical sources to be captured. The project developed a two-step pipeline: first, creating databases of custom (historical) data based on the sentence transformer; second, exploring semantic change over time by generating various visualizations from these databases. The pipeline can be found in the project’s GitHub repository: https://github.com/Semantics-of-Sustainability/tempo-embeddings. Examples of its use are given in two blog posts on the project website: https://semantics-of-sustainability.github.io/Website/news/index.html.
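The two-step idea can be illustrated with a minimal Python sketch. It is not the project's code and does not use the tempo-embeddings API: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) for a real sentence transformer, the passages are invented example sentences, and grouping by decade is just one assumed way of comparing periods.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64  # size of the toy vectors; real sentence embeddings are much larger


def toy_embed(text):
    """Hypothetical stand-in for a sentence transformer (hashed bag of words)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    return vec


def build_database(passages):
    """Step 1: store one embedding per passage together with its year."""
    return [
        {"year": year, "text": text, "vector": toy_embed(text)}
        for year, text in passages
    ]


def centroid_by_decade(database):
    """Step 2: average the vectors of all passages within each decade."""
    grouped = defaultdict(list)
    for row in database:
        grouped[row["year"] // 10 * 10].append(row["vector"])
    return {
        decade: [sum(column) / len(vectors) for column in zip(*vectors)]
        for decade, vectors in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented example sentences, used only to exercise the functions.
passages = [
    (1987, "sustainability means durable goods that last"),
    (1989, "sustainability of economic growth and industry"),
    (2021, "sustainability means lower carbon emissions"),
    (2023, "sustainability and climate policy for energy"),
]

database = build_database(passages)
centroids = centroid_by_decade(database)
similarity = cosine(centroids[1980], centroids[2020])
print(f"Similarity between the 1980s and 2020s usage: {similarity:.2f}")
```

A lower similarity between two periods would hint that the term appears in different contexts, which is the kind of signal the project's visualizations are meant to help historians inspect critically. With a real contextual model, only `toy_embed` would need to be replaced.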

Participating organisations

Social Sciences & Humanities
Koninklijke Bibliotheek
Netherlands eScience Center
Utrecht University

Team

Pim Huijnen
Hugo Quené
co-Applicant
Center for Digital Humanities, Utrecht University
Martijn Kleppe
co-Applicant
National Library of the Netherlands
Leonardo Vida
Research Engineer
Utrecht University
Carsten Schnober
Lead RSE
Netherlands eScience Center
Jisk Attema
Programme Manager
Netherlands eScience Center
Angel Daza
eScience Research Software Engineer
Netherlands eScience Center