‘Sustainability’ has many different meanings. Most entrepreneurs think about it differently than, for example, environmentalists do. Likewise, sustainability did not have the same connotation in 1987 as it does now. Historians are generally interested in nuances like these, because they reveal a great deal about changing attitudes, beliefs, and concerns. Language models based on the transformer architecture hold great potential for this type of historical research, because they allow language to be studied at a very fine-grained level. The use of language models for history is hampered, however, by the fact that existing state-of-the-art language models are mostly trained on present-day data. This project therefore centered on the prerequisites for using transformer-based language models in a sound and critical manner in historical research. It developed a pipeline for fine-tuning a Dutch RoBERTa-based language model on historical data to visualize meaning change over time and between different actors or domains. In doing so, the project developed important guidelines for historical (conceptual) research with the help of language models. These are crucial for a field that lacks broad experience with state-of-the-art language models.
The initial ambition of the project was to train a transformer from scratch on historical (20th-century) text data. Due to bottlenecks in computing power and manpower, and the uncertain added value over fine-tuning existing models, the project was reoriented towards using an existing sentence transformer, which ensured that contextualized information from the historical texts was captured. The project developed a two-step pipeline: the first step creates databases of custom (historical) data based on the sentence transformer, and the second explores semantic change over time by generating various visualizations from these databases. The pipeline code can be found in the project’s GitHub repository: https://github.com/Semantics-of-Sustainability/tempo-embeddings, and examples of its use can be found in two blog posts on the project website: https://semantics-of-sustainability.github.io/Website/news/index.html.
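The two-step idea described above can be illustrated with a minimal sketch. It is not the code from the tempo-embeddings repository. All function names are hypothetical, the example sentences are invented for illustration, and a toy hashed bag-of-words encoder (`toy_embed`) stands in for the sentence transformer so that the sketch runs without downloading a model. Instead of a visualization, step two reports a simple number: the cosine distance between the average embeddings of consecutive decades.

```python
import hashlib
import math
from collections import defaultdict


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a sentence-transformer encoder.

    Hashes each token into one of `dim` buckets and L2-normalises the result.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_database(records, term, embed=toy_embed):
    """Step 1: embed every passage that mentions `term`, grouped by decade."""
    db = defaultdict(list)
    for year, text in records:
        if term in text.lower():
            db[year // 10 * 10].append(embed(text))
    return dict(db)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drift_between_periods(db):
    """Step 2: cosine distance between centroids of consecutive decades."""
    periods = sorted(db)
    return {
        (p, q): 1.0 - cosine(centroid(db[p]), centroid(db[q]))
        for p, q in zip(periods, periods[1:])
    }


# Invented example passages (year, text); not taken from the project's corpus.
records = [
    (1987, "sustainability of economic growth and development"),
    (1989, "sustainability means development that meets present needs"),
    (2015, "sustainability targets for climate and emissions"),
    (2019, "corporate sustainability reporting on climate risk"),
]

db = build_database(records, "sustainability")
drift = drift_between_periods(db)
for (start, end), distance in drift.items():
    print(f"{start}s -> {end}s: cosine distance {distance:.3f}")
```

In a realistic setting, `toy_embed` would be replaced by a real encoder passed in through the `embed` parameter, and the stored embeddings would feed a plot rather than a single distance value. Keeping the database step separate from the analysis step means the expensive embedding pass only has to run once per corpus.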