Cultural text mining is crucial for addressing macro-historical questions, such as the emergence of transnational reference cultures. Mining the cultural aspects of entities and events in large textual repositories, such as the collection of digitized historical newspapers provided by the National Library of the Netherlands (KB), can provide valuable insights. The ‘Translantis: Digital Humanities Approaches to Reference Cultures’ program uses text mining technologies to analyze Big Data repositories of public media. The eScience challenge here is to develop a tool for cultural text mining that enables scholars to search very large quantities of textual data systematically, reliably, and reproducibly.
The Intelligent Systems Lab Amsterdam (ISLA) has developed xTAS, a scalable open source text analysis service coupled to Elasticsearch, the scalable open source search and analytics platform. This service underlies the Texcavator application, which serves the specific text mining needs of the Translantis research team. xTAS will be developed further, and future versions will include clustering of concepts and sentiment mining of issues in public debates. Regular feedback loops will allow for iterative refinement of the analysis algorithm and extension of the current set of features.
This project aims to significantly strengthen the use of computational methods in humanities research. Users working in interdisciplinary teams with the current tool (Texcavator) will be closely monitored to study which interface functionalities and features they want and need. The goal is to build a generic tool that enables fine-grained analysis of large-scale document collections and offers state-of-the-art visualizations. With it, humanities scholars working in multidisciplinary teams should be able to distinguish long-term patterns in large news media repositories semi-automatically.
The result will be an innovative text mining tool that is user-friendly and sustainable. The project will also yield a number of best practices demonstrating how computational humanities can be integrated into conventional historical-interpretive approaches, and vice versa. The software developed is open source and will be available to humanities scholars and social scientists to deploy in their own research.