Texcavator

Facilitating and supporting large-scale text mining in the field of digital humanities

Image: Koninklijke Bibliotheek (CC License)

When addressing macro-historical questions, such as the emergence of transnational reference cultures, cultural text mining is of crucial importance. Mining the cultural aspects of entities and events in large textual repositories, such as the collection of digitized historical newspapers provided by the National Library of the Netherlands (KB), can provide valuable insights. The ‘Translantis: Digital Humanities Approaches to Reference Cultures’ program uses text mining technologies to analyze Big Data repositories of public media. The eScience challenge here is to develop a tool for cultural text mining that enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way.

The Intelligent Systems Lab Amsterdam (ISLA) has developed xTAS, a scalable open source text analysis service coupled to Elasticsearch, the scalable open source search and analytics platform. xTAS underlies the Texcavator application, which serves the specific text mining needs of the Translantis research team. xTAS will be further developed, and future versions will include clustering concepts and sentiment mining of issues in public debates. Regular feedback loops will allow for iterative refinement of the analysis algorithm and extension of the current set of features.

This project aims to significantly strengthen the employment of computational methods in humanities research. Users working in interdisciplinary teams with the current tool (Texcavator) will be closely monitored to study what interface functionalities and features are desired and needed. The goal is to build a generic tool that enables fine-grained analysis of large-scale document collections, and that offers state-of-the-art visualizations to enable humanities scholars working in multidisciplinary teams to semi-automatically distinguish long-term patterns in large news media repositories.

The result will be an innovative text mining tool that is user-friendly and sustainable. Also, this project will result in a number of best practices demonstrating ways in which computational humanities can be integrated into conventional historical-interpretive approaches and vice versa. The software developed is open source and will be available to humanities scholars and social scientists to deploy in their own research.

Participating organisations

Netherlands eScience Center
Utrecht University
Social Sciences & Humanities
Huygens Instituut
University of Amsterdam

Impact

Output

  • 1.
    Texcavator - Current Setup
    Author(s): Janneke M. van der Zwaan
    Published in 2015
  • 2.
    Using digitized newspaper archives to investigate identity formation in long-term public discourse
    Author(s): Hieke Huistra, Toine Pieters
    Published in 2014
  • 3.
    Texcavator demo
    Author(s): Janneke van der Zwaan
    Published in 2014
  • 4.
    Formation in Long-Term Public Discourse
    Author(s): Hieke Huistra, Toine Pieters
    Published in 2014
  • 5.
    Beyond Patterns: Using Digital Methods to Find and Think about Particularities
    Author(s): Bram Mellink, Hieke Huistra
    Published in 2014
  • 6.
    Optimizing ElasticSearch for Texcavator
    Author(s): Eric de Kruijf
    Published in 2015

Team

Jisk Attema
eScience Coordinator
Netherlands eScience Center
Joris van Eijnatten
Principal Investigator
Utrecht University
Charles van den Heuvel
Co-Applicant
Huygens Institute for the History of the Netherlands
Toine Pieters
Maarten de Rijke
Co-Applicant
Universiteit van Amsterdam
Janneke van der Zwaan
eScience Research Engineer
Netherlands eScience Center

Related projects

EviDENce

Ego Documents Events modelling – how individuals recall mass violence

Updated 12 months ago
Finished

TICCLAT

Text-induced corpus correction and lexical assessment tool

Updated 13 months ago
Finished

GlamMap

Visual analytics for the world’s library data

Updated 12 months ago
Finished

DiLiPaD

A new approach to the history of parliamentary communication and discourse

Updated 12 months ago
Finished

Beyond the Book

Visualizing the level of international readability of works of fiction

Updated 12 months ago
Finished

Dr. Watson

Medical experts helping machines diagnose

Updated 12 months ago
Finished

Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.

Updated 2 months ago

Texcavator

Texcavator is a search engine and text mining application for creating word cloud and time line visualizations of large text corpora.

Updated 21 months ago