Texcavator

Facilitating and supporting large-scale text mining in the field of digital humanities

Image: Koninklijke Bibliotheek (CC License)

When addressing macro-historical questions, such as the emergence of transnational reference cultures, cultural text mining is of crucial importance. The mining of cultural aspects of entities and events in large textual repositories, such as the collection of digitized historical newspapers provided by the National Library of the Netherlands (KB), can provide valuable insights. The ‘Translantis: Digital Humanities Approaches to Reference Cultures’ program uses text mining technologies to analyze Big Data repositories of public media. The eScience challenge here is to develop a tool for cultural text mining that enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way.

The Intelligent Systems Lab Amsterdam (ISLA) has developed xTAS, a scalable open source text analysis service coupled to Elasticsearch, the scalable open source search and analytics platform. xTAS underlies the Texcavator application, which serves the specific text mining needs of the Translantis research team. xTAS will be further developed, and future versions will include concept clustering and sentiment mining of issues in public debates. Regular feedback loops will allow for iterative refinement of the analysis algorithms and extension of the current set of features.

This project aims to significantly strengthen the employment of computational methods in humanities research. Users working in interdisciplinary teams with the current tool (Texcavator) will be closely monitored to study what interface functionalities and features are desired and needed. The goal is to build a generic tool that enables fine-grained analysis of large-scale document collections, and that offers state-of-the-art visualizations to enable humanities scholars working in multidisciplinary teams to semi-automatically distinguish long-term patterns in large news media repositories.

The result will be an innovative text mining tool that is user-friendly and sustainable. Also, this project will result in a number of best practices demonstrating ways in which computational humanities can be integrated into conventional historical-interpretive approaches and vice versa. The software developed is open source and will be available to humanities scholars and social scientists to deploy in their own research.

Participating organisations

Social Sciences & Humanities

Impact

1.
Published in Benjamins Translation Library by John Benjamins Publishing Company in 2020
10.1075/btl.155

1.
Author(s): Nina Tahmasebi, Lars Borin, Adam Jatowt
Published by arXiv in 2018
10.48550/arxiv.1811.06278

Output

Computer programs

1.
Author(s): Martijn van der Klis, Fons Laan, Janneke M. van der Zwaan, Julian Gonggrijp, Lars Buitinck, Patrick Bos, José de Kruif, Mario Sassmann
Published by Zenodo in 2018
10.5281/zenodo.1442760

Conference papers

1.
Author(s): Jaap Verheul, Toine Pieters
Published in Digital Humanities 2014 (Lausanne) by DH Archive in 2014, pages 299-301

Journal articles

1.
Author(s): Joris van Eijnatten, Toine Pieters, Jaap Verheul
Published in Tijdschrift voor Tijdschriftstudies by Portico in 2014, page 59
10.18352/ts.303

Presentations

1.
Published in 2015
2.
Author(s): Janneke M. van der Zwaan
Published in 2015
3.
Published in 2014
4.
Author(s): Janneke van der Zwaan
Published in 2014
5.
Author(s): Hieke Huistra, Toine Pieters
Published in 2014
6.
Author(s): Bram Mellink, Hieke Huistra
Published in 2014
7.
Author(s): Hieke Huistra, Toine Pieters
Published in 2014
8.
Published in 2014
9.
Published in 2014

Thesis

1.
Author(s): Eric de Kruijf
Published in 2015

Team

Contact person

Jisk Attema

eScience Coordinator

Netherlands eScience Center

0000-0002-0948-1176

Joris van Eijnatten

Principal Investigator

Utrecht University

0000-0002-8865-0002

Charles van den Heuvel

Co-Applicant

Huygens Institute for the History of the Netherlands

0000-0001-9638-400X

Toine Pieters

Co-Applicant

Universiteit Utrecht

0000-0002-8156-8436

Maarten de Rijke

Co-Applicant

Universiteit van Amsterdam

0000-0002-1086-0202

Janneke van der Zwaan

eScience Research Engineer

Netherlands eScience Center

0000-0002-8329-7000

Related projects

EviDENce

Ego Documents Events modelling – how individuals recall mass violence

Finished

TICCLAT

Text-induced corpus correction and lexical assessment tool

Finished

GlamMap

Visual analytics for the world’s library data

Finished

DiLiPaD

A new approach to the history of parliamentary communication and discourse

Finished

Beyond the Book

Visualizing the level of international readability of works of fiction

Finished

Dr. Watson

Medical experts helping machines diagnose

Finished

Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.

Texcavator

Texcavator is a search engine and text mining application for creating word cloud and timeline visualizations of large text corpora.
