

Searching public discourse

Image: Erik Tjallinks (CC License)

Research into our culture is about understanding conversations and debates as forms of public discourse. New ways of studying culture are inspired by the availability of massive digital collections: growing repositories of old media (for example, centuries’ worth of newspapers, radio, and television archives) and new media (for example, blogs, Twitter streams, and discussion forums). Scholars studying culture are beginning to engage with data-intensive research methods. Whether the label is digital humanities, computational humanities, or e-humanities, we are observing a dramatic shift from data-poor to data-intensive research, a shift that generates unique challenges for today’s search and text mining technology.

How can we support a data-intensive research cycle in cultural studies? Building on existing open-source tooling, the project will follow an iterative, task-based approach to creating key search, analysis, and visualization solutions for studying public discourses within a humanities context. The development will be guided by three complementary use cases. In the first, public discourse is driven by a mixture of scientific notions and notions of the good life; in the second, by differences in the experience and valuation of films; and in the third, by social issues related to law and order.

Use case 1 focuses on genetics and eugenics. This discourse involves the changing significance attached to nature over nurture, and the collective, yet varying images of “the good life”. The use case deserves analysis in terms of continuities and discontinuities. We focus on the early twentieth century, when the main polarities of this debate were articulated, and the early twenty-first century, when discussions on medical genetics are overshadowed by the “specter of eugenics,” the fear that genetics will lead to control over sexual reproduction.

Use case 2 is based on the fact that film titles can provoke different emotions in viewers depending on their preferences, past experiences, and values. In public discussions, people use the emotions they felt while viewing to argue for a film’s value. In analyzing discussions on forums and in reviews, we discern variety in these emotions and variability over time; their relations with valuations may result in rich and possibly new categorizations of films, genres, and discourse positions.

Use case 3 focuses on drugs, drug trafficking and drug users in the early twentieth century and early twenty-first century. In both eras the public view on drugs alternates between medical and social aspects. SPuDisc will allow for associations, longitudinal search and comparisons that enable researchers to analyze the mechanisms of a swinging pendulum in public discourse.

The specific developments center on technologies for normalizing expressions, detecting semantic shifts in language usage patterns, exploiting multilinguality to aid in understanding public discourse, and explicating different perspectives in discussions around a given issue. The outcomes will be incorporated in dedicated interfaces for two key phases in humanities research: exploration and contextualisation.

Participating organisations

Netherlands eScience Center
Utrecht University
Social Sciences & Humanities



  1. Xtas 3, the eXtensible Text Analysis Suite
     Author(s): Lars Buitinck, Maarten de Rijke
     Published in 2016
  2. Analyzing emotional discourse on film. Emotion sharing by film viewers.
     Author(s): E Tan, Lars Buitinck, Maarten de Rijke, J van Amerongen, C Rodriguez-Hidalgo
     Published in 2015


Jisk Attema
eScience Coordinator
Netherlands eScience Center
Lars Buitinck
eScience Research Engineer
Netherlands eScience Center
Maarten de Rijke
Principal investigator
Utrecht University

Related projects

Uncovering Networks of Corporate Control

An interactive web-based platform to investigate the dynamics of global corporate networks

Inside the filter bubble

A framework for deep semantic analysis of mobile news consumption traces


Advancing media history by transparent automatic genre classification


Text-induced corpus correction and lexical assessment tool


Pillarization and depillarization tested in digitized media historical sources

Related software



Xtas, the eXtensible Text Analysis Suite