NEWSGAC

Advancing media history by transparent automatic genre classification

This project studies how genres in newspapers and television news can be detected automatically using machine learning in a transparent manner. This will enable us to capture the shift from opinion-based to fact-centred reporting, which is often hypothesized but remains largely understudied because manual content analysis is highly time-consuming. Moreover, we will open the black box of machine learning by comparing, predicting and visualizing the effects of applying various algorithms to heterogeneous data with varying quality and genre features that shift over time. This will enable scholars to do large-scale analyses of historical texts and other media types, as well as to critically evaluate the methodological effects of various machine learning approaches.

This project brings together the expertise of journalism history scholars (RUG), specialists in data modelling, integration and analysis (CWI), digital collection experts (KB & NISV) and e-science engineers (eScience Center). It will first use a large, manually annotated dataset (VIDI-project PI) to develop a transparent and reproducible approach to training an automatic classifier. Building upon this, the project will generate three outcomes:

Participating organisations

Social Sciences & Humanities
CWI
Koninklijke Bibliotheek
Netherlands eScience Center
University of Groningen

Team

Jacco van Ossenbruggen
Principal investigator
Centrum Wiskunde en Informatica
Marcel J. Broersma
Principal investigator
University of Groningen
Aysenur Bilgin
Postdoc
Centrum Wiskunde en Informatica
Erik Tjong Kim Sang
eScience Research Engineer
Netherlands eScience Center
Tom Klaver
eScience Research Engineer
Netherlands eScience Center
Jisk Attema
eScience Coordinator
Netherlands eScience Center

Related projects

The eye of the beholder

Transparent pipelines for assessing online information quality

Updated 19 months ago
In progress

Understanding visually grounded spoken language via multi-tasking

An alternative approach for intelligent systems to understand human speech

Updated 15 months ago
Finished

Uncovering Networks of Corporate Control

An interactive web-based platform to investigate the dynamics of global corporate networks

Updated 15 months ago
Finished

Inside the filter bubble

A framework for deep semantic analysis of mobile news consumption traces

Updated 19 months ago
Finished

TICCLAT

Text-induced corpus correction and lexical assessment tool

Updated 16 months ago
Finished

Automated Analysis of Online Behaviour on Social Media

Gaining insights in the use of Twitter by politicians and journalists

Updated 19 months ago
Finished

News streams

Recording history in large news streams

Updated 15 months ago
Finished

PIDIMEHS

Pillarization and depillarization tested in digitized media historical sources

Updated 16 months ago
Finished

SPuDisc

Searching public discourse

Updated 16 months ago
Finished

Related software

NEWSGAC platform

Software behind the online platform of the NEWSGAC project, which runs explainable machine learning models on textual data

Updated 9 months ago