NEWSGAC

Advancing media history by transparent automatic genre classification

This project studies how genres in newspapers and television news can be detected automatically and transparently using machine learning. This will enable us to capture the shift from opinion-based to fact-centred reporting, which is often hypothesized but remains largely understudied because manual content analysis is highly time-consuming. Moreover, we will open the black box of machine learning by comparing, predicting and visualizing the effects of applying various algorithms to heterogeneous data of varying quality, with genre features that shift over time. This will enable scholars to do large-scale analyses of historic texts and other media types, and to critically evaluate the methodological effects of various machine learning approaches.

This project brings together the expertise of journalism history scholars (RUG), specialists in data modelling, integration and analysis (CWI), digital collection experts (KB & NISV) and e-science engineers (eScience Center). It will first use a large manually annotated dataset (VIDI-project PI) to develop a transparent and reproducible approach to training an automatic classifier. Building upon this, the project will generate three outcomes:

Participating organisations

Social Sciences & Humanities

Centrum Wiskunde & Informatica

Koninklijke Bibliotheek

Netherlands eScience Center

University of Groningen

Impact

1.
Author(s): Netherlands eScience Center
Published in Medium by Medium in 2018, page: 1

2.
Author(s): Melvin Wevers
Published in Digitised Newspapers – A New Eldorado for Historians? by De Gruyter in 2022, page: 227-252
10.1515/9783110729214-011

3.
Author(s): Brunna de Sousa Pereira Amorim, André Luiz Firmino Alves, Maxwell Guimarães de Oliveira, Cláudio de Souza Baptista
Published in Proceedings of the 24th Brazilian Symposium on Multimedia and the Web by ACM in 2018, page: 245-252
10.1145/3243082.3243113

4.
Author(s): Denis Jouvet, David Langlois, Mohamed Amine Menacer, Dominique Fohr, Odile Mella, Kamel Smaïli
Published in 2018

Output

1.
Author(s): Kim Smeenk, Aysenur Bilgin, Tom Klaver, Erik Tjong Kim Sang, Laura Hollink, Jacco van Ossenbruggen, Frank Harbers, Marcel Broersma
Published by DataverseNL in 2019
10.34894/5kwjub

2.
Author(s): Faton Rekathati
Published in 2020

Team

Contact person

Erik Tjong Kim Sang

eScience Research Engineer

Netherlands eScience Center

0000-0002-8431-081X

Jacco van Ossenbruggen

Principal investigator

Centrum Wiskunde en Informatica

0000-0002-7748-4715

Marcel J. Broersma

Principal investigator

University of Groningen

0000-0002-7342-3472

Aysenur Bilgin

Postdoc

Centrum Wiskunde en Informatica

0000-0002-6225-9953

Erik Tjong Kim Sang

eScience Research Engineer

Netherlands eScience Center

0000-0002-8431-081X

Tom Klaver

eScience Research Engineer

Netherlands eScience Center

0000-0001-9411-2107

Jisk Attema

eScience Coordinator

Netherlands eScience Center

0000-0002-0948-1176

Related projects

The eye of the beholder

Transparent pipelines for assessing online information quality

Understanding visually grounded spoken language via multi-tasking

An alternative approach for intelligent systems to understand human speech

Uncovering Networks of Corporate Control

An interactive web-based platform to investigate the dynamics of global corporate networks

Inside the filter bubble

A framework for deep semantic analysis of mobile news consumption traces

TICCLAT

Text-induced corpus correction and lexical assessment tool

Automated Analysis of Online Behaviour on Social Media

Gaining insights in the use of Twitter by politicians and journalists

News streams

Recording history in large news streams

PIDIMEHS

Pillarization and depillarization tested in digitized media historical sources

SPuDisc

Searching public discourse

Related software

NEWSGAC platform

Software for the online platform of the NEWSGAC project, which runs explainable machine learning models on textual data
