News streams

Recording history in large news streams

The goal of this project is to design the optimal architecture for processing as many daily news items as possible, as fast as possible, using the deepest semantic processing currently available in Natural Language Processing (NLP), so-called deep reading. The complex and diverse technology developed in the European NewsReader project will be optimized for the infrastructure provided by EYR, exploiting the optimal capacity in a jungle architecture. The project will result in parallelized NLP pipelines involving a large variety of software. It will provide knowledge on what it takes to process the batch of news that comes in every working day (estimated at 2 million items), and also on how rich, complex and dynamic information streams of news really are. The installation can be used by researchers, historians, politicians, journalists and the general public through a visualization component specifically designed to handle complex and dynamic Big Data. The project will set up experiments to find the optimal processing installation and then test this system by processing the daily news stream for a year. All software components and resources are available under open-source licenses and are developed according to the latest standards to ensure maximal sustainability.

Participating organisations

Vrije Universiteit Amsterdam
Netherlands eScience Center
Fondazione Bruno Kessler
University of the Basque Country
LexisNexis
Social Sciences & Humanities
SynerScope

Team

Piek Vossen
Principal Investigator
Vrije Universiteit Amsterdam
Elena Ranguelova
eScience Coordinator
Netherlands eScience Center
Stefan Verhoeven
eScience Research Engineer
Netherlands eScience Center

Related projects

EviDENce

Ego Documents Events modelling – how individuals recall mass violence

Finished

NEWSGAC

Advancing media history by transparent automatic genre classification

Finished