News streams
Recording history in large news streams
The goal of this project is to design the optimal architecture for processing as many daily news items as quickly as possible using the deepest semantic processing currently available in Natural Language Processing (NLP), so-called deep reading. The complex and diverse technology developed in the European NewsReader project will be optimized for the infrastructure provided by EYR, exploiting its capacity to the full in a jungle architecture. The project will result in parallelized NLP pipelines involving a large variety of software. It will provide knowledge on what it takes to process the daily batch of news that arrives every working day (estimated at 2 million items), but also on how rich, complex and dynamic news streams really are. The installation can be used by researchers, historians, politicians, journalists and the general public through a visualization component specifically designed to handle complex and dynamic Big Data. The project will first run experiments to find the optimal processing installation and then test this system by processing the daily news stream for a year. All software components and resources are available under open-source licenses and are developed according to the latest standards to ensure maximal sustainability.
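The page does not spell out how the pipelines are parallelized. Purely as an illustration of the idea of fanning a daily batch of news items out over parallel pipeline workers, the sketch below uses Python's standard process pool; the names run_nlp_pipeline and process_daily_batch and the worker count are assumptions, not part of the NewsReader software.

```python
from concurrent.futures import ProcessPoolExecutor

def run_nlp_pipeline(document: str) -> dict:
    """Stand-in for one deep-reading run (tokenisation, parsing,
    semantic role labelling, entity and event extraction).
    The real installation would invoke the NewsReader modules here."""
    return {"characters": len(document)}

def process_daily_batch(documents, workers=8):
    """Fan a daily batch of news items out over parallel pipeline workers."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize keeps inter-process overhead low for large batches
        return list(pool.map(run_nlp_pipeline, documents, chunksize=100))

if __name__ == "__main__":
    batch = ["example news item %d" % i for i in range(1000)]  # stand-in feed
    results = process_daily_batch(batch)
    print("processed", len(results), "items")
```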
Team
Contact person
Stefan Verhoeven
eScience Research Engineer
Netherlands eScience Center
ORCID: 0000-0002-5821-2060
Piek Vossen
Related projects
EviDENce
Ego Documents Events modelling – how individuals recall mass violence
Finished
NEWSGAC
Advancing media history by transparent automatic genre classification
Finished