xtas
the eXtensible Text Analysis Suite
Description
- easy access to numerous text processing and analysis tools
- full support for Dutch and English
- can use Elasticsearch for document storage
- can be run as a service
xtas is a collection of natural language processing and text mining tools, brought together in a single software package with built-in distributed computing and support for the Elasticsearch document store.
xtas’s functionality consists partly of wrappers for existing packages, with automatic installation of software and data, and partly of custom-built modules that came out of research. It currently offers parsers for Dutch and English (Alpino, CoreNLP, Frog, Semafor), named entity recognizers (Frog, Stanford and custom-built ones), a temporal expression tagger (HeidelTime) and a sentiment tagger based on SentiWords.
A basic installation of xtas works like a Python module. Built-in package management and a simple, uniform interface take away the hassle of installing, configuring and using many existing NLP tools.
xtas’s open architecture makes it possible to include custom code, run this in a distributed fashion and have it communicate with Elasticsearch to provide document storage and retrieval.
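The idea of a uniform interface plus custom code, distributed execution and a document store can be illustrated with a small, self-contained sketch. It does not use xtas's actual API: the `task` decorator, the `TASKS` registry, the in-memory `STORE` (standing in for Elasticsearch) and the thread pool (standing in for a distributed task queue) are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a document store such as Elasticsearch.
STORE = {
    1: {"text": "xtas wraps many NLP tools."},
    2: {"text": "Custom code can be added as a task."},
}

TASKS = {}  # task name -> function; illustrative registry, not xtas's API


def task(func):
    """Register a function so every tool is called the same way: func(text)."""
    TASKS[func.__name__] = func
    return func


@task
def tokenize(text):
    # Naive whitespace tokenizer standing in for a wrapped NLP tool.
    return text.split()


@task
def count_tokens(text):
    # Example of custom code built on top of an existing task.
    return len(tokenize(text))


def run(task_name, doc_id):
    """Fetch a document, apply the named task, and store the result with it."""
    doc = STORE[doc_id]
    result = TASKS[task_name](doc["text"])
    doc[task_name] = result
    return result


def run_all(task_name, doc_ids):
    # A thread pool stands in for a distributed task queue.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: run(task_name, i), doc_ids))


results = run_all("count_tokens", [1, 2])
```

Because every task takes a text and returns a result, adding a new tool in this sketch only requires writing one decorated function; the fetching, scheduling and storing code stays unchanged.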
Reference papers
Mentions
- 1. Author(s): Damian Trilling, Bob Van De Velde, Anne C. Kroon, Felicia Locherbach, Theo Araujo, Joanna Strycharz, Tamara Raats, Lisa De Klerk, Jeroen G.F. Jonkman. Published in 2018 IEEE 14th International Conference on e-Science (e-Science) by IEEE in 2018, pages 329-330. DOI: 10.1109/escience.2018.00078
- 2. Author(s): Olga Uryupina, Barbara Plank, Gianni Barlacchi, Francisco J Valverde-Albacete, Manos Tsagkias, Antonio Uva, Alessandro Moschitti. Published in Proceedings of ACL-2016 System Demonstrations by Association for Computational Linguistics in 2016, pages 157-162. DOI: 10.18653/v1/p16-4027
- 1. Author(s): Kim Schouten, Flavius Frasincar, Rommert Dekker, Mark Riezebos. Published in Expert Systems with Applications by Elsevier BV in 2019, pages 68-84. DOI: 10.1016/j.eswa.2019.03.005
- 2. Author(s): Damian Trilling, Jeroen G. F. Jonkman. Published in Communication Methods and Measures by Informa UK Limited in 2018, pages 158-174. DOI: 10.1080/19312458.2018.1447655
- 1. Published in 2016
Related projects
SPuDisc
Searching public discourse