Towards next-generation scientific computing tools for diversity-aware language science and technology

Most methods in NLP and linguistics are geared towards text, not talk. Our research aims to change this by demonstrating the importance of linguistically diverse conversational data (audio plus annotations). Our central question is: how can we use computational tools to make the language sciences conversation-ready? This is the next frontier for enabling quantitative approaches to conversational structure and for creating diversity-aware language technology. To get there, we combine methods from comparative linguistics, computational modelling and data science. Our key aims are to enable broad curation, rapid exploration and rich visualization of conversational data, as showcased in our 2022 ACL, LREC and Interspeech papers.

Participating organisations

Radboud University Nijmegen
Netherlands eScience Center
Social Sciences & Humanities

Team

Mark Dingemanse
Jisk Attema
Programme Manager
Netherlands eScience Center
Andreas Liesenfeld
Barbara Vreede
eScience Research Engineer
Netherlands eScience Center
Eva Viviani
eScience Research Engineer
Netherlands eScience Center

Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.

Updated 3 weeks ago

scikit-talk

Scikit-talk is an open-source toolkit for processing collections of real-world conversational speech in Python. The toolkit aims to facilitate the exploration of large collections of transcriptions and annotations of conversational interaction.

Updated 1 month ago

talkr

{talkr} is an R package that offers a set of convenience functions for quality control, visualisation and analysis of conversational data. Most notably it provides a range of plotting functions that play well with ggplot and the tidyverse.

Updated 1 month ago