Towards next-generation scientific computing tools for diversity-aware language science and technology

Diversity-aware language technology for conversational data


Most methods in NLP and linguistics are geared towards dealing with text, not talk. Our research aims to change this by demonstrating the importance of linguistically diverse conversational data (audio plus annotations). Our central question is: how can we use computational tools to make the language sciences conversation-ready? This is the next frontier for enabling quantitative approaches to conversational structure and for creating diversity-aware language technology. To get there, we combine methods from comparative linguistics, computational modelling and data science. Our key aims are to enable broad curation, rapid exploration and rich visualization of conversational data, as showcased in our 2022 papers at ACL, LREC and Interspeech.

Participating organisations

Social Sciences & Humanities

Team

Contact person

Pablo Rodríguez-Sánchez

Lead RSE

Netherlands eScience Center

0000-0002-2855-940X


Mark Dingemanse

Lead Applicant

Radboud University

0000-0002-3290-5723

Andreas Liesenfeld

Radboud University Nijmegen

0000-0001-6076-4406

Barbara Vreede

eScience Research Engineer

Netherlands eScience Center

0000-0002-5023-4601

Eva Viviani

eScience Research Engineer

Netherlands eScience Center

0000-0002-1330-0585

Pablo Rodríguez-Sánchez

Lead RSE

Netherlands eScience Center

0000-0002-2855-940X

Jisk Attema

Programme Manager

Netherlands eScience Center

0000-0002-0948-1176

Pablo Lopez-Tarifa

Programme Manager

Netherlands eScience Center

0000-0002-4136-1860

Patrick Bos

Tech Lead

Netherlands eScience Center

0000-0002-6033-960X

Related software

Harmony

Making harmonisation simple. Social scientists often have to compare items from different questionnaires or datasets. Harmony is a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items quickly, even in different languages.


scikit-talk

Scikit-talk is an open-source toolkit for processing collections of real-world conversational speech in Python. The toolkit aims to facilitate the exploration of large collections of transcriptions and annotations of conversational interaction.


talkr

{talkr} is an R package that offers a set of convenience functions for quality control, visualisation and analysis of conversational data. Most notably it provides a range of plotting functions that play well with ggplot and the tidyverse.

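The kind of exploration that scikit-talk and {talkr} aim to support can be illustrated with a small, hypothetical sketch. It is written in plain Python and does not use either package's API: the `Utterance` record, the `transition_offsets` function and the toy conversation are invented for illustration. Assuming each annotation carries a speaker label plus begin and end times in milliseconds, the sketch computes the offset at every change of speaker, where positive values are gaps (silence between turns) and negative values are overlaps.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical timed annotation: who spoke, when (in ms), and what."""
    speaker: str
    begin_ms: int
    end_ms: int
    text: str


def transition_offsets(utterances: list[Utterance]) -> list[int]:
    """Return the offset in ms at each change of speaker.

    Positive values are gaps (silence before the next speaker starts);
    negative values are overlaps (the next speaker starts early).
    """
    # Order annotations by start time so consecutive pairs follow the talk.
    ordered = sorted(utterances, key=lambda u: u.begin_ms)
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # Only transitions between different speakers count as turn changes.
        if nxt.speaker != prev.speaker:
            offsets.append(nxt.begin_ms - prev.end_ms)
    return offsets


# Invented toy conversation for illustration only, not real data.
conversation = [
    Utterance("A", 0, 1200, "are you coming tonight?"),
    Utterance("B", 1400, 1900, "yeah"),
    Utterance("B", 2000, 2600, "around eight"),
    Utterance("A", 2500, 3000, "great"),
]

print(transition_offsets(conversation))  # [200, -100]
```

Pairing utterances by start order is a simplification: when several turns overlap, a fuller treatment would compare each new turn against the one that ends last.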
