
Understanding visually grounded spoken language via multi-tasking

An alternative approach for intelligent systems to understand human speech

Understanding spoken language is an important capability of intelligent systems that interact with people. Applications with a speech understanding component include personal assistants and search engines, among others. The common way of enabling an application to understand and react to spoken language is to first transcribe speech into text using a speech recognition module, and then to process the text with a separate text understanding module.

We propose an alternative approach inspired by how humans understand speech. Speech is processed directly by an end-to-end neural network model without first being transcribed into text, avoiding the large amounts of transcribed speech needed to train a traditional speech recognition system. The system instead learns simultaneously from more easily obtained types of data: for example, it learns to match images to their spoken descriptions, answer questions about images, or match utterances spoken in different languages.

Our proposal promises to be less reliant on strong supervision and expensive resources and thus applicable in a wider range of circumstances than traditional systems, especially when large amounts of transcribed speech are not available, for example when dealing with low-resource languages or specialized domains.
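The matching objective described above can be sketched in a few lines. The toy example below is hypothetical and not the project's actual code: it uses random features and linear "encoders" in NumPy, whereas the real models (e.g. in Platalea) are deep neural networks trained on real audio and images. It shows the core idea of embedding spoken utterances and images in a shared space and scoring each utterance against every image in a batch with a contrastive (InfoNCE-style) loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, w):
    """Toy 'encoder': a linear projection followed by L2 normalisation."""
    z = x @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Hypothetical dimensions: a batch of 4 utterance/image pairs.
batch, audio_dim, image_dim, shared_dim = 4, 16, 12, 8
audio_feats = rng.standard_normal((batch, audio_dim))   # stand-in for acoustic features
image_feats = rng.standard_normal((batch, image_dim))   # stand-in for visual features
w_audio = rng.standard_normal((audio_dim, shared_dim))
w_image = rng.standard_normal((image_dim, shared_dim))

a = embed(audio_feats, w_audio)   # utterance embeddings in the shared space
v = embed(image_feats, w_image)   # image embeddings in the shared space

# Cosine similarity between every utterance and every image;
# the diagonal holds the matching pairs.
sim = a @ v.T

# Contrastive loss: each utterance should be more similar to its own
# image than to the other images in the batch.
logits = sim / 0.07  # temperature scaling
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(float(loss))
```

Training would adjust the encoder parameters to drive this loss down, so that matching utterance–image pairs end up close together in the shared space; no transcriptions are required at any point.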

Participating organisations

Netherlands eScience Center
Tilburg University

Team

Christiaan Meijer
eScience Research Engineer
Netherlands eScience Center
Grzegorz Chrupala
Principal investigator
Tilburg University
Jisk Attema
Senior eScience Research Engineer
Netherlands eScience Center
Patrick Bos
eScience Research Engineer
Netherlands eScience Center

Related projects

NEWSGAC

Advancing media history by transparent automatic genre classification

Finished

TICCLAT

Text-induced corpus correction and lexical assessment tool

Finished

Emotion Recognition in Dementia

Advancing technology for multimodal analysis of emotion expression in everyday life

Finished

What Works When for Whom?

Advancing therapy change process research

Finished

Related tools

DIANNA


Deep Insight And Neural Network Analysis (DIANNA) is the only Explainable AI (XAI) library for scientists supporting the Open Neural Network Exchange (ONNX), the de facto standard model format.


Platalea


Platalea contains deep neural network architectures for modeling spoken language from multiple sources at once: audio together with images, text or video. These can be used to study what different context sources add to the language acquisition process.
