
arXiv: 1711.01694 · v2 · submitted 2017-11-06 · eess.AS · cs.AI · cs.CL


Multilingual Speech Recognition With A Single End-To-End Model

keywords: model, language, languages, sequence-to-sequence, recognition, single, additional, because

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
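
The two ideas in the abstract, a joint output vocabulary formed as the union of language-specific grapheme sets and a language identifier supplied as an extra input feature, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names, the toy transcripts, the language codes, and the choice to append a one-hot language vector to every acoustic frame are hypothetical and are not taken from the paper. The sketch also treats each Unicode code point as a grapheme, which is a simplification.

```python
from typing import Dict, Iterable, List, Sequence


def build_joint_vocab(transcripts_by_lang: Dict[str, Iterable[str]]) -> List[str]:
    """Return the union of all per-language symbol sets, sorted for a stable index order.

    Simplification: each Unicode code point is treated as one grapheme.
    """
    graphemes = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            graphemes.update(text)
    return sorted(graphemes)


def one_hot_language(lang: str, languages: Sequence[str]) -> List[float]:
    """Encode a language code as a one-hot vector over the known languages."""
    vec = [0.0] * len(languages)
    vec[languages.index(lang)] = 1.0
    return vec


def append_language_id(
    frames: Sequence[Sequence[float]], lang: str, languages: Sequence[str]
) -> List[List[float]]:
    """Concatenate the one-hot language vector onto every acoustic feature frame.

    Assumption: the identifier is fed at the input; other injection points are possible.
    """
    lang_vec = one_hot_language(lang, languages)
    return [list(frame) + lang_vec for frame in frames]


# Hypothetical toy data: one transcript per language, in three different scripts.
toy_transcripts = {"hi": ["नमस्ते"], "ta": ["வணக்கம்"], "bn": ["নমস্কার"]}

vocab = build_joint_vocab(toy_transcripts)
languages = sorted(toy_transcripts)  # fixed ordering of language codes

# Two made-up 2-dimensional acoustic frames for a Tamil utterance.
frames = [[0.1, 0.2], [0.3, 0.4]]
conditioned = append_language_id(frames, "ta", languages)
```

Because the three scripts share no symbols, the joint vocabulary in this toy case is simply as large as the per-language sets combined, which mirrors the abstract's remark that the scripts have very little overlap. A model trained without the appended vector corresponds to the first setting in the abstract; appending it corresponds to the language-conditioned variant.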
