
arxiv: 1802.07420 · v2 · pith:IA7CBR4Xnew · submitted 2018-02-21 · cs.CL · cs.SD · eess.AS

Sequence-based Multi-lingual Low Resource Speech Recognition

classification cs.CL · cs.SD · eess.AS
keywords multi-lingual · languages · resource · scenarios · cross-lingual · end-to-end · model · models

Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using Connectionist Temporal Classification (CTC) loss. We show that our model improves performance on Babel languages by over 6% absolute in terms of word/phoneme error rate when compared to mono-lingual systems built in the same setting for these languages. We also show that the trained model can be adapted cross-lingually to an unseen language using just 25% of the target data. We show that training on multiple languages is important for very low-resource cross-lingual target scenarios, but not for multi-lingual testing scenarios. Here, it appears beneficial to include large, well-prepared datasets.
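
The abstract reports its gains as an absolute change in word/phoneme error rate. As a rough illustration of what that metric measures, the following is a minimal sketch, assuming the usual definition of error rate as token-level edit distance divided by reference length. The function names, example sentences, and error-rate values are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (words or phonemes)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example sentences, not data from the paper.
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
wer = error_rate(reference, hypothesis)  # 1 substitution + 1 deletion over 6 words

# Hypothetical error rates, chosen only to contrast the two kinds of reduction.
baseline, improved = 0.50, 0.44
absolute_reduction = baseline - improved          # difference in percentage points
relative_reduction = absolute_reduction / baseline  # fraction of the baseline removed
```

With these made-up numbers, a drop from 50% to 44% is a 6-point absolute reduction but only a 12% relative one, which is why the "absolute" qualifier in the abstract matters. The same function gives a phoneme error rate when the inputs are phoneme sequences instead of words.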
