Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

Boris Ginsburg; Jason Li; Ravi Gadde; Vitaly Lavrukhin

arXiv: 1811.00707 · v1 · pith:QD3CCE3Anew · submitted 2018-11-02 · 💻 cs.CL · cs.LG · cs.SD · eess.AS

keywords: speech, dataset, models, recognition, synthetic, large, neural, accurate

Building an accurate automatic speech recognition (ASR) system requires a large dataset containing many hours of labeled speech from a diverse set of speakers. The lack of free, open datasets of this kind is one of the main obstacles to progress in ASR research. To address this problem, we propose augmenting a natural speech dataset with synthetic speech. We train very large end-to-end neural speech recognition models on the LibriSpeech dataset augmented with synthetic speech. These models achieve state-of-the-art word error rate (WER) among character-level models that do not use an external language model.
