RNN-T For Latency Controlled ASR With Improved Beam Search

Anuroop Sriram; Ching-Feng Yeh; Christian Fuegen; Jay Mahadeokar; Kaustubh Kalgaonkar; Kjell Schubert; Mahaveer Jain; Michael L. Seltzer

arxiv: 1911.01629 · v2 · pith:DBDOR5FJnew · submitted 2019-11-05 · 💻 cs.CL · cs.LG · eess.AS

Mahaveer Jain, Kjell Schubert, Jay Mahadeokar, Ching-Feng Yeh, Kaustubh Kalgaonkar, Anuroop Sriram, Christian Fuegen, Michael L. Seltzer

keywords: rnn-t, model, systems, beam, hybrid, improved, inference, latency

Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference, which makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency compared to a well-tuned hybrid ASR baseline.
