Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Chao Weng; Dan Su; Dong Yu; Jianwei Yu; Jinchuan Tian; Shi-Xiong Zhang; Yuexian Zou

arXiv: 2112.02498 · v2 · submitted 2021-12-05 · cs.AI · cs.CL

keywords: frameworks, LF-MMI, training, approach, criterion, datasets, decoding, end-to-end

Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum Mutual Information (LF-MMI), a discriminative training criterion that has shown superior performance in hybrid ASR systems, is rarely adopted in E2E ASR frameworks. In this work, we propose a novel approach that integrates the LF-MMI criterion into E2E ASR frameworks in both the training and decoding stages. We demonstrate its effectiveness on two of the most widely used E2E frameworks: Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Experiments suggest that introducing the LF-MMI criterion consistently yields significant performance improvements across various datasets and E2E ASR frameworks. Our best model achieves a competitive character error rate (CER) of 4.1% / 4.4% on the Aishell-1 dev/test sets; we also achieve significant error reductions over strong baselines on the Aishell-2 and Librispeech datasets.
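As background for the criterion named in the abstract: MMI training maximizes the log-ratio between the score of the reference transcription (the numerator) and the total score of all competing hypotheses (the denominator); in LF-MMI, the denominator sum runs over every path of a graph derived from a phone-level n-gram language model instead of over pre-generated lattices. The sketch below is a minimal pure-Python illustration under toy assumptions, not the authors' implementation: the three-unit inventory, the uniform unit bigram used as the denominator "LM", the one-state-per-unit topology and all function names (`forward_score`, `lf_mmi_loss`) are hypothetical. It computes both sums with the forward algorithm in log space and returns the loss -(log num - log den).

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_score(log_probs: List[List[float]], units: List[int],
                  init: List[float], trans: List[List[float]],
                  final: List[float]) -> float:
    """Log-sum of scores over all paths of a graph (forward algorithm).

    State s emits unit units[s]; log_probs[t][k] is the network's log-score
    for unit k at frame t; init/final/trans are log-domain graph weights.
    """
    n = len(units)
    alpha = [init[s] + log_probs[0][units[s]] for s in range(n)]
    for t in range(1, len(log_probs)):
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(n)])
            + log_probs[t][units[j]]
            for j in range(n)
        ]
    return logsumexp([alpha[s] + final[s] for s in range(n)])


def lf_mmi_loss(log_probs: List[List[float]], ref: List[int],
                prior: List[float], bigram: List[List[float]]) -> float:
    """Toy LF-MMI loss: -(log numerator - log denominator).

    Hypothetical setup: one graph state per unit, a unit bigram LM as the
    denominator graph, and `ref` containing no identical adjacent units.
    """
    k = len(prior)
    # Denominator: every unit sequence is allowed, weighted by the bigram LM.
    den = forward_score(log_probs, list(range(k)), prior, bigram, [0.0] * k)

    # Numerator: left-to-right graph over the reference reusing the same LM
    # weights, so each numerator path is also a denominator path.
    n = len(ref)
    init = [prior[ref[0]]] + [NEG_INF] * (n - 1)
    trans = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        trans[i][i] = bigram[ref[i]][ref[i]]  # stay on the same unit
        if i + 1 < n:
            trans[i][i + 1] = bigram[ref[i]][ref[i + 1]]  # advance
    final = [NEG_INF] * (n - 1) + [0.0]  # must end on the last ref unit
    num = forward_score(log_probs, ref, init, trans, final)
    return den - num


# Toy data: 4 frames, 3 units, per-frame posteriors; uniform prior and bigram.
LOG_PROBS = [[math.log(p) for p in row] for row in
             [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
              [0.1, 0.8, 0.1], [0.1, 0.7, 0.2]]]
PRIOR = [math.log(1 / 3)] * 3
BIGRAM = [[math.log(1 / 3)] * 3 for _ in range(3)]

if __name__ == "__main__":
    # A reference that matches the frame scores should get a lower loss.
    print(lf_mmi_loss(LOG_PROBS, [0, 1], PRIOR, BIGRAM))
    print(lf_mmi_loss(LOG_PROBS, [2, 0], PRIOR, BIGRAM))
```

Because the toy numerator graph reuses the denominator's bigram weights and only admits paths consistent with the reference, the numerator score can never exceed the denominator score, so the loss stays non-negative (given the no-adjacent-repeats assumption, which this simple topology cannot otherwise disambiguate). Production LF-MMI systems, such as Kaldi's "chain" models, perform the same forward computation at scale over much larger graphs.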
