A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition

Jiwei Li; Xi Zhou; Xuerui Yang

arXiv: 1810.11352 · v2 · submitted 2018-10-26 · cs.SD · eess.AS

keywords: network, architecture, information, lattice-free, layers, Librispeech, memory, novel

The Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks. Building on that work, we propose a novel network architecture that introduces a pyramidal memory structure to represent different context information in different layers. Additionally, res-CNN layers are added at the front to extract more sophisticated features. Trained jointly with lattice-free maximum mutual information (LF-MMI) and cross-entropy (CE) criteria, this approach achieves word error rates (WERs) of 3.62% and 10.89% on the Librispeech and LDC97S62 (Switchboard 300 hours) corpora, respectively. Furthermore, applying recurrent neural network language model (RNNLM) rescoring yields a WER of 2.97% on Librispeech.
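
To make the idea of layer-dependent context concrete, the following is a minimal sketch of a generic FSMN-style memory block, in which each frame is augmented with an element-wise weighted sum of nearby past and future frames. It is not the authors' implementation. The abstract does not state tap counts, strides, or which layers see which context, so the names `memory_block`, `context_span`, and `HYPOTHETICAL_PYRAMID`, and all numeric values, are assumptions chosen only for illustration. One plausible reading of "pyramidal" is that deeper layers use a larger stride and therefore cover a wider span of frames.

```python
from typing import List, Sequence

Frame = List[float]


def memory_block(
    hidden: Sequence[Frame],
    look_back: Sequence[Frame],
    look_ahead: Sequence[Frame],
    stride: int = 1,
) -> List[Frame]:
    """Generic FSMN-style memory block (illustrative, not the paper's code).

    look_back[i] weights the frame at t - stride * i (i = 0 is the current frame).
    look_ahead[j] weights the frame at t + stride * (j + 1).
    Frames outside the utterance are treated as zeros.
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    output: List[Frame] = []
    for t in range(num_frames):
        out = list(hidden[t])  # skip connection from the input frame
        for i, weights in enumerate(look_back):
            src = t - stride * i
            if src >= 0:
                for d in range(dim):
                    out[d] += weights[d] * hidden[src][d]
        for j, weights in enumerate(look_ahead):
            src = t + stride * (j + 1)
            if src < num_frames:
                for d in range(dim):
                    out[d] += weights[d] * hidden[src][d]
        output.append(out)
    return output


def context_span(num_back: int, num_ahead: int, stride: int) -> int:
    """Frames covered by one block: past reach + future reach + current frame."""
    return stride * (num_back - 1) + stride * num_ahead + 1


# Hypothetical schedule: same tap counts in every layer, stride grows with depth.
HYPOTHETICAL_PYRAMID = [
    {"num_back": 3, "num_ahead": 2, "stride": 1},  # lower layer, narrow context
    {"num_back": 3, "num_ahead": 2, "stride": 2},
    {"num_back": 3, "num_ahead": 2, "stride": 4},  # upper layer, wide context
]

if __name__ == "__main__":
    for depth, cfg in enumerate(HYPOTHETICAL_PYRAMID):
        print(f"layer {depth}: span = {context_span(**cfg)} frames")
```

Under these assumed settings the per-layer span grows with depth while the number of learned taps per layer stays fixed, which is the trade-off a stride-based schedule is meant to illustrate.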
