Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

arXiv: 1706.02737 · v1 · pith:KQGXHHWD · submitted 2017-06-08 · cs.CL

Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan

classification: cs.CL

keywords: network, attention-based, encoder, end-to-end, model, speech, decoder, deep


We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model beats out traditional hybrid ASR systems.
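The beam-search combination described in the abstract can be illustrated with a minimal sketch. It assumes a log-linear interpolation of the three scores; the abstract does not state the combination rule or any weights, so the function names, the `ctc_weight` and `lm_weight` parameters, and their default values are all hypothetical and chosen only for illustration.

```python
# Minimal sketch: rank beam-search hypotheses by a weighted sum of
# CTC, attention-decoder, and language-model log-probabilities.
# ASSUMPTION: the log-linear form and the weight values are illustrative,
# not taken from the abstract.

def joint_score(logp_ctc, logp_att, logp_lm, ctc_weight=0.3, lm_weight=0.1):
    """Combine three log-probabilities for one hypothesis into one score."""
    return (
        ctc_weight * logp_ctc
        + (1.0 - ctc_weight) * logp_att
        + lm_weight * logp_lm
    )


def rank_hypotheses(hyps, ctc_weight=0.3, lm_weight=0.1, beam_size=2):
    """Return the top `beam_size` (score, text) pairs, best first.

    Each hypothesis is a dict with keys "text", "ctc", "att", and "lm",
    where the last three hold log-probabilities from the respective models.
    """
    scored = [
        (joint_score(h["ctc"], h["att"], h["lm"], ctc_weight, lm_weight), h["text"])
        for h in hyps
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:beam_size]
```

In a real decoder this ranking would be applied at every step of the search to partial hypotheses, with the CTC term computed as a prefix score; the sketch only shows the final rescoring of complete hypotheses.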
