End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

Dzmitry Bahdanau; Jan Chorowski; Kyunghyun Cho; Yoshua Bengio

arXiv: 1412.1602 · v1 · pith:Z2PZKXBHnew · submitted 2014-12-04 · 💻 cs.NE · cs.LG · stat.ML

Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

classification 💻 cs.NE cs.LG stat.ML

keywords recurrent attention continuous decoder emits input mechanism network

We replace the Hidden Markov Model (HMM) traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates on the TIMIT dataset that are comparable to those of state-of-the-art HMM-based decoders.
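
The attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain dot-product score between the decoder state and each encoder state, which is a simplification chosen for brevity, and the names `softmax`, `attend`, `encoder_states`, and `decoder_state` are hypothetical. It shows only how one context vector is formed as a weighted sum of encoder states before a symbol is emitted.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: the score is a dot product between the decoder state and
    each encoder state. The paper's scoring function may differ.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, h_t))
        for h_t in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context is the attention-weighted sum of the encoder states.
    context = [
        sum(w * h_t[i] for w, h_t in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three encoder time steps with two features each.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In this toy input the first and third encoder states score highest against the decoder state, so they receive most of the weight and dominate the context. In a full model the decoder would consume this context, emit a phoneme, update its state, and repeat the step.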

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-Attention End-to-End ASR for Two-Party Conversations
eess.AS 2019-07 unverdicted novelty 6.0

End-to-end ASR model with speaker-specific cross-attention for two-party conversations outperforms standard models on the Switchboard corpus.