End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

· 2014 · cs.NE · arXiv 1412.1602

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates comparable to those of state-of-the-art HMM-based decoders on the TIMIT dataset.
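The attention step described in the abstract can be illustrated with a minimal sketch. It assumes a generic additive (content-based) scoring function and random, untrained parameters; the names `attention_context`, `W`, `U`, and `v` are hypothetical and are not taken from the paper, whose exact scoring function may differ. The sketch only shows how one decoder step could weight the encoder outputs and sum them into a context vector.

```python
import numpy as np


def attention_context(decoder_state, encoder_states, W, U, v):
    """Compute attention weights and a context vector for one decoder step.

    decoder_state:  (dec_dim,)        current decoder hidden state
    encoder_states: (T, enc_dim)      one encoder output per input frame
    W: (att_dim, dec_dim), U: (att_dim, enc_dim), v: (att_dim,)
    All parameters are illustrative assumptions, not the paper's values.
    """
    # Additive scoring: one scalar score per input frame.
    scores = np.tanh(encoder_states @ U.T + decoder_state @ W.T) @ v
    # Numerically stable softmax turns scores into weights that sum to 1.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context is the weighted sum of encoder outputs.
    context = weights @ encoder_states
    return weights, context


# Toy dimensions and random inputs, for illustration only.
rng = np.random.default_rng(0)
T, enc_dim, dec_dim, att_dim = 6, 4, 3, 5
encoder_states = rng.normal(size=(T, enc_dim))
decoder_state = rng.normal(size=dec_dim)
W = rng.normal(size=(att_dim, dec_dim))
U = rng.normal(size=(att_dim, enc_dim))
v = rng.normal(size=att_dim)

weights, context = attention_context(decoder_state, encoder_states, W, U, v)
print(weights.shape, context.shape)
```

In a full model the decoder would consume `context` together with its previous state to emit the next phoneme, and the weights would concentrate on a few input frames once trained; with random parameters they carry no meaning.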

representative citing papers

Cross-Attention End-to-End ASR for Two-Party Conversations

eess.AS · 2019-07-24 · unverdicted · novelty 6.0

End-to-end ASR model with speaker-specific cross-attention for two-party conversations outperforms standard models on the Switchboard corpus.

End-to-End ASR for Code-switched Hindi-English Speech

eess.AS · 2019-06-22 · unverdicted · novelty 4.0

End-to-end ASR for code-switched Hindi-English with <50 hours of data shows gains from multi-task learning and corpus balancing but underperforms cascaded baselines.

citing papers explorer

Showing 2 of 2 citing papers.

Cross-Attention End-to-End ASR for Two-Party Conversations

eess.AS · 2019-07-24 · unverdicted · none · ref 25 · internal anchor

End-to-end ASR model with speaker-specific cross-attention for two-party conversations outperforms standard models on the Switchboard corpus.

End-to-End ASR for Code-switched Hindi-English Speech

eess.AS · 2019-06-22 · unverdicted · none · ref 19 · internal anchor

End-to-end ASR for code-switched Hindi-English with <50 hours of data shows gains from multi-task learning and corpus balancing but underperforms cascaded baselines.
