Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.AS (1). Years: 2019 (1). Verdicts: unverdicted (1).
Representative citing papers:
End-to-End Speech Recognition with High-Frame-Rate Features Extraction
High-frame-rate feature extraction at 200-400 fps improves end-to-end ASR word error rates on WSJ and CHiME-5, with relative reductions up to 24.1% when combined with speed perturbation.