Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
4 Pith papers cite this work.
Years: 2019 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers: 4
citing papers explorer
- Cross-Attention End-to-End ASR for Two-Party Conversations
  End-to-end ASR model with speaker-specific cross-attention for two-party conversations outperforms standard models on the Switchboard corpus.
- Towards Debugging Deep Neural Networks by Generating Speech Utterances
  Activation maximization applied to a speech command DNN, followed by WaveNet synthesis, produces class-specific utterances that human evaluators can interpret, supporting its use for model debugging.
- LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models
  3D-2D-CNN-BLSTM with word-CTC reaches 1.3% WER on GRID seen-speaker lipreading (55% relative gain over LCANet) and 8.6% on unseen speakers (24.5% gain over LipNet).
- End-to-End ASR for Code-switched Hindi-English Speech
  End-to-end ASR for code-switched Hindi-English with <50 hours of data shows gains from multi-task learning and corpus balancing but underperforms cascaded baselines.