Audio augmentation for speech recognition
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2019 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A single student model trained on essence knowledge from a teacher ensemble plus hard labels outperforms both a standard single model and the teacher ensemble itself on the Switchboard dataset.
Citing papers
- Transfer Learning from Audio-Visual Grounding to Speech Recognition
Features from audio-visual semantic grounding models improve speech recognition when used as input, with earlier layers retaining more phonetic detail and deeper layers showing greater domain invariance.
- Essence Knowledge Distillation for Speech Recognition
A single student model trained on essence knowledge from a teacher ensemble plus hard labels outperforms both a standard single model and the teacher ensemble itself on the Switchboard dataset.