Features from audio-visual semantic grounding models improve speech recognition when used as input, with earlier layers retaining more phonetic detail and deeper layers showing greater domain invariance.
Distributed representations of words and phrases and their com- positionality,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2019 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Transfer Learning from Audio-Visual Grounding to Speech Recognition
Features from audio-visual semantic grounding models improve speech recognition when used as input, with earlier layers retaining more phonetic detail and deeper layers showing greater domain invariance.