HP-VSR-ResFiLM adds a single residual FiLM modulation block conditioned on head pose to a CNN visual encoder, yielding WER of 25.0% on LRS2 and 33.2% on LRS3 under standard training conditions.
arXiv preprint arXiv:2305.09212 (2023)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Head-Pose-Aware Visual Speech Recognition with FiLM Modulation