The GG-AVSE framework uses listener gaze direction combined with YOLO5Face and AVSEMamba to resolve target-speaker ambiguity in audio-visual speech enhancement, yielding gains in PESQ, STOI, and SI-SDR.
Venema, Compression for Clinicians, Chapter 7, Thomson Delmar Learning, 2006
1 Pith paper cites this work. Polarity classification is still indexing.
fields: eess.AS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers:
Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework