Look&listen: Multi-modal correlation learning for active speaker detection and speech enhancement
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.MM (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Dual-Stream Decoupled Learning for Temporal Consistency and Speaker Interaction in AVSD
A decoupled dual-stream model for audio-visual speaker detection reaches 95.6% mAP on AVA-ActiveSpeaker by isolating temporal continuity and inter-personal social modeling into separate branches.