Is Someone Speaking? Exploring Long-Term Temporal Features for Audio-Visual Active Speaker Detection
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation
  VISAFF is a tuning-free speaker-centered visual affective feature learning framework for emotion recognition in conversation that guides frozen VLMs to active speakers and uses reliability-guided complementation from textual and acoustic modalities to achieve competitive performance.
- MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding
  MER2026 defines four tracks to advance generative emotion understanding from individual basic labels to dyadic, fine-grained, preference, and physiological scenarios.