1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Multimodal Group Emotion Recognition In-the-Wild Towards a Privacy-Safe Non-Individual Approach
Proposes cross-attention audio-video fusion and VE-MD latent-space models for group emotion recognition that avoid individual cues, and reports competitive performance in ablation studies on synthetic and real data.