Affect analysis in-the-Wild: Valence-arousal, expressions, action units and a unified framework
4 Pith papers cite this work.
Citing papers by year: 2026 (4).
Citing papers explorer
- VGGT-HPE: Reframing Head Pose Estimation as Relative Pose Prediction
  Reframing head pose estimation as relative pose prediction between image pairs lets a model trained only on synthetic data outperform absolute-pose regression methods on real benchmarks.
- Beyond the Mean: Modelling Annotation Distributions in Continuous Affect Prediction
  A Beta-distribution framework models annotator consensus in continuous affect prediction, estimating mean and variance parameters from which the variability, skewness, and quantiles of the annotations can be recovered.
- LaScA: Language-Conditioned Scalable Modelling of Affective Dynamics
  A framework that converts interpretable facial and acoustic features into language descriptions, encodes them with a pretrained language model to obtain semantic embeddings, and uses those embeddings as priors to improve prediction of valence and arousal changes on Aff-Wild2 and SEWA while keeping the pipeline transparent.
- A Two-Stage Dual-Modality Model for Facial Emotional Expression Recognition
  A dual-modality model combining DINOv2 visual features with Wav2Vec audio features achieves a Macro-F1 of 0.5368 on the ABAW validation set for facial expression recognition.
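The VGGT-HPE entry rests on a simple geometric fact: if the pose of a reference image is known, a predicted relative rotation between the pair recovers the absolute pose of the query image. The Python sketch below illustrates only that composition step with plain 3x3 rotation matrices. The helper names, the yaw/pitch/roll axis convention, and the definition R_rel = R_query · R_ref^T are assumptions for illustration, not the paper's implementation.

```python
import math

Mat3 = list[list[float]]


def rot_x(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [list(row) for row in zip(*a)]


def head_rotation(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Mat3:
    # Hypothetical convention: yaw about y, pitch about x, roll about z, R = Ry @ Rx @ Rz.
    y, p, r = map(math.radians, (yaw_deg, pitch_deg, roll_deg))
    return matmul(rot_y(y), matmul(rot_x(p), rot_z(r)))


def relative_rotation(r_query: Mat3, r_ref: Mat3) -> Mat3:
    # Assumed definition of the pairwise target: R_rel = R_query @ R_ref^T.
    return matmul(r_query, transpose(r_ref))


def absolute_from_relative(r_rel: Mat3, r_ref: Mat3) -> Mat3:
    # Undo the relative definition: R_query = R_rel @ R_ref.
    return matmul(r_rel, r_ref)


def geodesic_deg(r_a: Mat3, r_b: Mat3) -> float:
    # Angle of R_a^T R_b, a common rotation-error metric; clamp guards acos domain.
    m = matmul(transpose(r_a), r_b)
    cos_theta = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

Because the geodesic distance between two rotations is unchanged when both are right-multiplied by the same fixed rotation, an angular error in the predicted relative rotation carries over unchanged to the recovered absolute rotation in this setup.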
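For the Beta-distribution entry (Beyond the Mean), the standard method-of-moments mapping links a predicted mean and variance to the recovered quantities. With mean mu and variance sigma^2 on [0, 1] (requiring sigma^2 < mu(1 - mu)), the concentration is nu = mu(1 - mu)/sigma^2 - 1, giving alpha = mu·nu and beta = (1 - mu)·nu. The Python sketch below applies that mapping, then derives skewness and quantiles. The rescaling of ratings from [-1, 1] to [0, 1], the helper names, and the Monte Carlo quantile estimate are illustrative assumptions rather than the paper's procedure.

```python
import math
import random


def beta_from_moments(mean: float, var: float) -> tuple[float, float]:
    """Method-of-moments Beta(alpha, beta) parameters for a variable on [0, 1]."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly inside (0, 1)")
    max_var = mean * (1.0 - mean)  # any Beta variance is strictly below this bound
    if not 0.0 < var < max_var:
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    concentration = max_var / var - 1.0  # alpha + beta
    return mean * concentration, (1.0 - mean) * concentration


def beta_skewness(a: float, b: float) -> float:
    # Closed-form skewness of Beta(a, b); invariant under positive affine rescaling.
    return 2.0 * (b - a) * math.sqrt(a + b + 1.0) / ((a + b + 2.0) * math.sqrt(a * b))


def beta_quantiles(a: float, b: float, probs, n_samples: int = 100_000, seed: int = 0) -> list[float]:
    """Approximate quantiles from seeded samples (stdlib only)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    return [samples[min(int(p * n_samples), n_samples - 1)] for p in probs]


def summarise_affect(mean_pm1: float, var_pm1: float, probs=(0.1, 0.5, 0.9)) -> dict:
    """Map a [-1, 1] rating (e.g. valence) to [0, 1], fit a Beta, map quantiles back."""
    # Assumed rescaling: x01 = (x + 1) / 2, so the variance scales by 1/4.
    a, b = beta_from_moments((mean_pm1 + 1.0) / 2.0, var_pm1 / 4.0)
    quantiles = [2.0 * q - 1.0 for q in beta_quantiles(a, b, probs)]
    return {"alpha": a, "beta": b, "skewness": beta_skewness(a, b), "quantiles": quantiles}
```

Where SciPy is available, `scipy.stats.beta.ppf(q, a, b)` returns exact quantiles and could replace the sampling step.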
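Macro-F1, the metric reported for the two-stage dual-modality model, averages per-class F1 scores with equal weight, so rare expression classes count as much as frequent ones. The stdlib-only Python sketch below shows that computation using F1 = 2TP / (2TP + FP + FN). The function name, and the choice to score a class absent from both labels and predictions as 0.0 (mirroring scikit-learn's default `zero_division` behaviour), are assumptions. The official ABAW evaluation script may handle such edge cases differently.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def macro_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Optional[Sequence[Hashable]] = None,
) -> float:
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: a class never seen in labels or predictions scores 0.0.
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Averaging over classes rather than over samples is what makes Macro-F1 sensitive to performance on under-represented expressions.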