Distribution-based supervision for 9-class SER improves alignment with human annotator vote distributions over hard-label training.
Learning from Annotation Uncertainty: Entropy-Aware Curriculum for Speech Emotion Recognition
abstract
Speech emotion recognition (SER) often relies on hard consensus labels that collapse annotator disagreement. We study distribution-based supervision for 9-class SER on MSP-Podcast 2.0 using a WavLM-Base multitask model for categorical emotion and dimensional valence-arousal-dominance (VAD) prediction. Hard-label training is compared with targets from primary and merged primary–secondary annotator vote distributions. Distributional objectives improve alignment with human vote distributions, reducing Jensen-Shannon and Kullback-Leibler divergence (JSD/KLD) relative to hard-label training. Analysis shows that hard supervision partly benefits from assigning ambiguous utterances to the residual Other class, whereas distributional supervision redistributes uncertainty across emotion categories. Entropy-stratified evaluation shows that high-ambiguity utterances remain challenging, but distribution-based supervision better captures perceptual uncertainty. These findings support moving beyond hard labels toward targets that reflect listener disagreement.
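The quantities named in the abstract can be illustrated with a minimal sketch. It assumes that a soft target is simply the normalized count of annotator votes per class, and that ambiguity is measured as the Shannon entropy of that target; the paper's exact target construction, merging rule for secondary votes, and entropy strata are not given here, so the function names and the toy votes below are hypothetical.

```python
import numpy as np

NUM_CLASSES = 9  # 9-class setup from the abstract; class indices here are arbitrary


def vote_distribution(votes, num_classes=NUM_CLASSES):
    """Turn a list of annotator class indices into a probability vector (assumed: plain normalized counts)."""
    counts = np.bincount(np.asarray(votes, dtype=int), minlength=num_classes).astype(float)
    return counts / counts.sum()


def hard_target(p):
    """One-hot consensus label: all mass on the most-voted class."""
    one_hot = np.zeros_like(p)
    one_hot[int(np.argmax(p))] = 1.0
    return one_hot


def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; q is clipped at eps to avoid division by zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / np.maximum(q[nz], eps))))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy utterance: four hypothetical annotators vote for classes 0, 0, 1 and 3.
human = vote_distribution([0, 0, 1, 3])
consensus = hard_target(human)

print("entropy of human votes:", entropy(human))
print("JSD(human, hard label):", js_divergence(human, consensus))
print("KLD(human || hard label):", kl_divergence(human, consensus))
```

In this toy case the hard label discards the votes for classes 1 and 3, so its divergence from the human distribution is nonzero, and the KLD is very large because the one-hot target assigns (clipped) zero probability to classes that annotators chose. An entropy-stratified evaluation could then bin utterances by `entropy(human)` and report the divergences per bin.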