Beyond the Mean: Modelling Annotation Distributions in Continuous Affect Prediction
Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3
The pith
By predicting the mean and standard deviation of annotator ratings and converting them into Beta distribution parameters, models can recover the full rating distribution for continuous affect rather than just its average.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's central claim is that a Beta distribution can effectively model the consensus in continuous affect annotations. Models are trained to predict the mean and standard deviation of the annotation distribution. These are then converted into the two shape parameters of the Beta distribution using moment matching. From these parameters, higher-order statistics such as skewness, kurtosis, and various quantiles can be computed directly without additional prediction. Experiments demonstrate that the resulting predictive distributions closely resemble the empirical distributions from annotators on the SEWA and RECOLA datasets, while the point predictions remain competitive with those from conventional regression approaches.
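Once α and β are available, the moment-based descriptors follow from standard Beta identities. The Python sketch below is illustrative rather than the paper's code, and the function name `beta_descriptors` is hypothetical. Quantiles have no elementary closed form, because they invert the regularized incomplete beta function (exposed, for example, as `scipy.stats.beta.ppf`). This standard-library sketch therefore covers only the mean, variance, skewness, and excess kurtosis.

```python
import math


def beta_descriptors(alpha: float, beta: float) -> dict:
    """Closed-form mean, variance, skewness and excess kurtosis of Beta(alpha, beta)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("Beta shape parameters must be positive")
    s = alpha + beta
    mean = alpha / s
    var = alpha * beta / (s * s * (s + 1))
    # Positive when beta > alpha (mass concentrated near 0, long right tail).
    skew = 2 * (beta - alpha) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(alpha * beta))
    # Excess kurtosis: kurtosis minus 3, the value for a Gaussian.
    ex_kurt = 6 * ((alpha - beta) ** 2 * (s + 1) - alpha * beta * (s + 2)) / (
        alpha * beta * (s + 2) * (s + 3)
    )
    return {"mean": mean, "var": var, "skewness": skew, "excess_kurtosis": ex_kurt}


# Beta(2, 5): mean 2/7 and a right-skewed shape.
print(beta_descriptors(2.0, 5.0))
```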
What carries the argument
The key mechanism is moment matching, which derives the two Beta shape parameters from the predicted mean and variance of the annotation ratings to enable closed-form recovery of the full distribution properties.
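Below is a minimal Python sketch of the moment-matching step. It assumes the predicted mean μ and standard deviation σ are already on the (0, 1) scale; the function name `beta_from_moments` is hypothetical. The sketch also checks the condition under which the mapping yields valid (positive) shape parameters.

```python
import math


def beta_from_moments(mu: float, sigma: float) -> tuple[float, float]:
    """Map a mean and standard deviation on (0, 1) to Beta shape parameters.

    Method of moments: alpha = mu * nu, beta = (1 - mu) * nu,
    with nu = mu * (1 - mu) / sigma**2 - 1 (nu equals alpha + beta).
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mean must lie strictly inside (0, 1)")
    var = sigma * sigma
    # Any distribution on [0, 1] has var <= mu * (1 - mu); a Beta needs strict inequality.
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("need 0 < sigma**2 < mu * (1 - mu) for a valid Beta")
    nu = mu * (1.0 - mu) / var - 1.0
    return mu * nu, (1.0 - mu) * nu


# A uniform distribution has mean 0.5 and variance 1/12, i.e. Beta(1, 1).
print(beta_from_moments(0.5, math.sqrt(1 / 12)))
```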
Where Pith is reading between the lines
- The same Beta modeling strategy could be tested on other subjective prediction tasks where multiple human labels are available.
- Optimizing the model directly against distribution distances rather than moment matching might yield even better alignment.
- Full distribution outputs could support more nuanced applications, such as identifying cases with high annotator disagreement for further review.
Load-bearing premise
That the distribution of annotator ratings in these tasks is adequately captured by a Beta distribution and that matching the mean and variance is sufficient to accurately determine the shape and higher moments.
What would settle it
Observing a substantial mismatch between the predicted Beta-derived quantiles and the actual empirical quantiles from annotator ratings on a new dataset would indicate the claim does not hold.
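One way to run this check is sketched below in Python. It is an illustrative assumption, not the paper's protocol:

- Beta quantiles are approximated from a large seeded sample drawn with the standard library's `random.Random.betavariate`. An exact alternative is the inverse regularized incomplete beta function, for example `scipy.stats.beta.ppf`.
- Empirical quantiles come from `statistics.quantiles`.
- The function name `quantile_mismatch` and the choice of deciles are hypothetical.

```python
import random
import statistics


def quantile_mismatch(alpha, beta, ratings, n=10, n_samples=50_000, seed=0):
    """Mean absolute gap between Beta(alpha, beta) and empirical n-quantiles.

    The Beta quantiles are Monte Carlo estimates from a seeded sample, so the
    result is approximate; both sides use the same cut-point method.
    """
    rng = random.Random(seed)
    model_sample = [rng.betavariate(alpha, beta) for _ in range(n_samples)]
    predicted = statistics.quantiles(model_sample, n=n, method="inclusive")
    empirical = statistics.quantiles(ratings, n=n, method="inclusive")
    return sum(abs(p - e) for p, e in zip(predicted, empirical)) / len(predicted)


# Synthetic check: ratings drawn from the predicted Beta vs. the same ratings shifted up.
rng = random.Random(1)
matched = [rng.betavariate(2, 5) for _ in range(2_000)]
shifted = [min(1.0, x + 0.3) for x in matched]
print(quantile_mismatch(2, 5, matched))  # should be small
print(quantile_mismatch(2, 5, shifted))  # should be close to the 0.3 shift
```

With only a few ratings per frame, empirical quantiles are noisy. In practice such an error would be averaged over many frames or pooled across a dataset.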
Original abstract
Emotion annotation is inherently subjective and cognitively demanding, producing signals that reflect diverse perceptions across annotators rather than a single ground truth. In continuous affect prediction, this variability is typically collapsed into point estimates such as the mean or median, discarding valuable information about annotator disagreement and uncertainty. In this work, we propose a distribution-aware framework that models annotation consensus using the Beta distribution. Instead of predicting a single affect value, models estimate the mean and standard deviation of the annotation distribution, which are transformed into valid Beta parameters through moment matching. This formulation enables the recovery of higher-order distributional descriptors, including skewness, kurtosis, and quantiles, in closed form. As a result, the model captures not only the central tendency of emotional perception but also variability, asymmetry, and uncertainty in annotator responses. We evaluate the proposed approach on the SEWA and RECOLA datasets using multimodal features. Experimental results show that Beta-based modelling produces predictive distributions that closely match the empirical annotator distributions while achieving competitive performance with conventional regression approaches. These findings highlight the importance of modelling annotation uncertainty in affective computing and demonstrate the potential of distribution-aware learning for subjective signal analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a distribution-aware framework for continuous affect prediction that models annotator consensus via the Beta distribution. Models predict the mean and standard deviation of the annotation distribution, which are converted to Beta parameters (α, β) through moment matching; this enables closed-form recovery of higher-order descriptors such as skewness, kurtosis, and quantiles. The approach is evaluated on the SEWA and RECOLA datasets using multimodal features, with the claim that the resulting predictive distributions closely match empirical annotator distributions while achieving competitive performance against conventional regression baselines.
Significance. If the empirical results hold, the work would be significant for affective computing by moving beyond point estimates (mean/median) to explicitly model annotation uncertainty and variability. The moment-matching formulation is lightweight and allows recovery of full distributional properties without requiring direct density estimation or more complex models, which could improve robustness in subjective signal tasks where annotator disagreement carries information.
major comments (3)
- [Abstract] The claim that 'Beta-based modelling produces predictive distributions that closely match the empirical annotator distributions' is unsupported by any quantitative metrics (e.g., KL divergence, Wasserstein distance, quantile errors, or histogram comparisons) or figures. Without these, the match to empirical distributions cannot be verified.
- [Method] Moment-matching step: predicting only the first two moments and mapping them to Beta(α, β) fixes all higher-order statistics (skewness, kurtosis, quantiles) to the Beta family by construction. The central claim therefore rests on the untested assumption that SEWA/RECOLA annotation histograms are well-approximated by Beta shapes. No analysis of multimodality, tail behavior, or comparison against alternative families (e.g., truncated Gaussian, mixtures) is provided to substantiate this.
- [Experiments] No specific performance numbers, baseline details, or distribution-similarity scores are reported to support 'competitive performance with conventional regression approaches'. This prevents assessment of whether the distributional modeling incurs any accuracy cost on the mean or improves uncertainty quantification.
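As an illustration of one of the distribution-similarity scores requested here, the sketch below computes the 1-D Wasserstein-1 distance between a predicted Beta and a set of annotator ratings. It measures the area between their CDFs. This is a hedged, standard-library-only sketch: the Beta side is represented by a seeded Monte Carlo sample rather than an exact CDF, and the names `wasserstein_1d` and `beta_w1` are hypothetical.

```python
import random


def wasserstein_1d(u, v):
    """W1 between two samples: the integral of |F_u(x) - F_v(x)| over x."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Advance to the empirical CDF values on the interval [left, right).
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


def beta_w1(alpha, beta, ratings, n_samples=20_000, seed=0):
    """Approximate W1 between Beta(alpha, beta) and empirical annotator ratings."""
    rng = random.Random(seed)
    model_sample = [rng.betavariate(alpha, beta) for _ in range(n_samples)]
    return wasserstein_1d(model_sample, ratings)


# Two two-point samples whose values differ by a constant 0.1.
print(wasserstein_1d([0.2, 0.4], [0.3, 0.5]))
```

Because W1 is expressed in the units of the rating scale, it is easier to interpret than KL divergence. It also stays finite when the empirical ratings form a point mass, for example when all annotators agree.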
minor comments (2)
- [Abstract] The abstract refers to 'multimodal features' without naming the modalities or feature extractors used on SEWA/RECOLA.
- [Method] Provide the explicit formulas and constraints (e.g., ensuring α, β > 0) for the moment-matching transformation from predicted μ and σ to Beta parameters.
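For reference, the standard method-of-moments relations for the Beta distribution are written out below, with ν = α + β, together with the positivity constraint this comment asks about. The last line sketches one possible way to make a network's raw outputs z₁, z₂ satisfy the constraint by construction, using the logistic sigmoid s(·). This parametrisation is an illustrative assumption, not something the paper is stated to use.

```latex
\begin{aligned}
&\nu = \frac{\mu(1-\mu)}{\sigma^2} - 1, \qquad \alpha = \mu\,\nu, \qquad \beta = (1-\mu)\,\nu, \\
&\alpha > 0 \text{ and } \beta > 0 \iff 0 < \mu < 1 \text{ and } 0 < \sigma^2 < \mu(1-\mu), \\
&\mu = s(z_1), \quad \sigma = s(z_2)\sqrt{\mu(1-\mu)} \implies \nu = \frac{1}{s(z_2)^2} - 1 > 0.
\end{aligned}
```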
Simulated Author's Rebuttal
We thank the referee for the constructive and positive review, which highlights the potential significance of our distribution-aware framework for affective computing. We address each major comment point by point below, providing clarifications from the manuscript and committing to revisions that strengthen the presentation of results and justifications without altering the core contributions.
Point-by-point responses
- Referee: [Abstract] The claim that 'Beta-based modelling produces predictive distributions that closely match the empirical annotator distributions' is unsupported by any quantitative metrics (e.g., KL divergence, Wasserstein distance, quantile errors, or histogram comparisons) or figures. Without these, the match to empirical distributions cannot be verified.
Authors: We agree that the abstract would benefit from more explicit quantitative backing to support the claim. The full manuscript evaluates the approach on SEWA and RECOLA using multimodal features and presents results indicating close matches via the recovered descriptors, but we acknowledge the need for direct metrics. In the revised version, we will add specific quantitative measures such as average KL divergence, Wasserstein distance, and quantile errors between predicted Beta distributions and empirical annotation histograms, along with additional figures for visual histogram comparisons. revision: yes
- Referee: [Method] Moment-matching step: predicting only the first two moments and mapping them to Beta(α, β) fixes all higher-order statistics (skewness, kurtosis, quantiles) to the Beta family by construction. The central claim therefore rests on the untested assumption that SEWA/RECOLA annotation histograms are well-approximated by Beta shapes. No analysis of multimodality, tail behavior, or comparison against alternative families (e.g., truncated Gaussian, mixtures) is provided to substantiate this.
Authors: The Beta distribution is a natural choice for normalized continuous affect labels bounded in [0,1], as it flexibly models varying means, variances, and skewness with only two parameters while permitting closed-form recovery of higher-order statistics. The manuscript demonstrates empirical effectiveness on the target datasets through competitive performance and recovered descriptors. However, we accept that an explicit justification and sensitivity analysis would strengthen the work. In revision, we will add a subsection analyzing the empirical annotation distributions for multimodality and tail behavior, plus a brief comparison to alternatives such as truncated Gaussian to substantiate the modeling choice. revision: partial
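Applying the Beta model requires the per-frame mean and standard deviation to lie inside (0, 1). Below is a minimal Python sketch of one way to achieve this. It assumes raw ratings lie in [-1, 1]; adjust `low`/`high` otherwise. The function name, the margin `eps`, and the decision to cap σ just below √(μ(1 − μ)) are hypothetical choices, not details taken from the paper. The floor matters because frames where all annotators agree give σ = 0, which would make ν infinite. The cap guards the boundary case and the slight inflation of a sample (n − 1) standard deviation.

```python
import math


def to_unit_interval(mu, sigma, low=-1.0, high=1.0, eps=1e-3):
    """Affinely rescale a mean/std from [low, high] to (0, 1), keeping a valid Beta.

    Hypothetical helper: `eps` keeps the mean off the boundaries and gives
    sigma a positive floor; sigma is also capped below sqrt(mu * (1 - mu)).
    """
    width = high - low
    m = (mu - low) / width  # the mean is shifted and scaled
    s = sigma / width  # the standard deviation is only scaled
    m = min(max(m, eps), 1.0 - eps)
    s_max = (1.0 - eps) * math.sqrt(m * (1.0 - m))
    s = min(max(s, eps), s_max)
    return m, s


print(to_unit_interval(0.0, 0.2))
# Full agreement at the top of the scale: the mean is clipped and sigma is floored.
print(to_unit_interval(1.0, 0.0))
```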
- Referee: [Experiments] No specific performance numbers, baseline details, or distribution-similarity scores are reported to support 'competitive performance with conventional regression approaches'. This prevents assessment of whether the distributional modeling incurs any accuracy cost on the mean or improves uncertainty quantification.
Authors: The experiments section reports results on SEWA and RECOLA with multimodal features, comparing against conventional regression baselines and claiming competitive performance. We apologize if the numerical details were not sufficiently prominent. To address this directly, the revised manuscript will include expanded tables with specific performance numbers (e.g., CCC and MSE for mean prediction), baseline details, and distribution-similarity scores, explicitly showing that the proposed approach maintains mean accuracy while additionally providing uncertainty quantification via the full predictive distributions. revision: yes
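The rebuttal names CCC (Lin's concordance correlation coefficient) and MSE, both widely reported for mean prediction on SEWA and RECOLA. A minimal standard-library sketch of both metrics follows, using population (1/N) moments; the function names are illustrative. Unlike Pearson correlation, CCC penalises a constant bias, which matters when checking whether distributional training shifts the predicted mean.

```python
def ccc(pred, gold):
    """Lin's concordance correlation coefficient with population (1/N) moments."""
    n = len(pred)
    if n == 0 or n != len(gold):
        raise ValueError("inputs must be non-empty and equally long")
    mp, mg = sum(pred) / n, sum(gold) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    vg = sum((g - mg) ** 2 for g in gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold)) / n
    # Undefined (division by zero) only if both traces are the same constant.
    return 2 * cov / (vp + vg + (mp - mg) ** 2)


def mse(pred, gold):
    """Mean squared error between two equally long traces."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)


gold = [0.1, 0.4, 0.35, 0.8]
print(ccc(gold, gold))  # perfect agreement
print(ccc([g + 0.2 for g in gold], gold))  # a constant bias lowers CCC
```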
Circularity Check
No significant circularity; standard moment-matching procedure with independent empirical validation
full rationale
The paper's core procedure predicts the mean and standard deviation from multimodal features using regression. It then applies the standard Beta moment-matching formulas, α = μ(μ(1 − μ)/σ² − 1) and β = (1 − μ)(μ(1 − μ)/σ² − 1), to obtain the shape parameters. Higher-order descriptors (skewness, kurtosis, quantiles) follow directly from the closed-form Beta expressions. This is a conventional statistical transformation, not a reduction of any claimed prediction to its own fitted inputs by construction. The assertion that the resulting distributions 'closely match the empirical annotator distributions' is presented as an empirical outcome on SEWA/RECOLA, not a definitional equivalence. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked to justify the central modelling choice. The derivation chain remains self-contained and externally falsifiable via direct comparison to annotation histograms.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: annotation distributions in continuous affect tasks follow a Beta distribution.