Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition
Pith reviewed 2026-06-26 10:19 UTC · model grok-4.3
The pith
Uncertainty decomposition separates emotion ambiguity from distribution shift for differentiated routing in facial expression recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Uncertainty-Aware Routing exploits the separation of two uncertainty components: aleatoric uncertainty, which recovers human annotator disagreement at Spearman correlation 0.66, and epistemic uncertainty, which detects corruption-induced distribution shift at an average AUROC of 0.699 at the highest corruption severity. The routing mechanism therefore reports ambiguity for in-distribution faces and rejects out-of-distribution inputs, retaining approximately 1.8 times more ambiguous faces than single-uncertainty routing at a matched out-of-distribution rejection rate. A label-distribution-learning baseline recovers disagreement comparably, yet cannot perform the differentiated routing because it does not separate ambiguity from shift.
What carries the argument
Uncertainty-Aware Routing (UAR), an inference-time mechanism that applies separate thresholds to aleatoric uncertainty for reporting ambiguity and to epistemic uncertainty for rejection.
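As a rough illustration of what a two-threshold routing rule could look like, here is a minimal Python sketch. The names `Route`, `route`, `tau_aleatoric` and `tau_epistemic` are hypothetical, and so are the third "predict" outcome and the choice to check the epistemic threshold first; the paper's abstract only states that separate thresholds are applied to the two components.

```python
from enum import Enum


class Route(Enum):
    PREDICT = "predict"          # assumed outcome: confident, in-distribution
    REPORT_AMBIGUITY = "report"  # in-distribution but ambiguous
    REJECT = "reject"            # likely out-of-distribution


def route(aleatoric: float, epistemic: float,
          tau_aleatoric: float, tau_epistemic: float) -> Route:
    """Route one input from its two uncertainty scores (hypothetical rule)."""
    # Assumption: the shift check takes precedence, so a shifted input is
    # rejected no matter how ambiguous it looks.
    if epistemic >= tau_epistemic:
        return Route.REJECT
    # Only in-distribution inputs reach the ambiguity check.
    if aleatoric >= tau_aleatoric:
        return Route.REPORT_AMBIGUITY
    return Route.PREDICT
```

With this ordering, an input with high aleatoric but low epistemic uncertainty is surfaced with its ambiguity rather than discarded, which is the behaviour the review attributes to UAR.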
If this is right
- Ambiguous in-distribution faces can be surfaced with their disagreement level instead of being discarded.
- Out-of-distribution inputs can be rejected without also discarding valid but ambiguous cases.
- Label distribution learning recovers annotator disagreement but supplies no mechanism for choosing different actions on shift.
- The separation enables interpretable selection between reporting and rejection at inference time.
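The "matched out-of-distribution rejection rate" comparison behind these consequences can be sketched as follows. This is an assumed evaluation procedure, not the paper's code: each router gets the threshold that rejects the same share of out-of-distribution inputs, and the share of ambiguous in-distribution faces that survive is then compared. All function names are hypothetical and the numbers are invented toy values that do not reproduce the reported 1.8x figure.

```python
import math


def threshold_for_rejection_rate(ood_scores, rate):
    """Threshold t such that rejecting score >= t rejects at least
    `rate` of the OOD inputs (exactly so when there are no ties)."""
    ranked = sorted(ood_scores, reverse=True)
    n_reject = max(1, math.ceil(rate * len(ranked)))
    return ranked[n_reject - 1]


def retained_fraction(scores, threshold):
    """Fraction of inputs that are not rejected (score below threshold)."""
    return sum(s < threshold for s in scores) / len(scores)


def retention_at_matched_rejection(ood_epistemic, ood_total,
                                   amb_epistemic, amb_total, rate):
    """Retention of ambiguous in-distribution faces for a router that
    rejects on epistemic uncertainty versus one that rejects on a single
    total-uncertainty score, at the same OOD rejection rate."""
    t_epi = threshold_for_rejection_rate(ood_epistemic, rate)
    t_tot = threshold_for_rejection_rate(ood_total, rate)
    return (retained_fraction(amb_epistemic, t_epi),
            retained_fraction(amb_total, t_tot))


# Toy scores, invented for illustration only.
dual, single = retention_at_matched_rejection(
    ood_epistemic=[0.9, 0.8, 0.7, 0.2], ood_total=[1.5, 1.4, 1.3, 1.0],
    amb_epistemic=[0.1, 0.2, 0.3, 0.75], amb_total=[1.2, 1.35, 1.45, 1.6],
    rate=0.75)
```

In the toy data the ambiguous faces have high total uncertainty but mostly low epistemic uncertainty, so the single-score router discards more of them at the same rejection rate.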
Where Pith is reading between the lines
- The same decomposition could support routing in other label-ambiguous tasks such as medical image diagnosis where both disagreement and domain shift appear.
- Real-world deployment could route low-ambiguity in-distribution cases to automated output while routing high-ambiguity or shifted cases to human review.
- Extending the validation beyond synthetic corruptions to natural domain shifts would test whether epistemic uncertainty remains a reliable shift detector.
Load-bearing premise
That the aleatoric component extracted from the ensemble is a faithful proxy for human annotator disagreement and the epistemic component is a faithful proxy for distribution shift induced by image corruptions.
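For readers unfamiliar with how the two components are usually extracted from an ensemble, the sketch below uses the common entropy-based decomposition: total predictive entropy splits into the expected per-member entropy (aleatoric) and the remaining mutual information (epistemic). This is an assumption about the method; the referee report notes that the paper does not state its exact formulas, and the function names here are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose(member_probs):
    """Split ensemble uncertainty for one input.

    member_probs: one class-probability list per ensemble member.
    Returns (total, aleatoric, epistemic).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Ensemble prediction: average of the member distributions.
    mean = [sum(p[c] for p in member_probs) / n_members
            for c in range(n_classes)]
    total = entropy(mean)
    # Aleatoric: average entropy of the individual members.
    aleatoric = sum(entropy(p) for p in member_probs) / n_members
    # Epistemic: what is left, i.e. disagreement between members.
    # Non-negative in theory; clamped to absorb rounding error.
    epistemic = max(0.0, total - aleatoric)
    return total, aleatoric, epistemic
```

Two members that both predict a 50/50 split give purely aleatoric uncertainty, while two members that each confidently predict a different class give purely epistemic uncertainty.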
What would settle it
A direct comparison of UAR routing decisions against human judgments on whether each face should be reported with its ambiguity or rejected outright.
Original abstract
Facial expression recognition (FER) is inherently ambiguous: human annotators frequently disagree, and models deployed in real environments face distribution shift. Crucially, these two conditions demand different downstream actions, as ambiguous in-distribution faces should be reported with their ambiguity whereas out-of-distribution inputs should be rejected. However, a single uncertainty score conflates the two. In this study, uncertainty decomposition into aleatoric and epistemic components for FER is investigated, and Uncertainty-Aware Routing (UAR), an inference-time routing mechanism that exploits the separation, is introduced. Specifically, aleatoric and epistemic uncertainties are obtained from a Deep Ensemble of fully fine-tuned DINOv2 models and are each validated against an independent external signal: aleatoric against human annotator disagreement, and epistemic against distribution shift induced by image corruptions. The proposed dual-validation protocol reveals that aleatoric recovers annotator disagreement with Spearman correlation 0.66 (95% CI: 0.64-0.68), and epistemic detects corruption-induced shifts, achieving average AUROC of 0.699 at the highest corruption severity. UAR retains approximately 1.8 times more ambiguous in-distribution faces than single-uncertainty routing at a matched out-of-distribution rejection rate. A strong label-distribution-learning baseline achieves comparable disagreement recovery but cannot separate ambiguity from shift and therefore cannot route, establishing that the value of decomposition lies in the separation enabling interpretable and differentiated action selection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that uncertainty in facial expression recognition can be decomposed into aleatoric and epistemic components using a Deep Ensemble of fine-tuned DINOv2 models. Aleatoric uncertainty is validated against human annotator disagreement (Spearman 0.66), epistemic against corruption-induced distribution shifts (average AUROC 0.699), and the resulting Uncertainty-Aware Routing (UAR) retains ~1.8x more ambiguous in-distribution faces than single-uncertainty routing at matched OOD rejection rates, while a label-distribution-learning baseline cannot separate the signals for routing.
Significance. If the components are specific to their target signals, the work provides a practical mechanism for differentiated actions in FER deployment (report ambiguity vs. reject shift). The dual-validation against independent external signals and the quantitative retention gain are concrete strengths that would support the value of decomposition over conflated uncertainty.
major comments (2)
- [Abstract and §4] Abstract and §4 (dual-validation protocol): Spearman 0.66 for aleatoric vs. annotator disagreement and AUROC 0.699 for epistemic vs. corruptions are reported, but no cross-sensitivity results are given (e.g., does aleatoric rise under corruptions; does epistemic rise with annotator disagreement). This test is load-bearing for the claim that the decomposition enables clean, interpretable routing separation.
- [§4.3] §4.3 (UAR evaluation): the 1.8x retention advantage at matched OOD rejection rate is attributed to the interpretable decomposition, yet without the cross-sensitivity evidence the gain cannot be unambiguously credited to separation rather than possible entanglement of the two uncertainty estimates.
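The cross-sensitivity test requested in the major comments could be organised as a small two-by-two table, sketched below in plain Python. The helper names and the layout of the result are assumptions for illustration. Spearman correlation is computed as Pearson correlation on average ranks, and AUROC through the rank-sum identity; both are undefined for constant inputs or empty groups, which this sketch does not guard against.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def auroc(negatives, positives):
    """Chance that a random positive outscores a random negative
    (ties count half), via the Mann-Whitney rank-sum identity."""
    ranks = average_ranks(list(negatives) + list(positives))
    n_neg, n_pos = len(negatives), len(positives)
    pos_rank_sum = sum(ranks[n_neg:])
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def cross_sensitivity(aleatoric, epistemic, disagreement,
                      aleatoric_corrupt, epistemic_corrupt):
    """Target and cross terms; clean scores are negatives for AUROC."""
    return {
        "aleatoric_vs_disagreement": spearman(aleatoric, disagreement),
        "epistemic_vs_disagreement": spearman(epistemic, disagreement),  # cross
        "epistemic_shift_auroc": auroc(epistemic, epistemic_corrupt),
        "aleatoric_shift_auroc": auroc(aleatoric, aleatoric_corrupt),    # cross
    }
```

Under clean separation the two cross terms would sit near 0 (Spearman) and 0.5 (AUROC) while the target terms stay high; large cross terms would point to the entanglement the referee raises.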
minor comments (2)
- [Methods] Methods section: the number of ensemble members and the precise formulas used to extract aleatoric (e.g., expected entropy) and epistemic (e.g., mutual information) uncertainties from the DINOv2 ensemble predictions are not stated.
- [Abstract] Abstract: the 95% CI (0.64-0.68) on the Spearman correlation is given without the underlying sample size or computation method.
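To show why the sample size matters for interpreting the reported interval, here is a sketch of one common way to put a confidence interval on a correlation coefficient: the Fisher z-transform with standard error 1/sqrt(n - 3). Whether the paper used this approximation, a bootstrap, or something else is not stated, so the method here is an assumption and `fisher_ci` is a hypothetical helper. For Spearman correlations the 1/sqrt(n - 3) standard error is itself only approximate.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, confidence: float = 0.95):
    """Approximate confidence interval for a correlation r from n pairs."""
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    z = math.atanh(r)               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error in z-space
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

The interval narrows roughly with the square root of the sample size, so the same point estimate of 0.66 can come with a very different width depending on how many annotated faces were used.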
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments correctly identify that cross-sensitivity tests would strengthen the evidence for clean separation of the uncertainty components. We address each point below and will revise the manuscript to incorporate the requested analyses.
- Referee: [Abstract and §4] Abstract and §4 (dual-validation protocol): Spearman 0.66 for aleatoric vs. annotator disagreement and AUROC 0.699 for epistemic vs. corruptions are reported, but no cross-sensitivity results are given (e.g., does aleatoric rise under corruptions; does epistemic rise with annotator disagreement). This test is load-bearing for the claim that the decomposition enables clean, interpretable routing separation.
Authors: We agree that the absence of cross-sensitivity results leaves open the possibility of entanglement. Our dual-validation protocol uses independent external signals, but we did not explicitly test whether aleatoric uncertainty increases under corruptions or whether epistemic uncertainty correlates with annotator disagreement. We will compute and report these cross-sensitivity results (including quantitative measures and visualizations) in the revised §4 and abstract to directly address this concern. revision: yes
- Referee: [§4.3] §4.3 (UAR evaluation): the 1.8x retention advantage at matched OOD rejection rate is attributed to the interpretable decomposition, yet without the cross-sensitivity evidence the gain cannot be unambiguously credited to separation rather than possible entanglement of the two uncertainty estimates.
Authors: The 1.8x retention gain is measured using the separated uncertainties for differentiated routing actions. We acknowledge that without cross-sensitivity evidence it is not possible to fully rule out entanglement as an alternative explanation for the observed advantage. We will add the cross-sensitivity results and revise the discussion and attribution in §4.3 to reflect the new evidence. revision: yes
Circularity Check
No significant circularity; external validations independent of routing rule
The paper obtains aleatoric and epistemic uncertainties from a Deep Ensemble of fine-tuned DINOv2 models using standard decomposition. These are validated against independent external signals (human annotator disagreement via Spearman correlation, corruption-induced shifts via AUROC), which are not derived from the same fitted parameters or routing rule. The UAR retention advantage (1.8x) is reported as an empirical comparison at matched rejection rates against single-uncertainty routing and a label-distribution-learning baseline. No equations or claims reduce by construction to inputs; no self-citations are invoked as load-bearing uniqueness theorems; the separation enabling differentiated actions is measured against quantities outside the model (annotator labels, synthetic corruptions). This is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Deep ensemble disagreement separates aleatoric from epistemic uncertainty in the manner required for the routing rule.