Interpretable Uncertainty Routing Separating Emotion Ambiguity from Distribution Shift in Facial Expression Recognition

Keito Inoshita; Takato Ueno

arxiv: 2606.22725 · v1 · pith:PATSDHC7new · submitted 2026-06-21 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-26 10:19 UTC · model grok-4.3

keywords facial expression recognition · uncertainty decomposition · aleatoric uncertainty · epistemic uncertainty · annotator disagreement · distribution shift · routing mechanism

The pith

Uncertainty decomposition separates emotion ambiguity from distribution shift for differentiated routing in facial expression recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Facial expression recognition must handle two distinct problems that call for different responses: faces where human annotators disagree on the expressed emotion, and inputs that fall outside the training distribution. A single uncertainty score mixes the two, but decomposing uncertainty into aleatoric and epistemic components lets the system report ambiguity on the first while rejecting the second. The paper obtains both components from a deep ensemble of fine-tuned models and checks each against an external signal, finding that aleatoric uncertainty aligns with annotator disagreement while epistemic uncertainty flags corrupted images. This split powers an inference-time routing method that keeps substantially more ambiguous but in-distribution faces than a single-uncertainty baseline while maintaining the same out-of-distribution rejection rate. The advantage is shown to come specifically from the ability to choose different actions rather than from uncertainty measurement alone.
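The split described above can be sketched with the entropy-based decomposition commonly used for deep ensembles. This is only an illustration under stated assumptions: the review notes that the paper's exact formulas and ensemble size are not given, so the function names, the use of natural logarithms, and the three-member toy ensembles below are all hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def decompose(member_probs):
    """Split ensemble uncertainty for one input (assumed formulation).

    member_probs: list of per-member class-probability lists.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual members,
      epistemic = total - aleatoric (a mutual-information estimate).
    """
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / m
    epistemic = max(total - aleatoric, 0.0)  # clip tiny negative rounding error
    return total, aleatoric, epistemic

# Toy case 1: every member agrees the face is a 50/50 call.
agree = [[0.5, 0.5, 0.0]] * 3
# Toy case 2: members are individually certain but contradict each other.
clash = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

total_a, ale_a, epi_a = decompose(agree)
total_c, ale_c, epi_c = decompose(clash)
```

In the first toy case all of the uncertainty lands in the aleatoric term, and in the second all of it lands in the epistemic term, which is the behaviour the routing idea relies on.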

Core claim

Uncertainty-Aware Routing exploits the separation of aleatoric uncertainty, which recovers human annotator disagreement at Spearman correlation 0.66, from epistemic uncertainty, which detects corruption-induced distribution shift at average AUROC 0.699. The routing mechanism therefore reports ambiguity for in-distribution faces and rejects out-of-distribution inputs, retaining approximately 1.8 times more ambiguous faces than single-uncertainty routing at a matched out-of-distribution rejection rate. A label-distribution-learning baseline recovers disagreement comparably yet cannot perform the differentiated routing because it lacks the separation.
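A comparison "at a matched out-of-distribution rejection rate" can be read as follows: fix the fraction of OOD inputs each method must reject, derive each method's threshold from that fraction, then count how many ambiguous in-distribution faces survive. The sketch below assumes that reading; the helper names and all scores are invented for illustration and are not the paper's data or its evaluation code.

```python
import math

def threshold_for_rejection_rate(ood_scores, rate):
    """Return threshold t so that rejecting `score >= t` removes
    roughly `rate` of the OOD inputs (more if scores tie at t)."""
    ranked = sorted(ood_scores, reverse=True)
    n_reject = max(1, math.ceil(rate * len(ranked)))
    return ranked[n_reject - 1]

def retention(scores, threshold):
    """Fraction of inputs kept, i.e. with score strictly below threshold."""
    return sum(1 for s in scores if s < threshold) / len(scores)

# Toy numbers invented for illustration; they are not the paper's data.
ambiguous_total = [0.9, 0.8, 0.7, 0.4]  # single score on ambiguous ID faces
ambiguous_epi = [0.1, 0.2, 0.1, 0.3]    # epistemic score on the same faces
ood_total = [0.95, 0.85, 0.6, 0.5]      # single score on OOD inputs
ood_epi = [0.6, 0.5, 0.4, 0.25]         # epistemic score on OOD inputs

rate = 0.75  # both methods must reject 75% of OOD inputs
t_single = threshold_for_rejection_rate(ood_total, rate)
t_epi = threshold_for_rejection_rate(ood_epi, rate)
kept_single = retention(ambiguous_total, t_single)
kept_decomposed = retention(ambiguous_epi, t_epi)
ratio = kept_decomposed / kept_single
```

The ratio of the two retention values is the kind of quantity the "approximately 1.8 times" figure summarises; the toy ratio here has no bearing on the reported number.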

What carries the argument

Uncertainty-Aware Routing (UAR), an inference-time mechanism that applies separate thresholds to aleatoric uncertainty for reporting ambiguity and to epistemic uncertainty for rejection.
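A minimal sketch of such a rule follows. The source states only that separate thresholds apply to the two components and that each input receives one of three actions; the action names, the order in which the checks run, and the threshold values are assumptions made for this example.

```python
from enum import Enum

class Action(Enum):
    PREDICT = "predict"                    # assumed name: emit the label as usual
    REPORT_AMBIGUITY = "report_ambiguity"  # assumed name: emit label plus ambiguity
    REJECT = "reject"                      # assumed name: abstain on shifted input

def route(h_ale, h_epi, tau_ale, tau_epi):
    """Pick an action from two independent thresholds.

    Assumption: the epistemic check runs first, so a shifted input is
    rejected even when it also looks ambiguous.
    """
    if h_epi > tau_epi:
        return Action.REJECT
    if h_ale > tau_ale:
        return Action.REPORT_AMBIGUITY
    return Action.PREDICT

# Hypothetical thresholds and scores.
ambiguous_face = route(h_ale=0.9, h_epi=0.1, tau_ale=0.5, tau_epi=0.3)
shifted_input = route(h_ale=0.9, h_epi=0.5, tau_ale=0.5, tau_epi=0.3)
clear_face = route(h_ale=0.1, h_epi=0.1, tau_ale=0.5, tau_epi=0.3)
```

A single-score router has only one threshold to work with, so the first two inputs above would necessarily receive the same treatment.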

If this is right

Ambiguous in-distribution faces can be surfaced with their disagreement level instead of being discarded.
Out-of-distribution inputs can be rejected without also discarding valid but ambiguous cases.
Label distribution learning recovers annotator disagreement but supplies no mechanism for choosing different actions on shift.
The separation enables interpretable selection between reporting and rejection at inference time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition could support routing in other label-ambiguous tasks such as medical image diagnosis where both disagreement and domain shift appear.
Real-world deployment could route low-ambiguity in-distribution cases to automated output while routing high-ambiguity or shifted cases to human review.
Extending the validation beyond synthetic corruptions to natural domain shifts would test whether epistemic uncertainty remains a reliable shift detector.

Load-bearing premise

That the aleatoric component extracted from the ensemble is a faithful proxy for human annotator disagreement and the epistemic component is a faithful proxy for distribution shift induced by image corruptions.

What would settle it

A direct comparison of UAR routing decisions against human judgments on whether each face should be reported with its ambiguity or rejected outright.

Figures

Figures reproduced from arXiv: 2606.22725 by Keito Inoshita, Takato Ueno.

**Figure 1.** Overall pipeline of the proposed dual-validated uncertainty decomposition and routing framework.

Selective prediction allows a model to abstain on uncertain inputs, reducing risk at the cost of coverage [6]. Confidence estimation based on failure prediction [2] improves abstention criteria, and extensions to learning-to-defer [21] enable routing to multiple specialists. In subjective tasks such as emotio…

**Figure 2.** UAR routing mechanism: each input is assigned to one of three actions based on independent thresholds on H_epi and H_ale.

… the top-τ percentile are treated as positive examples of high disagreement, and the area under the ROC curve (AUROC) of H_ale for this binary classification task, together with the Spearman correlation between H_ale and d, measures how faithfully aleatoric uncertainty tracks annotator disag…

**Figure 3.** Scatter of decomposed uncertainties on clean FERPlus test images and OOD inputs, coloured by annotator disagreement; red dash-dot line: single-scalar threshold.

Evaluation metrics. Accuracy, expected calibration error (ECE), Jensen–Shannon divergence to the human voting distribution, AUROC, and Spearman correlation are reported. All key values carry 95% confidence intervals from 2,000 bootstrap iterations…

**Figure 4.** Dual-validation results: panel (a) H_ale deciles vs. mean annotator disagreement; panel (b) OOD detection AUROC by corruption severity and type.

… for aleatoric and 0.861 for epistemic, with the gap robust across positive-example thresholds from the top 20% to 50% of voting entropy (0.931 to 0.836). LDL achieves comparable recovery (ρ = 0.671, ADD AUROC 0.910); all ensemble members are trained with hard labe…

**Figure 5.** Routing performance comparison: panel (a) aggregate routing AUC across all methods; panel (b) per-corruption routing AUC for decomposed epistemic versus single maximum probability.

…ature calibration is monotone and preserves rankings, epistemic detection remains superior to the temperature-calibrated baseline. Paired bootstrap tests at the highest severity confirm significant advantages for Gaussian noise…

Original abstract

Facial expression recognition (FER) is inherently ambiguous: human annotators frequently disagree, and models deployed in real environments face distribution shift. Crucially, these two conditions demand different downstream actions, as ambiguous in-distribution faces should be reported with their ambiguity whereas out-of-distribution inputs should be rejected. However, a single uncertainty score conflates the two. In this study, uncertainty decomposition into aleatoric and epistemic components for FER is investigated, and Uncertainty-Aware Routing (UAR), an inference-time routing mechanism that exploits the separation, is introduced. Specifically, aleatoric and epistemic uncertainties are obtained from a Deep Ensemble of fully fine-tuned DINOv2 models and are each validated against an independent external signal: aleatoric against human annotator disagreement, and epistemic against distribution shift induced by image corruptions. The proposed dual-validation protocol reveals that aleatoric recovers annotator disagreement with Spearman correlation 0.66 (95% CI: 0.64-0.68), and epistemic detects corruption-induced shifts, achieving average AUROC of 0.699 at the highest corruption severity. UAR retains approximately 1.8 times more ambiguous in-distribution faces than single-uncertainty routing at a matched out-of-distribution rejection rate. A strong label-distribution-learning baseline achieves comparable disagreement recovery but cannot separate ambiguity from shift and therefore cannot route, establishing that the value of decomposition lies in the separation enabling interpretable and differentiated action selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a concrete way to split aleatoric and epistemic uncertainty in FER, validate each against its own external signal, and get a 1.8x retention gain on ambiguous faces at matched OOD rejection, but the components' specificity to those signals is not checked.

The main point is that this work shows how to pull apart two kinds of uncertainty in facial expression recognition so the system can keep ambiguous in-distribution faces while rejecting shifted ones, rather than treating both the same way.

They run a deep ensemble of fine-tuned DINOv2 models, extract aleatoric and epistemic scores, and tie each to an independent external check: annotator disagreement for aleatoric (Spearman 0.66) and corruption-induced shifts for epistemic (average AUROC 0.699 at highest severity). The routing rule built on that split keeps roughly 1.8 times more ambiguous faces than a single total-uncertainty baseline at the same rejection rate. A strong label-distribution baseline recovers disagreement about as well but cannot route, so the gain really comes from the separation.

The dual external validation is the part that works cleanly. It keeps the checks from being circular with the model's own parameters.

The soft spot is the missing cross-check. Nothing in the abstract shows whether the aleatoric score stays insensitive to corruptions or whether the epistemic score stays insensitive to annotator disagreement. If either component leaks across signals, the routing advantage cannot be credited to a clean decomposition. That gap is worth fixing but does not sink the central result.

This is for groups working on uncertainty-aware vision systems that need differentiated actions at inference time. The evidence is specific enough and the numbers are reported against external signals, so it deserves a serious referee even if the specificity tests need to be added.

Referee Report

2 major / 2 minor

Summary. The paper claims that uncertainty in facial expression recognition can be decomposed into aleatoric and epistemic components using a Deep Ensemble of fine-tuned DINOv2 models. Aleatoric uncertainty is validated against human annotator disagreement (Spearman 0.66), epistemic against corruption-induced distribution shifts (average AUROC 0.699), and the resulting Uncertainty-Aware Routing (UAR) retains ~1.8x more ambiguous in-distribution faces than single-uncertainty routing at matched OOD rejection rates, while a label-distribution-learning baseline cannot separate the signals for routing.

Significance. If the components are specific to their target signals, the work provides a practical mechanism for differentiated actions in FER deployment (report ambiguity vs. reject shift). The dual-validation against independent external signals and the quantitative retention gain are concrete strengths that would support the value of decomposition over conflated uncertainty.

major comments (2)

[Abstract and §4] (dual-validation protocol): Spearman 0.66 for aleatoric vs. annotator disagreement and AUROC 0.699 for epistemic vs. corruptions are reported, but no cross-sensitivity results are given (e.g., does aleatoric rise under corruptions; does epistemic rise with annotator disagreement). This test is load-bearing for the claim that the decomposition enables clean, interpretable routing separation.
[§4.3] (UAR evaluation): the 1.8x retention advantage at matched OOD rejection rate is attributed to the interpretable decomposition, yet without the cross-sensitivity evidence the gain cannot be unambiguously credited to separation rather than possible entanglement of the two uncertainty estimates.
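One way the requested cross-sensitivity check could be run is to score each component against the signal it is not supposed to track. The sketch below is an assumption about how such a test might look, not the paper's protocol: `spearman` and `auroc` are small stdlib implementations written for this example, and the variable names in the usage note are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation as Pearson correlation of the ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def auroc(pos, neg):
    """Probability that a random positive outscores a random negative
    (ties count half). Quadratic in input size, fine for a sketch."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Sanity checks on toy inputs.
perfect = spearman([1, 2, 3, 4], [10, 20, 30, 40])
inverse = spearman([1, 2, 3, 4], [4, 3, 2, 1])
chance = auroc([0.2, 0.4], [0.2, 0.4])
separable = auroc([0.9, 0.8], [0.1, 0.2])
```

With these helpers, a clean decomposition would show `auroc(h_ale_corrupted, h_ale_clean)` near 0.5 and `spearman(h_epi_clean, disagreement)` near 0, while the matched pairings keep the values reported in the paper. Values far from those baselines would point to leakage between the two components.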

minor comments (2)

[Methods] The number of ensemble members and the precise formulas used to extract aleatoric (e.g., expected entropy) and epistemic (e.g., mutual information) uncertainties from the DINOv2 ensemble predictions are not stated.
[Abstract] The 95% CI (0.64-0.68) on the Spearman correlation is given without the underlying sample size or computation method.
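For context on the second minor comment, a percentile bootstrap is one common way to obtain such an interval. The sketch below is an assumption, since the abstract does not say which bootstrap variant was used; the function name, the fixed seed, and the toy statistic are all hypothetical, and the default of 2,000 resamples is chosen only to mirror the iteration count mentioned elsewhere on this page.

```python
import random

def bootstrap_ci(pairs, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` over paired samples.

    pairs: list of (x, y) tuples, resampled together with replacement.
    statistic: function mapping a list of pairs to a float.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(pairs)
    stats = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(sample))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def mean_first(sample):
    """Toy statistic: mean of the first element of each pair."""
    return sum(a for a, _ in sample) / len(sample)

data = [(float(i), 0.0) for i in range(20)]
lo, hi = bootstrap_ci(data, mean_first)
```

To reproduce an interval like the one in the abstract, `statistic` would be a Spearman correlation over (aleatoric score, disagreement) pairs, and the sample size would be the number of test images, which is the detail the comment asks the authors to state.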

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that cross-sensitivity tests would strengthen the evidence for clean separation of the uncertainty components. We address each point below and will revise the manuscript to incorporate the requested analyses.

Referee: [Abstract and §4] (dual-validation protocol): Spearman 0.66 for aleatoric vs. annotator disagreement and AUROC 0.699 for epistemic vs. corruptions are reported, but no cross-sensitivity results are given (e.g., does aleatoric rise under corruptions; does epistemic rise with annotator disagreement). This test is load-bearing for the claim that the decomposition enables clean, interpretable routing separation.

Authors: We agree that the absence of cross-sensitivity results leaves open the possibility of entanglement. Our dual-validation protocol uses independent external signals, but we did not explicitly test whether aleatoric uncertainty increases under corruptions or whether epistemic uncertainty correlates with annotator disagreement. We will compute and report these cross-sensitivity results (including quantitative measures and visualizations) in the revised §4 and abstract to directly address this concern. Revision: yes.

Referee: [§4.3] (UAR evaluation): the 1.8x retention advantage at matched OOD rejection rate is attributed to the interpretable decomposition, yet without the cross-sensitivity evidence the gain cannot be unambiguously credited to separation rather than possible entanglement of the two uncertainty estimates.

Authors: The 1.8x retention gain is measured using the separated uncertainties for differentiated routing actions. We acknowledge that without cross-sensitivity evidence it is not possible to fully rule out entanglement as an alternative explanation for the observed advantage. We will add the cross-sensitivity results and revise the discussion and attribution in §4.3 to reflect the new evidence. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; external validations independent of routing rule

The paper obtains aleatoric and epistemic uncertainties from a Deep Ensemble of fine-tuned DINOv2 models using standard decomposition. These are validated against independent external signals (human annotator disagreement via Spearman correlation, corruption-induced shifts via AUROC), which are not derived from the same fitted parameters or routing rule. The UAR retention advantage (1.8x) is reported as an empirical comparison at matched rejection rates against single-uncertainty routing and a label-distribution-learning baseline. No equations or claims reduce by construction to inputs; no self-citations are invoked as load-bearing uniqueness theorems; the separation enabling differentiated actions is measured against quantities outside the model (annotator labels, synthetic corruptions). This is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard ensemble variance for epistemic uncertainty and on the assumption that model disagreement on corrupted images proxies real distribution shift; no new entities are postulated and no free parameters are explicitly fitted beyond standard training choices.

axioms (1)

Domain assumption: Deep ensemble disagreement separates aleatoric from epistemic uncertainty in the manner required for the routing rule.
Invoked when the paper states that aleatoric and epistemic components are obtained from the ensemble and each validated separately.

Reference graph

Works this paper leans on

34 extracted references · 22 canonical work pages · 1 internal anchor

[1] Barsoum, E., Zhang, C., Ferrer, C.C., Zhang, Z.: Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interaction. pp. 279–283 (2016). https://doi.org/10.1145/2993148.2993165

[2] Corbière, C., Thome, N., Bar-Hen, A., Cord, M., Pérez, P.: Addressing failure prediction by learning model confidence. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. pp. 2902–2913. No. 261 (2019). https://doi.org/10.5555/3454287.3454548

[3] Depeweg, S., Hernández-Lobato, J.M., Doshi-Velez, F., Udluft, S.: Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In: Proceedings of the 35th International Conference on Machine Learning. pp. 1184–1193 (2018)

[4] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16×16 words: Transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations (2021)

[5] Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. pp. 1050–1059 (2016)

[6] Geifman, Y., El-Yaniv, R.: Selective classification for deep neural networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4885–4894 (2017). https://doi.org/10.5555/3295222.3295241

[7] Geng, X.: Label distribution learning. IEEE Transactions on Knowledge and Data Engineering 28(7), 1734–1748 (2016). https://doi.org/10.1109/TKDE.2016.2545658

[8] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning. pp. 1321–1330 (2017)

[9] Hendrycks, D., Dietterich, T.: Benchmarking neural network robustness to common corruptions and perturbations. In: Proceedings of the 7th International Conference on Learning Representations (2019)

[10] Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: Proceedings of the 5th International Conference on Learning Representations (2017)

[11] Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning 110, 457–506 (2021). https://doi.org/10.1007/s10994-021-05946-3
[12] Inoshita, K.: Bridging the silos in affective AI: A critical perspective from data to society (2026). https://doi.org/10.2139/ssrn.6774479, SSRN

[13] Inoshita, K., Ueno, T.: Bayesian spectral emotion transition discovery from multi-annotator disagreement. arXiv (2026). https://doi.org/10.48550/arXiv.2606.01906

[14] Inoshita, K., Ueno, T.: Uncertainty decomposition via cyclical SG-MCMC and soft-label learning for subjective NLP. arXiv (2026). https://doi.org/10.48550/arXiv.2605.24773

[15] Inoshita, K., Zhou, X., Kawai, A., Yada, K.: LLMs capture emotion labels, not emotion uncertainty: Distributional analysis and calibration of human-LLM judgment gaps. arXiv (2026). https://doi.org/10.48550/arXiv.2604.27345

[16] Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 5580–5590 (2017). https://doi.org/10.5555/3295222.3295309

[17] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 6405–6416 (2017). https://doi.org/10.5555/3295222.3295387

[18] Le, N., Nguyen, K., Tran, Q., Tjiputra, E., Le, B., Nguyen, A.: Uncertainty-aware label distribution learning for facial expression recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 6088–6097 (2023). https://doi.org/10.1109/WACV56688.2023.00603

[19] Lee, J., Choi, Y., Kim, H., Kim, I.J., Nam, G.P.: Navigating label ambiguity for facial expression recognition in the wild. vol. 39, pp. 4517–4525 (2025). https://doi.org/10.1609/aaai.v39i4.32476

[20] Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2852–2861 (2017). https://doi.org/10.1109/CVPR.2017.277

[21] Mao, A., Mohri, C., Mohri, M., Zhong, Y.: Two-stage learning to defer with multiple experts. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. pp. 3578–3606. No. 159 (2023). https://doi.org/10.5555/3666122.3666281

[22] Mao, J., Xu, R., Yin, X., Chang, Y., Nie, B., Huang, A., Wang, Y.: POSTER++: A simpler and stronger facial expression recognition network 157(C) (2025). https://doi.org/10.1016/j.patcog.2024.110951
[23] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jégou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual fe... Transactions on Machine Learning Research (2024), arXiv:2304.07193

[24] Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J., Lakshminarayanan, B., Snoek, J.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. pp. 14003–14014. No. 1254 (2019). https://doi.org/10.5555/3454...

[25] Plank, B.: The “Problem” of human label variation: On ground truth in data, modeling and evaluation. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 10671–10682 (2022). https://doi.org/10.18653/v1/2022.emnlp-main.731

[26] She, J., Hu, Y., Shi, H., Wang, J., Shen, Q., Mei, T.: Dive into ambiguity: Latent distribution mining and pairwise uncertainty estimation for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6248–6257 (2021). https://doi.org/10.1109/CVPR46437.2021.00618

[27] Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M.: Learning from disagreement: A survey. Journal of Artificial Intelligence Research 72, 1385–1470 (2021). https://doi.org/10.1613/jair.1.12752

[28] Wang, K., Peng, X., Yang, J., Lu, S., Qiao, Y.: Suppressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6897–6906 (2020). https://doi.org/10.1109/CVPR42600.2020.00693

[29] Wu, Z., Cui, J.: LA-Net: Landmark-aware learning for reliable facial expression recognition under label noise. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 20698–20707 (2023). https://doi.org/10.1109/ICCV51070.2023.01892

[30] Zhang, Y., Wang, C., Deng, W.: Relative uncertainty learning for facial expression recognition. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. pp. 17616–17627. No. 1348 (2021)

[31] Zhang, Y., Wang, C., Ling, X., Deng, W.: Learn from all: Erasing attention consistency for noisy label facial expression recognition. In: Computer Vision – ECCV

[32] pp. 418–434 (2022). https://doi.org/10.1007/978-3-031-19809-0_24

[33] Zhang, Z., Zhao, P., Park, E., Yang, J.: MART: Masked affective RepresenTation learning via masked temporal distribution distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12830–12840 (2024). https://doi.org/10.1109/CVPR52733.2024.01219

[34] Zhou, H., Huang, S., Xu, Y.: UA-FER: Uncertainty-aware representation learning for facial expression recognition. Neurocomputing 621, 129261 (2025). https://doi.org/10.1016/j.neucom.2024.129261

[1] [1]

In: Proceedings of the 18th ACM International Conference on Multimodal Interactio

Barsoum, E., Zhang, C., Ferrer, C.C., Zhang, Z.: Training deep networks for facial expression recognition with crowd-sourced label distribution. In: Proceedings of the 18th ACM International Conference on Multimodal Interactio. pp. 279–283 (2016).https://doi.org/10.1145/2993148.2993165

work page doi:10.1145/2993148.2993165 2016

[2] [2]

In: Proceedings of the 33rd International Conference on Neural Information Processing Systems

Corbière, C., Thome, N., Bar-Hen, A., Cord, M., Pérez, P.: Addressing failure prediction by learning model confidence. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. pp. 2902–2913. No. 261 (2019).https://doi.org/10.5555/3454287.3454548

work page doi:10.5555/3454287.3454548 2019

[3] [3]

In: Proceedings of the 35 th International Conference on Machine Learning

Depeweg, S., Hernández-Lobato, J.M., Doshi-Velez, F., Udluft, S.: Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In: Proceedings of the 35 th International Conference on Machine Learning. pp. 1184–1193 (2018)

2018

[4] [4]

In: Proceedings of the 9th International Conference on Learning Representations (2021)

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16×16 words: Transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations (2021)

2021

[5] [5]

In: Proceedings of The 33rd International Conference on Machine Learning

Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. pp. 1050–1059 (2016)

2016

[6] [6]

In: Proceedings of the 31st International Conference on Neural Information Processing Systems

Geifman, Y., El-Yaniv, R.: Selective classification for deep neural networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4885–4894 (2017).https://doi.org/10.5555/3295222.3295241

work page doi:10.5555/3295222.3295241 2017

[7] [7]

IEEE Transactions on Knowledge and Data Engineering28(7), 1734–1748 (2016).https://doi.org/10.1109/TKDE.2016

Geng, X.: Label distribution learning. IEEE Transactions on Knowledge and Data Engineering28(7), 1734–1748 (2016).https://doi.org/10.1109/TKDE.2016. 2545658

work page doi:10.1109/tkde.2016 2016

[8] [8]

In: Proceedings of the 34th International Conference on Machine Learning

Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neu- ral networks. In: Proceedings of the 34th International Conference on Machine Learning. pp. 1321–1330 (2017)

2017

[9] [9]

In: Proceedings of the 7th International Conference on Learning Representations (2019)

Hendrycks,D.,Dietterich,T.:Benchmarkingneuralnetworkrobustnesstocommon corruptions and perturbations. In: Proceedings of the 7th International Conference on Learning Representations (2019)

2019

[10] [10]

In: Proceedings of the 5th International Conference on Learning Representations (2017)

Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of- distribution examples in neural networks. In: Proceedings of the 5th International Conference on Learning Representations (2017)

2017

[11] Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning 110, 457–506 (2021). https://doi.org/10.1007/s10994-021-05946-3

[12] Inoshita, K.: Bridging the silos in affective AI: A critical perspective from data to society (2026). https://doi.org/10.2139/ssrn.6774479, SSRN

[13] Inoshita, K., Ueno, T.: Bayesian spectral emotion transition discovery from multi-annotator disagreement. arXiv (2026). https://doi.org/10.48550/arXiv.2606.01906

[14] Inoshita, K., Ueno, T.: Uncertainty decomposition via cyclical SG-MCMC and soft-label learning for subjective NLP. arXiv (2026). https://doi.org/10.48550/arXiv.2605.24773

[15] Inoshita, K., Zhou, X., Kawai, A., Yada, K.: LLMs capture emotion labels, not emotion uncertainty: Distributional analysis and calibration of human-LLM judgment gaps. arXiv (2026). https://doi.org/10.48550/arXiv.2604.27345

[16] Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 5580–5590 (2017). https://doi.org/10.5555/3295222.3295309

[17] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 6405–6416 (2017). https://doi.org/10.5555/3295222.3295387

[18] Le, N., Nguyen, K., Tran, Q., Tjiputra, E., Le, B., Nguyen, A.: Uncertainty-aware label distribution learning for facial expression recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 6088–6097 (2023). https://doi.org/10.1109/WACV56688.2023.00603

[19] Lee, J., Choi, Y., Kim, H., Kim, I.J., Nam, G.P.: Navigating label ambiguity for facial expression recognition in the wild. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 4517–4525 (2025). https://doi.org/10.1609/aaai.v39i4.32476

[20] Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2852–2861 (2017). https://doi.org/10.1109/CVPR.2017.277

[21] Mao, A., Mohri, C., Mohri, M., Zhong, Y.: Two-stage learning to defer with multiple experts. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. pp. 3578–3606. No. 159 (2023). https://doi.org/10.5555/3666122.3666281

[22] Mao, J., Xu, R., Yin, X., Chang, Y., Nie, B., Huang, A., Wang, Y.: POSTER++: A simpler and stronger facial expression recognition network. Pattern Recognition 157(C) (2025). https://doi.org/10.1016/j.patcog.2024.110951

[23] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jégou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual fe... Transactions on Machine Learning Research (2024), arXiv:2304.07193

[24] Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J., Lakshminarayanan, B., Snoek, J.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. pp. 14003–14014. No. 1254 (2019). https://doi.org/10.5555/3454287.3455541

[25] Plank, B.: The “Problem” of human label variation: On ground truth in data, modeling and evaluation. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 10671–10682 (2022). https://doi.org/10.18653/v1/2022.emnlp-main.731

[26] She, J., Hu, Y., Shi, H., Wang, J., Shen, Q., Mei, T.: Dive into ambiguity: Latent distribution mining and pairwise uncertainty estimation for facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6248–6257 (2021). https://doi.org/10.1109/CVPR46437.2021.00618

[27] Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M.: Learning from disagreement: A survey. Journal of Artificial Intelligence Research 72, 1385–1470 (2021). https://doi.org/10.1613/jair.1.12752

[28] Wang, K., Peng, X., Yang, J., Lu, S., Qiao, Y.: Suppressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6897–6906 (2020). https://doi.org/10.1109/CVPR42600.2020.00693

[29] Wu, Z., Cui, J.: LA-Net: Landmark-aware learning for reliable facial expression recognition under label noise. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 20698–20707 (2023). https://doi.org/10.1109/ICCV51070.2023.01892

[30] Zhang, Y., Wang, C., Deng, W.: Relative uncertainty learning for facial expression recognition. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. pp. 17616–17627. No. 1348 (2021)

[31] Zhang, Y., Wang, C., Ling, X., Deng, W.: Learn from all: Erasing attention consistency for noisy label facial expression recognition. In: Computer Vision – ECCV 2022. pp. 418–434 (2022). https://doi.org/10.1007/978-3-031-19809-0_24

[33] Zhang, Z., Zhao, P., Park, E., Yang, J.: MART: Masked affective RepresenTation learning via masked temporal distribution distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12830–12840 (2024). https://doi.org/10.1109/CVPR52733.2024.01219

[34] Zhou, H., Huang, S., Xu, Y.: UA-FER: Uncertainty-aware representation learning for facial expression recognition. Neurocomputing 621, 129261 (2025). https://doi.org/10.1016/j.neucom.2024.129261