Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
Pith reviewed 2026-05-10 18:35 UTC · model grok-4.3
The pith
Fitting Dirichlet distributions via method of moments to ensembles of softmax outputs produces stable predictive uncertainty estimates that improve selective classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating an ensemble of softmax probability vectors as samples from an underlying Dirichlet distribution, and estimating its concentration parameters with the method of moments, yields a predictive distribution whose uncertainty estimates are more stable and better calibrated than those obtained from evidential training. These gains translate into measurable improvements in downstream tasks such as confidence-based ranking and selective classification on standard image and text datasets.
What carries the argument
The method-of-moments estimator that matches the first two empirical moments of an ensemble of softmax vectors to the mean and variance implied by a Dirichlet distribution's concentration parameters, thereby converting an implicit ensemble into an explicit parametric predictive distribution.
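The moment-matching step has a closed form. The numpy sketch below is an illustrative reconstruction from the standard Dirichlet moment relations; the function name, the geometric-mean pooling of per-component estimates, and the toy data are ours, not the paper's.

```python
import numpy as np

def dirichlet_mom(p):
    """Method-of-moments Dirichlet fit to rows of p (probability vectors).

    Uses the Dirichlet moment relations
        E[p_k]   = alpha_k / alpha_0
        Var[p_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1)),
    which give alpha_0 = m_k (1 - m_k) / v_k - 1 from any component k,
    and then alpha = alpha_0 * m.
    """
    p = np.asarray(p, dtype=float)
    m = p.mean(axis=0)                 # empirical first moments
    v = p.var(axis=0)                  # empirical variances
    with np.errstate(divide="ignore"):
        s = m * (1.0 - m) / v - 1.0    # one alpha_0 estimate per component
    # Pool the per-component estimates (geometric mean; pooling choice is
    # ours), skipping degenerate zero-variance components.
    alpha0 = np.exp(np.mean(np.log(s[v > 0])))
    return alpha0 * m

# Five hypothetical ensemble members agreeing on class 0
ens = np.array([[0.80, 0.10, 0.10],
                [0.70, 0.20, 0.10],
                [0.75, 0.15, 0.10],
                [0.72, 0.18, 0.10],
                [0.78, 0.12, 0.10]])
alpha = dirichlet_mom(ens)             # mass concentrated on class 0
```

The optional maximum-likelihood refinement the abstract mentions would start its iterations from this moment-matched estimate.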
If this is right
- Uncertainty estimates become less sensitive to random seeds and training details because the ensemble averages out single-run fluctuations.
- Selective classification can safely reject a larger fraction of inputs while keeping error rate low, since the Dirichlet variance better flags low-confidence cases.
- Uncertainty estimation is decoupled from the choice of evidential loss, prior strength, or activation function, removing a source of hyperparameter fragility.
- The same ensemble outputs already produced during training can be reused for Dirichlet fitting without requiring a second training stage.
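To make the selective-classification mechanism in the second bullet concrete, here is a minimal sketch that ranks inputs by the total concentration alpha_0 of their fitted Dirichlet and measures error on the accepted fraction. All names and numbers are hypothetical illustrations, not the paper's setup.

```python
import numpy as np

def selective_risk(alphas, labels, coverage):
    """Error rate on the `coverage` fraction of inputs whose fitted
    Dirichlet is most concentrated (largest total evidence alpha_0).
    """
    alphas = np.asarray(alphas, dtype=float)
    labels = np.asarray(labels)
    alpha0 = alphas.sum(axis=1)                      # total evidence per input
    n_keep = max(1, int(round(coverage * len(labels))))
    keep = np.argsort(-alpha0)[:n_keep]              # most confident first
    preds = alphas[keep].argmax(axis=1)              # argmax of Dirichlet mean
    return float(np.mean(preds != labels[keep]))

# Two confident-and-correct inputs, two diffuse-and-wrong ones (true class 0)
alphas = np.array([[50.0, 5.0, 5.0],
                   [40.0, 4.0, 4.0],
                   [1.0, 1.2, 1.1],
                   [1.0, 1.3, 1.1]])
labels = np.array([0, 0, 0, 0])
risk_half = selective_risk(alphas, labels, 0.5)   # rejects the diffuse inputs
risk_full = selective_risk(alphas, labels, 1.0)
```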
Where Pith is reading between the lines
- The same moment-matching idea could be tested on ensembles from non-neural models whose outputs are probability vectors.
- Combining the Dirichlet fit with cheaper ensemble approximations such as Monte-Carlo dropout would test whether full independent retraining is necessary.
- If the Dirichlet fit quality correlates with downstream task gains, one could monitor the moment-matching residual as a cheap diagnostic for when more ensemble members are needed.
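The residual diagnostic suggested in the last bullet could be as simple as comparing empirical second moments of the ensemble outputs with those implied by the fitted parameters. This is a sketch under that reading, not code from the paper:

```python
import numpy as np

def moment_residual(p, alpha):
    """Gap between the empirical second moments of ensemble outputs and
    those implied by a fitted Dirichlet(alpha):
        E[p_k^2] = alpha_k (alpha_k + 1) / (alpha_0 (alpha_0 + 1)).
    """
    p = np.asarray(p, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    m2_fit = alpha * (alpha + 1.0) / (a0 * (a0 + 1.0))
    m2_emp = (p ** 2).mean(axis=0)
    return float(np.max(np.abs(m2_emp - m2_fit)))

# Samples actually drawn from the fitted Dirichlet give a small residual;
# a large residual would signal that more ensemble members (or a different
# parametric family) are needed.
rng = np.random.default_rng(0)
alpha = np.array([8.0, 3.0, 2.0])
res = moment_residual(rng.dirichlet(alpha, 20_000), alpha)
```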
Load-bearing premise
That the spread of softmax outputs across independent training runs is sufficiently well described by a single Dirichlet distribution for the moment-matching estimator to recover useful uncertainty values.
What would settle it
If, on a held-out dataset, the area under the selective-classification risk curve or the ranking quality of uncertainty scores shows no improvement over standard softmax baselines or evidential models, the performance advantage would be refuted.
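The risk-coverage area invoked here is straightforward to compute from any uncertainty score. The sketch below assumes the common convention of averaging selective risk over all coverage levels; it is our reconstruction, not the paper's evaluation code.

```python
import numpy as np

def aurc(scores, correct):
    """Area under the risk-coverage curve.

    Sort predictions from most to least confident (ascending uncertainty
    `scores`), then average the selective error rate over every coverage
    level 1/n, 2/n, ..., 1. Lower is better.
    """
    order = np.argsort(scores)                            # most confident first
    errors = (~np.asarray(correct, dtype=bool))[order].astype(float)
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

# A ranker that defers both errors to the end beats one that leads with them
correct = [True, True, False, False]
good = aurc([0.1, 0.2, 0.8, 0.9], correct)  # errors get the highest uncertainty
bad = aurc([0.9, 0.8, 0.2, 0.1], correct)   # errors get the lowest uncertainty
```

A Dirichlet-based score would beat a softmax baseline on this metric exactly when it pushes misclassified inputs toward the high-uncertainty end of the ranking.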
Original abstract
Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that applying a method of moments estimator (with optional MLE refinement) to ensembles of softmax outputs from cross-entropy trained networks produces stable Dirichlet concentration parameters, yielding explicit predictive distributions that improve stability and performance over evidential deep learning in uncertainty-guided tasks such as confidence scoring and selective classification.
Significance. If the empirical gains hold and are attributable to the Dirichlet construction rather than ensembling alone, the approach provides a practical, less fragile alternative to evidential training by leveraging standard cross-entropy ensembles for reliable predictive uncertainty on the simplex.
Major comments (2)
- [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.
- [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.
Minor comments (2)
- [Eq. (7)] Notation for the concentration vector alpha is introduced without explicitly distinguishing the raw method-of-moments estimate from the optionally refined MLE version in the predictive distribution formula.
- [Figure 2] Figure 2 caption does not specify the number of ensemble members used or the random seed variation across runs, making reproducibility of the stability claims difficult.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with clarifications and commit to revisions that strengthen the empirical support for our claims without altering the core contribution.
Point-by-point responses
Referee: [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.
Authors: We agree that explicit validation of the Dirichlet approximation strengthens the presentation. The method-of-moments estimator is chosen precisely because it is a standard, closed-form procedure for fitting Dirichlet distributions to simplex-valued data, and the resulting predictive distributions demonstrably improve stability over single-run cross-entropy and evidential baselines. In the revised manuscript we will add QQ plots of the marginals and direct comparisons of empirical versus fitted first- and second-order moments on the evaluation sets to quantify the quality of the approximation. revision: yes
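A QQ-style diagnostic of the kind the authors commit to can be built with numpy alone by comparing empirical marginal quantiles against a Monte-Carlo reference drawn from the fitted Dirichlet. The following sketch (names and numbers ours) illustrates the idea:

```python
import numpy as np

def marginal_qq_gap(samples, alpha, k=0, n_q=19, seed=0):
    """Largest gap between empirical quantiles of component k and the
    quantiles of the fitted Dirichlet's k-th marginal, estimated by
    Monte Carlo. Small gaps support the Dirichlet approximation.
    """
    ref = np.random.default_rng(seed).dirichlet(alpha, 100_000)[:, k]
    qs = np.linspace(0.05, 0.95, n_q)
    emp = np.quantile(np.asarray(samples)[:, k], qs)
    return float(np.max(np.abs(emp - np.quantile(ref, qs))))

rng = np.random.default_rng(1)
alpha = np.array([6.0, 2.0, 2.0])
good_fit = marginal_qq_gap(rng.dirichlet(alpha, 50_000), alpha)   # well specified
bad_fit = marginal_qq_gap(rng.dirichlet([1.0, 1.0, 1.0], 50_000), alpha)  # mis-specified
```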
Referee: [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.
Authors: This is a fair criticism. While the Dirichlet construction supplies an explicit predictive distribution (rather than a point estimate) that enables the uncertainty-guided decisions in selective classification, an ablation against the plain ensemble mean is necessary to isolate the contribution of the fitting step. We will add this comparison to the revised Table 4, reporting selective-classification curves for the ensemble-averaged softmax probabilities alongside our Dirichlet-based results. revision: yes
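Mechanically, the ablation the referee requests is cheap: run the identical selective-classification sweep with the plain ensemble-mean confidence and with the Dirichlet-derived score, then compare the curves. A toy sketch with hypothetical per-input statistics:

```python
import numpy as np

def selective_err(score, preds, labels, coverage=0.5):
    """Error on the `coverage` fraction ranked most confident by `score`."""
    keep = np.argsort(-score)[: max(1, int(coverage * len(labels)))]
    return float(np.mean(preds[keep] != labels[keep]))

# Hypothetical per-input statistics: the ensemble mean's max probability
# versus the total concentration alpha_0 of the fitted Dirichlet. The
# ablation runs the same sweep with both scores and compares the results.
mean_conf = np.array([0.90, 0.85, 0.60, 0.55])
alpha0 = np.array([40.0, 35.0, 3.0, 2.5])
preds = np.array([0, 1, 2, 0])
labels = np.array([0, 1, 0, 1])
err_mean = selective_err(mean_conf, preds, labels)       # ensemble-mean score
err_dirichlet = selective_err(alpha0, preds, labels)     # Dirichlet score
err_full = selective_err(alpha0, preds, labels, 1.0)     # no rejection
```

Gains attributable to the Dirichlet fit would show up as a gap between the two scores' curves; identical curves, as in this toy case, would indicate the benefit comes from ensembling per se.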
Circularity Check
No circularity: method-of-moments Dirichlet fitting on ensembles is a direct statistical estimator
Full rationale
The paper's core construction applies a standard method-of-moments estimator (with optional MLE) directly to the empirical distribution of softmax outputs from an ensemble of independently trained cross-entropy networks. This produces explicit Dirichlet concentration parameters without any self-definitional loop, without renaming a fitted quantity as a 'prediction,' and without load-bearing self-citations or uniqueness theorems. Downstream gains in confidence scoring and selective classification are presented as empirical outcomes on held-out data rather than algebraic consequences of the fitting procedure itself. The derivation chain therefore remains self-contained and non-circular.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Softmax probability vectors from independently trained neural networks can be aggregated to parameterize a Dirichlet distribution that represents predictive uncertainty.