Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
Pith reviewed 2026-05-10 18:35 UTC · model grok-4.3
The pith
Fitting Dirichlet distributions via method of moments to ensembles of softmax outputs produces stable predictive uncertainty estimates that improve selective classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating an ensemble of softmax probability vectors as samples from an underlying Dirichlet distribution, and estimating its concentration parameters with the method of moments, yields a predictive distribution whose uncertainty estimates are more stable and better calibrated than those obtained from evidential training. These gains translate into measurable improvements in downstream tasks such as confidence-based ranking and selective classification on standard image and text datasets.
What carries the argument
The method-of-moments estimator that matches the first two empirical moments of an ensemble of softmax vectors to the mean and variance implied by a Dirichlet distribution's concentration parameters, thereby converting an implicit ensemble into an explicit parametric predictive distribution.
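The moment-matching step has a closed form. The numpy sketch below is an illustrative reconstruction from the standard Dirichlet moment relations; the function name, the geometric-mean pooling of per-component estimates, and the toy data are ours, not the paper's.

```python
import numpy as np

def dirichlet_mom(p):
    """Method-of-moments Dirichlet fit to rows of p (probability vectors).

    Uses the Dirichlet moment relations
        E[p_k]   = alpha_k / alpha_0
        Var[p_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1)),
    which give alpha_0 = m_k (1 - m_k) / v_k - 1 from any component k,
    and then alpha = alpha_0 * m.
    """
    p = np.asarray(p, dtype=float)
    m = p.mean(axis=0)                 # empirical first moments
    v = p.var(axis=0)                  # empirical variances
    with np.errstate(divide="ignore"):
        s = m * (1.0 - m) / v - 1.0    # one alpha_0 estimate per component
    # Pool the per-component estimates (geometric mean; pooling choice is
    # ours), skipping degenerate zero-variance components.
    alpha0 = np.exp(np.mean(np.log(s[v > 0])))
    return alpha0 * m

# Five hypothetical ensemble members agreeing on class 0
ens = np.array([[0.80, 0.10, 0.10],
                [0.70, 0.20, 0.10],
                [0.75, 0.15, 0.10],
                [0.72, 0.18, 0.10],
                [0.78, 0.12, 0.10]])
alpha = dirichlet_mom(ens)             # mass concentrated on class 0
```

The optional maximum-likelihood refinement the abstract mentions would start its iterations from this moment-matched estimate.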
If this is right
- Uncertainty estimates become less sensitive to random seeds and training details because the ensemble averages out single-run fluctuations.
- Selective classification can safely reject a larger fraction of inputs while keeping error rate low, since the Dirichlet variance better flags low-confidence cases.
- Uncertainty estimation is decoupled from the choice of evidential loss, prior strength, or activation function, removing a source of hyperparameter fragility.
- The same ensemble outputs already produced during training can be reused for Dirichlet fitting without requiring a second training stage.
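To make the selective-classification mechanism in the second bullet concrete, here is a minimal sketch that ranks inputs by the total concentration alpha_0 of their fitted Dirichlet and measures error on the accepted fraction. All names and numbers are hypothetical illustrations, not the paper's setup.

```python
import numpy as np

def selective_risk(alphas, labels, coverage):
    """Error rate on the `coverage` fraction of inputs whose fitted
    Dirichlet is most concentrated (largest total evidence alpha_0).
    """
    alphas = np.asarray(alphas, dtype=float)
    labels = np.asarray(labels)
    alpha0 = alphas.sum(axis=1)                      # total evidence per input
    n_keep = max(1, int(round(coverage * len(labels))))
    keep = np.argsort(-alpha0)[:n_keep]              # most confident first
    preds = alphas[keep].argmax(axis=1)              # argmax of Dirichlet mean
    return float(np.mean(preds != labels[keep]))

# Two confident-and-correct inputs, two diffuse-and-wrong ones (true class 0)
alphas = np.array([[50.0, 5.0, 5.0],
                   [40.0, 4.0, 4.0],
                   [1.0, 1.2, 1.1],
                   [1.0, 1.3, 1.1]])
labels = np.array([0, 0, 0, 0])
risk_half = selective_risk(alphas, labels, 0.5)   # rejects the diffuse inputs
risk_full = selective_risk(alphas, labels, 1.0)
```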
Where Pith is reading between the lines
- The same moment-matching idea could be tested on ensembles from non-neural models whose outputs are probability vectors.
- Combining the Dirichlet fit with cheaper ensemble approximations such as Monte-Carlo dropout would test whether full independent retraining is necessary.
- If the Dirichlet fit quality correlates with downstream task gains, one could monitor the moment-matching residual as a cheap diagnostic for when more ensemble members are needed.
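The residual diagnostic suggested in the last bullet could be as simple as comparing empirical second moments of the ensemble outputs with those implied by the fitted parameters. This is a sketch under that reading, not code from the paper:

```python
import numpy as np

def moment_residual(p, alpha):
    """Gap between the empirical second moments of ensemble outputs and
    those implied by a fitted Dirichlet(alpha):
        E[p_k^2] = alpha_k (alpha_k + 1) / (alpha_0 (alpha_0 + 1)).
    """
    p = np.asarray(p, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    m2_fit = alpha * (alpha + 1.0) / (a0 * (a0 + 1.0))
    m2_emp = (p ** 2).mean(axis=0)
    return float(np.max(np.abs(m2_emp - m2_fit)))

# Samples actually drawn from the fitted Dirichlet give a small residual;
# a large residual would signal that more ensemble members (or a different
# parametric family) are needed.
rng = np.random.default_rng(0)
alpha = np.array([8.0, 3.0, 2.0])
res = moment_residual(rng.dirichlet(alpha, 20_000), alpha)
```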
Load-bearing premise
That the spread of softmax outputs across independent training runs is sufficiently well described by a single Dirichlet distribution for the moment-matching estimator to recover useful uncertainty values.
What would settle it
If, on a held-out dataset, the area under the selective-classification risk curve or the ranking quality of uncertainty scores shows no improvement over standard softmax baselines or evidential models, the performance advantage would be refuted.
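The risk-coverage area invoked here is straightforward to compute from any uncertainty score. The sketch below assumes the common convention of averaging selective risk over all coverage levels; it is our reconstruction, not the paper's evaluation code.

```python
import numpy as np

def aurc(scores, correct):
    """Area under the risk-coverage curve.

    Sort predictions from most to least confident (ascending uncertainty
    `scores`), then average the selective error rate over every coverage
    level 1/n, 2/n, ..., 1. Lower is better.
    """
    order = np.argsort(scores)                            # most confident first
    errors = (~np.asarray(correct, dtype=bool))[order].astype(float)
    risks = np.cumsum(errors) / np.arange(1, len(errors) + 1)
    return float(risks.mean())

# A ranker that defers both errors to the end beats one that leads with them
correct = [True, True, False, False]
good = aurc([0.1, 0.2, 0.8, 0.9], correct)  # errors get the highest uncertainty
bad = aurc([0.9, 0.8, 0.2, 0.1], correct)   # errors get the lowest uncertainty
```

A Dirichlet-based score would beat a softmax baseline on this metric exactly when it pushes misclassified inputs toward the high-uncertainty end of the ranking.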
Original abstract
Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that applying a method of moments estimator (with optional MLE refinement) to ensembles of softmax outputs from cross-entropy trained networks produces stable Dirichlet concentration parameters, yielding explicit predictive distributions that improve stability and performance over evidential deep learning in uncertainty-guided tasks such as confidence scoring and selective classification.
Significance. If the empirical gains hold and are attributable to the Dirichlet construction rather than ensembling alone, the approach provides a practical, less fragile alternative to evidential training by leveraging standard cross-entropy ensembles for reliable predictive uncertainty on the simplex.
Major comments (2)
- [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.
- [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.
Minor comments (2)
- [Eq. (7)] Notation for the concentration vector alpha is introduced without explicitly distinguishing the raw method-of-moments estimate from the optionally refined MLE version in the predictive distribution formula.
- [Figure 2] Figure 2 caption does not specify the number of ensemble members used or the random seed variation across runs, making reproducibility of the stability claims difficult.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below with clarifications and commit to revisions that strengthen the empirical support for our claims without altering the core contribution.
Point-by-point responses
Referee: [Section 3.2] The load-bearing assumption that ensemble softmax outputs are sufficiently well-described by a Dirichlet distribution for the method-of-moments estimator to yield reliable epistemic uncertainty is not supported by diagnostics; no goodness-of-fit tests, QQ plots, or comparisons of empirical moments versus fitted Dirichlet are reported in the experimental section.
Authors: We agree that explicit validation of the Dirichlet approximation strengthens the presentation. The method-of-moments estimator is chosen precisely because it is a standard, closed-form procedure for fitting Dirichlet distributions to simplex-valued data, and the resulting predictive distributions demonstrably improve stability over single-run cross-entropy and evidential baselines. In the revised manuscript we will add QQ plots of the marginals and direct comparisons of empirical versus fitted first- and second-order moments on the evaluation sets to quantify the quality of the approximation. revision: yes
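A QQ-style diagnostic of the kind the authors commit to can be built with numpy alone by comparing empirical marginal quantiles against a Monte-Carlo reference drawn from the fitted Dirichlet. The following sketch (names and numbers ours) illustrates the idea:

```python
import numpy as np

def marginal_qq_gap(samples, alpha, k=0, n_q=19, seed=0):
    """Largest gap between empirical quantiles of component k and the
    quantiles of the fitted Dirichlet's k-th marginal, estimated by
    Monte Carlo. Small gaps support the Dirichlet approximation.
    """
    ref = np.random.default_rng(seed).dirichlet(alpha, 100_000)[:, k]
    qs = np.linspace(0.05, 0.95, n_q)
    emp = np.quantile(np.asarray(samples)[:, k], qs)
    return float(np.max(np.abs(emp - np.quantile(ref, qs))))

rng = np.random.default_rng(1)
alpha = np.array([6.0, 2.0, 2.0])
good_fit = marginal_qq_gap(rng.dirichlet(alpha, 50_000), alpha)   # well specified
bad_fit = marginal_qq_gap(rng.dirichlet([1.0, 1.0, 1.0], 50_000), alpha)  # mis-specified
```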
Referee: [Table 4] Table 4 (selective classification results): performance is compared only to evidential baselines and single-model cross-entropy, but lacks an ablation against plain ensemble averaging of softmax probabilities without the Dirichlet fitting step; this leaves open whether gains are due to the proposed modeling or to ensembling per se.
Authors: This is a fair criticism. While the Dirichlet construction supplies an explicit predictive distribution (rather than a point estimate) that enables the uncertainty-guided decisions in selective classification, an ablation against the plain ensemble mean is necessary to isolate the contribution of the fitting step. We will add this comparison to the revised Table 4, reporting selective-classification curves for the ensemble-averaged softmax probabilities alongside our Dirichlet-based results. revision: yes
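Mechanically, the ablation the referee requests is cheap: run the identical selective-classification sweep with the plain ensemble-mean confidence and with the Dirichlet-derived score, then compare the curves. A toy sketch with hypothetical per-input statistics:

```python
import numpy as np

def selective_err(score, preds, labels, coverage=0.5):
    """Error on the `coverage` fraction ranked most confident by `score`."""
    keep = np.argsort(-score)[: max(1, int(coverage * len(labels)))]
    return float(np.mean(preds[keep] != labels[keep]))

# Hypothetical per-input statistics: the ensemble mean's max probability
# versus the total concentration alpha_0 of the fitted Dirichlet. The
# ablation runs the same sweep with both scores and compares the results.
mean_conf = np.array([0.90, 0.85, 0.60, 0.55])
alpha0 = np.array([40.0, 35.0, 3.0, 2.5])
preds = np.array([0, 1, 2, 0])
labels = np.array([0, 1, 0, 1])
err_mean = selective_err(mean_conf, preds, labels)       # ensemble-mean score
err_dirichlet = selective_err(alpha0, preds, labels)     # Dirichlet score
err_full = selective_err(alpha0, preds, labels, 1.0)     # no rejection
```

Gains attributable to the Dirichlet fit would show up as a gap between the two scores' curves; identical curves, as in this toy case, would indicate the benefit comes from ensembling per se.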
Circularity Check
No circularity: method-of-moments Dirichlet fitting on ensembles is a direct statistical estimator
Full rationale
The paper's core construction applies a standard method-of-moments estimator (with optional MLE) directly to the empirical distribution of softmax outputs from an ensemble of independently trained cross-entropy networks. This produces explicit Dirichlet concentration parameters without any self-definitional loop, without renaming a fitted quantity as a 'prediction,' and without load-bearing self-citations or uniqueness theorems. Downstream gains in confidence scoring and selective classification are presented as empirical outcomes on held-out data rather than algebraic consequences of the fitting procedure itself. The derivation chain therefore remains self-contained and non-circular.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Softmax probability vectors from independently trained neural networks can be aggregated to parameterize a Dirichlet distribution that represents predictive uncertainty.