Plug-in Losses for Evidential Deep Learning: A Simplified Framework for Uncertainty Estimation that Includes the Softmax Classifier

Berk Hayta; Felix Krahmer; Hannah Laus; Simon Mittermaier

arXiv: 2605.22746 · v1 · pith:IWTYM65N · submitted 2026-05-21 · 💻 cs.LG · eess.AS · stat.ML

Berk Hayta, Hannah Laus, Simon Mittermaier, Felix Krahmer

Pith reviewed 2026-05-22 06:56 UTC · model grok-4.3

classification 💻 cs.LG eess.AS stat.ML

keywords evidential deep learning, uncertainty estimation, plug-in loss, Dirichlet distribution, softmax classifier, approximation error, speech recognition

The pith

Evidential deep learning objectives can be approximated by plug-in losses evaluated at the Dirichlet mean, with the error decaying as evidence grows and the framework recovering the standard softmax classifier under a specific mapping.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a way to replace the complex Dirichlet-expected objectives in evidential deep learning with simpler plug-in losses computed at the predicted mean. This matters for sensor-based systems that need reliable uncertainty estimates without heavy computational overhead during training. The authors prove that, under mild conditions, the difference between the true objective and the plug-in version shrinks for common losses such as mean-squared error and cross-entropy once evidence becomes large. One special case of their construction recovers the ordinary softmax classifier exactly, supplying a theoretical link between evidential methods and standard classification. On the Google Speech Commands task the simplified losses deliver accuracy and selective-prediction behavior on par with full evidential training while fitting into ordinary deep-learning pipelines.

Core claim

The first-order empirical risk minimization problem induced by EDL is approximated by a plug-in loss evaluated at the Dirichlet mean; under mild assumptions the approximation error decays with growing evidence for a broad class of loss functions including mean-squared error and cross-entropy loss. As a special case the analysis justifies the softmax classifier under a particular evidence-to-Dirichlet mapping.

What carries the argument

Plug-in loss evaluated at the Dirichlet mean, which acts as a surrogate for the full Dirichlet expected objective inside the empirical risk minimization problem.
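For mean-squared error the surrogate can be made concrete, because the Dirichlet-expected loss has a standard closed form: the plug-in term plus the summed variance of the class probabilities. The sketch below is a minimal illustration of that identity, not the authors' implementation; the function names and the three-class example values are hypothetical.

```python
def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): m_k = alpha_k / alpha_0."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


def plugin_mse(alpha, y):
    """Plug-in loss: squared error evaluated at the Dirichlet mean."""
    m = dirichlet_mean(alpha)
    return sum((yk - mk) ** 2 for yk, mk in zip(y, m))


def expected_mse(alpha, y):
    """Closed form of E_{p ~ Dir(alpha)} ||y - p||^2.

    Equals the plug-in term plus sum_k Var(p_k),
    with Var(p_k) = m_k (1 - m_k) / (alpha_0 + 1).
    """
    alpha0 = sum(alpha)
    m = dirichlet_mean(alpha)
    variance = sum(mk * (1.0 - mk) for mk in m) / (alpha0 + 1.0)
    return plugin_mse(alpha, y) + variance


y = [1.0, 0.0, 0.0]        # one-hot label (hypothetical example)
base = (4.0, 1.0, 1.0)     # hypothetical Dirichlet parameters
gaps = []
for scale in (1, 10, 100, 1000):
    alpha = [scale * a for a in base]  # same mean, growing total evidence
    gap = expected_mse(alpha, y) - plugin_mse(alpha, y)
    gaps.append(gap)
    print(f"alpha0={sum(alpha):8.1f}  gap={gap:.6f}")
```

In this toy setting the gap is exactly the variance term, so it shrinks in proportion to 1/(α₀ + 1) when the mean is held fixed.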

Load-bearing premise

The mild assumptions on the loss functions and the evidence-to-Dirichlet mapping must hold so that the approximation error shrinks when evidence increases.

What would settle it

On a held-out set, compute both the true Dirichlet-expected loss and the plug-in loss for models with systematically increasing evidence levels and check whether their absolute difference fails to approach zero.
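A single-sample version of this check can be sketched for cross-entropy, where the Dirichlet-expected loss is ψ(α₀) − ψ(α_y) and the plug-in loss is log α₀ − log α_y. The code below is an illustrative sketch only: the `digamma` helper is hand-rolled to keep the example dependency-free, the parameter values are hypothetical, and evidence is increased by scaling all Dirichlet parameters together, which is one assumed way of "systematically increasing evidence".

```python
import math


def digamma(x):
    """Digamma for x > 0 (hand-rolled helper, accurate to roughly 1e-8).

    Uses psi(x) = psi(x + 1) - 1/x to shift x to at least 6, then the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def expected_cross_entropy(alpha, label):
    """E_{p ~ Dir(alpha)}[-log p_label] = psi(alpha_0) - psi(alpha_label)."""
    return digamma(sum(alpha)) - digamma(alpha[label])


def plugin_cross_entropy(alpha, label):
    """Cross-entropy evaluated at the Dirichlet mean alpha_label / alpha_0."""
    return math.log(sum(alpha)) - math.log(alpha[label])


base = (4.0, 1.0, 1.0)  # hypothetical Dirichlet parameters, true class 0
label = 0
gaps = []
for scale in (1, 10, 100, 1000):
    alpha = [scale * a for a in base]  # scaling keeps the mean fixed
    gap = expected_cross_entropy(alpha, label) - plugin_cross_entropy(alpha, label)
    gaps.append(gap)
    print(f"alpha0={sum(alpha):8.1f}  |gap|={abs(gap):.6f}")
```

Averaging `abs(gap)` over a held-out set, grouped by evidence level, would give the quantity described above. Note that in this sketch the gap shrinks because α_y grows along with α₀; if the true-class parameter stayed small while total evidence grew, the gap need not vanish, which is presumably where the paper's assumptions enter.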

Figures

Figures reproduced from arXiv: 2605.22746 by Berk Hayta, Felix Krahmer, Hannah Laus, Simon Mittermaier.

**Figure 2.** Entropy-based selective-prediction threshold curves for all model variants on GSC V1.

**Figure 3.** Entropy KDE plots for all model variants on GSC V1. Each plot shows the distribution of ...

**Figure 4.** Additional vacuity KDE plots for model variants not shown in the main text. Each plot ...

Original abstract

Real-world sensor-based learning systems require uncertainty estimation that is both reliable and computationally efficient. Evidential Deep Learning (EDL) provides single-pass uncertainty estimation by modeling the class probabilities via Dirichlet distributions, where the Dirichlet parameters are predicted by a learned neural network mapping. However, this approach can lead to computational challenges, as Dirichlet expected objectives are more complex than standard supervised learning losses, complicating their analysis and implementation. We address this issue by approximating the objective of the first-order empirical risk minimization problem induced by EDL with a plug-in loss evaluated at the Dirichlet mean and show that, under mild assumptions, the approximation error decays with growing evidence for a broad class of loss functions, including mean-squared error and cross-entropy loss. As a special case, our analysis provides justification for the use of softmax in the context of uncertainty estimation, since under a particular evidence-to-Dirichlet mapping, our framework includes the standard softmax classifier. We validate the proposed simplified objectives on the Google Speech Commands dataset and show that they achieve predictive accuracy and selective prediction performance comparable to classical EDL, while being simpler to implement using standard deep learning losses and training pipelines. To the best of our knowledge, this empirical analysis is the first to obtain coverage-accuracy trade-offs for speech recognition tasks through EDL.
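The coverage-accuracy trade-off mentioned in the abstract can be illustrated with a small sketch of entropy-based selective prediction: a sample is accepted only if the entropy of its predicted class distribution is below a threshold, and accuracy is measured on the accepted samples. This is a generic illustration under assumed conventions, not the paper's evaluation code; the function names and the toy predictions are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def coverage_accuracy(probs_batch, labels, threshold):
    """Accept samples with entropy <= threshold.

    Returns (coverage, accuracy on the accepted samples); accuracy is NaN
    when nothing is accepted.
    """
    accepted = [
        (max(range(len(p)), key=p.__getitem__), y)  # (argmax prediction, label)
        for p, y in zip(probs_batch, labels)
        if predictive_entropy(p) <= threshold
    ]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")
    accuracy = sum(pred == y for pred, y in accepted) / len(accepted)
    return coverage, accuracy


# Toy predictions (e.g. Dirichlet means) and labels, purely hypothetical.
probs_batch = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
labels = [0, 1, 1, 2]

for threshold in (0.7, 2.0):
    cov, acc = coverage_accuracy(probs_batch, labels, threshold)
    print(f"threshold={threshold:.1f}  coverage={cov:.2f}  accuracy={acc:.2f}")
```

Sweeping the threshold over a grid and plotting accuracy against coverage yields the kind of curve a selective-prediction analysis reports.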

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a practical plug-in approximation for evidential deep learning objectives that recovers softmax under one mapping, with decent experiments on speech data, but the decay claim depends on assumptions that stay vague.

Hey colleague, the main takeaway is that this work approximates the evidential deep learning objective with a plug-in loss evaluated at the Dirichlet mean, and proves that the approximation error vanishes with increasing evidence under mild assumptions for losses including MSE and cross-entropy. It also shows that standard softmax arises as a special case under a suitable evidence mapping.

The strong point is the demonstration that these simplified losses achieve predictive accuracy and selective-prediction performance similar to full EDL on the Google Speech Commands dataset. This makes implementation easier with standard deep learning tools, which is practical for sensor-based systems that need uncertainty estimates. The speech recognition experiment adds some novelty as the first coverage-accuracy analysis in that domain using EDL.

The soft spots are in the theory. The mild assumptions behind the error decay are not explicitly listed or verified empirically in the provided details, so it is hard to assess how general the result is. The inclusion of softmax relies on a particular choice of evidence-to-Dirichlet mapping, which might be seen as circular if not justified independently. If those conditions are too restrictive, the justification for using plug-in losses weakens.

This paper is aimed at practitioners in uncertainty estimation for deep learning who want simpler training without losing the benefits of evidential models. A reader focused on practical applications would find the experiments and the simplified framework valuable.

I recommend sending it for peer review. The idea is useful and the empirical validation is solid enough to merit detailed feedback, even though the theory could use more explicit conditions and checks.

Referee Report

1 major / 1 minor

Summary. The paper proposes approximating the first-order empirical risk minimization objective in Evidential Deep Learning (EDL) with a plug-in loss evaluated at the Dirichlet mean. It claims that, under mild assumptions, the approximation error decays with growing evidence for a broad class of losses including MSE and cross-entropy. As a special case, the framework includes the standard softmax classifier under a particular evidence-to-Dirichlet mapping. Empirical validation on the Google Speech Commands dataset shows that the simplified objectives achieve predictive accuracy and selective prediction performance comparable to classical EDL while being simpler to implement.

Significance. If the approximation result holds, the work offers a practical simplification for EDL by permitting standard deep learning losses and pipelines, which could broaden adoption for uncertainty estimation in sensor-based systems. The justification for the softmax special case and the first reported coverage-accuracy trade-offs on speech recognition tasks are notable strengths. The contribution is strengthened by the focus on reproducible implementation via conventional training.

major comments (1)

[theoretical derivation of the plug-in objective] The central claim that the plug-in approximation error decays with growing evidence for cross-entropy (and MSE) rests on unspecified 'mild assumptions' about the loss function and evidence-to-Dirichlet mapping. The derivation section should explicitly state the required regularity conditions (e.g., Lipschitz or smoothness modulus with respect to the probability simplex, and concentration properties of the mean) and verify that they are satisfied in the regime of interest; without this, the justification for using standard softmax losses inside EDL cannot be fully assessed.

minor comments (1)

The abstract's claim that this is the first empirical analysis of coverage-accuracy trade-offs for speech recognition via EDL would benefit from a brief comparison to prior EDL applications in audio tasks.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the practical value of the plug-in loss framework. We address the major comment below and will revise the manuscript to strengthen the theoretical section.

Referee: [theoretical derivation of the plug-in objective] The central claim that the plug-in approximation error decays with growing evidence for cross-entropy (and MSE) rests on unspecified 'mild assumptions' about the loss function and evidence-to-Dirichlet mapping. The derivation section should explicitly state the required regularity conditions (e.g., Lipschitz or smoothness modulus with respect to the probability simplex, and concentration properties of the mean) and verify that they are satisfied in the regime of interest; without this, the justification for using standard softmax losses inside EDL cannot be fully assessed.

Authors: We agree that the derivation would benefit from an explicit statement of the regularity conditions. In the revised version we will add a new subsection that lists the precise assumptions: (i) the loss is Lipschitz continuous w.r.t. total-variation distance on the probability simplex, and (ii) the Dirichlet mean concentrates around the mode at a rate governed by the total evidence (via standard Dirichlet concentration bounds). We will then verify that both cross-entropy and MSE satisfy these conditions under the evidence-to-Dirichlet mapping used in the paper, including the special case that recovers the softmax classifier. This addition will make the decay of the approximation error fully rigorous while preserving the original proof strategy.

Revision: yes
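To see how conditions of this kind can produce a decay rate, the following worked bound may help. It is a sketch under a stronger, assumed smoothness condition (a Hessian bounded by a constant $M$ on the simplex) that is not taken from the paper; $m$ denotes the Dirichlet mean and $\alpha_0$ the total evidence.

```latex
% Assumption (illustrative, not from the paper): \ell(\cdot, y) is twice
% continuously differentiable on the simplex with \|\nabla^2 \ell\|_{op} \le M.
\begin{align*}
\mathbb{E}_{p \sim \mathrm{Dir}(\alpha)}\,\ell(p, y)
  &= \ell(m, y)
   + \nabla \ell(m, y)^{\top}\,\underbrace{\mathbb{E}[p - m]}_{=\,0}
   + R(\alpha, y),
  \qquad m = \alpha / \alpha_0, \\
|R(\alpha, y)|
  &\le \frac{M}{2}\,\mathbb{E}\,\|p - m\|_2^2
   = \frac{M}{2} \sum_{k} \frac{m_k (1 - m_k)}{\alpha_0 + 1}
   \le \frac{M}{2\,(\alpha_0 + 1)}.
\end{align*}
```

The middle equality uses the Dirichlet variance $\mathrm{Var}(p_k) = m_k(1 - m_k)/(\alpha_0 + 1)$, and the last step uses $\sum_k m_k(1 - m_k) \le 1$. A Lipschitz condition alone would give only an $O((\alpha_0 + 1)^{-1/2})$ bound by the same variance computation, and cross-entropy does not have a bounded Hessian near the boundary of the simplex, so it would need an additional condition keeping the mean away from that boundary.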

Circularity Check

1 step flagged

Special-case inclusion of softmax classifier achieved by explicit choice of evidence-to-Dirichlet mapping

specific steps

self-definitional [Abstract]
"As a special case, our analysis provides justification for the use of softmax in the context of uncertainty estimation, since under a particular evidence-to-Dirichlet mapping, our framework includes the standard softmax classifier."

The claimed justification for including the softmax classifier is obtained by deliberately choosing the evidence-to-Dirichlet mapping that makes the plug-in loss identical to the standard softmax cross-entropy objective. The inclusion is therefore true by the authors' selection of the mapping rather than an independent consequence of the approximation theorem.

full rationale

The paper's central derivation approximates the EDL first-order risk objective by a plug-in loss at the Dirichlet mean and proves error decay under mild assumptions on the loss and mapping. This mathematical step is self-contained and does not reduce to its inputs by construction. The only load-bearing element that borders on self-definition is the special-case claim for softmax, which is obtained precisely by selecting one particular evidence-to-Dirichlet mapping that forces the plug-in objective to coincide with standard cross-entropy. No self-citations, fitted predictions, or ansatzes imported from prior work are used to justify the main result.
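The softmax special case can be illustrated with one mapping that has the stated property: if the Dirichlet parameters are set to α_k = exp(z_k) for logits z, the Dirichlet mean is exactly softmax(z), so plug-in cross-entropy coincides with standard softmax cross-entropy. Whether this is the exact mapping used in the paper is an assumption; the sketch below only verifies the algebra, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exp_evidence_to_alpha(logits):
    """Assumed mapping (not confirmed by the source): alpha_k = exp(z_k)."""
    return [math.exp(z) for z in logits]


def dirichlet_mean(alpha):
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


def plugin_cross_entropy(alpha, label):
    """Cross-entropy evaluated at the Dirichlet mean."""
    return -math.log(dirichlet_mean(alpha)[label])


def softmax_cross_entropy(logits, label):
    """Standard softmax classifier loss."""
    return -math.log(softmax(logits)[label])


logits = [2.0, -1.0, 0.5]  # hypothetical network outputs
label = 0
alpha = exp_evidence_to_alpha(logits)

print("Dirichlet mean:", dirichlet_mean(alpha))
print("softmax       :", softmax(logits))
print("plug-in CE    :", plugin_cross_entropy(alpha, label))
print("softmax CE    :", softmax_cross_entropy(logits, label))
```

Under this mapping the total evidence α₀ = Σ exp(z_k) still varies with the logits, so an uncertainty score based on α₀ remains available even though the training loss is the ordinary one.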

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the Dirichlet modeling inherited from prior EDL work and on unspecified mild assumptions about loss functions and evidence growth; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Mild assumptions on the loss class and evidence growth under which the plug-in approximation error decays.
Invoked to guarantee that the approximation becomes accurate as evidence increases.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We address this issue by approximating the objective of the first-order empirical risk minimization problem induced by EDL with a plug-in loss evaluated at the Dirichlet mean and show that, under mild assumptions, the approximation error decays with growing evidence for a broad class of loss functions, including mean-squared error and cross-entropy loss."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "$\ell_{\mathrm{EDL}}(\alpha, y) = \ell_{\mathrm{plug}}(\alpha, y) + R(\alpha, y)$, where the remainder satisfies $R(\alpha, y) = O((\alpha_0 + 1)^{-1})$"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 1 internal anchor

[1] Sensoy, Murat; Kaplan, Lance; Kandemir, Melih. Advances in Neural Information Processing Systems.
[2] Chen, Mengyuan; Gao, Junyu; Xu, Changsheng. Proceedings of the International Conference on Learning Representations.
[3] Chen, Mengyuan; Gao, Junyu; Xu, Changsheng. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Are uncertainty quantification capabilities of evidential deep learning a mirage? Advances in Neural Information Processing Systems.
[5] A Comprehensive Survey on Evidential Deep Learning and Its Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems.
[7] Revisiting softmax for uncertainty approximation in text classification. Information, 2023.
[8] Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. arXiv preprint arXiv:1804.03209.
[9] Uncertainty estimation by fisher information-based evidential deep learning. International Conference on Machine Learning, 2023.
[10] MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. Proc. Interspeech 2020.
[11] Pitfalls of epistemic uncertainty quantification through loss minimisation. Advances in Neural Information Processing Systems.
[12] Is epistemic uncertainty faithfully represented by evidential deep learning methods? Proceedings of the 41st International Conference on Machine Learning.
[13] A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2001.
[14] Subjective Logic: A formalism for reasoning under uncertainty. 2018.
[15] A generalization of Bayesian inference. Journal of the Royal Statistical Society: Series B (Methodological), 1968.
[16] A Mathematical Theory of Evidence. 1976.
[17] Perspectives on the theory and practice of belief functions. International Journal of Approximate Reasoning, 1990.
[18] Prior and Posterior Networks: A Survey on Evidential Deep Learning Methods For Uncertainty Estimation. Transactions on Machine Learning Research.
[19] Dropout as a bayesian approximation: Representing model uncertainty in deep learning. International Conference on Machine Learning, 2016.
[20] Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems.
[21] Bayesian deep learning and a probabilistic perspective of generalization. Advances in Neural Information Processing Systems.
[22] Weight uncertainty in neural network. International Conference on Machine Learning, 2015.
[23] Bayesian learning for neural networks. 2012.
[24] Predictive uncertainty estimation via prior networks. Advances in Neural Information Processing Systems.
[25] Deep evidential regression. Advances in Neural Information Processing Systems.
[26] Regression prior networks. arXiv preprint arXiv:2006.11590.
[27] Information aware max-norm Dirichlet networks for predictive uncertainty estimation. Neural Networks, 2021.
[28] Bayesian Evidential Deep Learning with PAC Regularization. Advances in Approximate Bayesian Inference Symposium.
[29] NeMo: a toolkit for building AI applications using neural modules. arXiv preprint arXiv:1909.09577.
[30] Introductory lectures on convex optimization: A basic course. 2013.
[31] Foundations of machine learning. 2018.
