General Uncertainty Estimation with Delta Variances

Hado van Hasselt; John Shawe-Taylor; Simon Schmitt

arXiv: 2502.14698 · v2 · submitted 2025-02-20 · cs.LG · cs.AI · stat.AP · stat.ML


Pith reviewed 2026-05-23 02:36 UTC · model grok-4.3


keywords epistemic uncertainty · uncertainty quantification · neural networks · gradient computation · delta variances · weather simulation · machine learning


The pith

Delta Variances estimate epistemic uncertainty for neural networks and their compositions using one gradient computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Delta Variances as a family of algorithms for quantifying epistemic uncertainty induced by limited data. These algorithms require only a single gradient computation and apply directly to neural networks as well as functions built from them, without any modifications to architecture or training. Special cases of the approach recover existing popular methods, and a unified theoretical view leads to a natural extension whose benefit is shown empirically. The method is demonstrated on a weather simulator whose step function is neural-network based, where it achieves competitive performance.

Core claim

Delta Variances form a family of algorithms for epistemic uncertainty quantification that remain computationally efficient at the cost of one gradient computation. The family applies without change to neural networks and to more general functions composed of neural networks. Multiple theoretical derivations are discussed, under which special cases recover popular techniques and a unified perspective emerges; this perspective yields a natural extension that improves empirical results.
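The "one gradient computation" cost can be pictured with the classical delta method, which the simulated rebuttal further down this page names as the basis of the derivations. The sketch below is illustrative only and is not the paper's estimator, which is not reproduced on this page. The two-parameter model `tiny_net`, the fitted values `theta_hat`, and the parameter covariance `sigma` are all hypothetical.

```python
import numpy as np


def delta_variance(grad, sigma):
    """Delta-method variance g^T Sigma g for a scalar quantity of interest."""
    grad = np.asarray(grad, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(grad @ sigma @ grad)


def tiny_net(theta, x):
    """Hypothetical two-parameter model: theta[1] * tanh(theta[0] * x)."""
    return theta[1] * np.tanh(theta[0] * x)


def tiny_net_grad(theta, x):
    """Analytic gradient of tiny_net with respect to theta."""
    t = np.tanh(theta[0] * x)
    return np.array([theta[1] * x * (1.0 - t**2), t])


# Assumed values, chosen only for illustration.
theta_hat = np.array([0.5, 2.0])      # fitted parameters
sigma = np.array([[0.04, 0.0],        # covariance describing parameter uncertainty
                  [0.0, 0.01]])
x = 1.5                               # query input

g = tiny_net_grad(theta_hat, x)       # the single gradient evaluation
var_u = delta_variance(g, sigma)      # approximate epistemic variance of the prediction
print(var_u)
```

In practice an automatic-differentiation library would supply the gradient, and the covariance would usually be a structured approximation rather than a dense matrix; both choices are outside what this page documents.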

What carries the argument

Delta Variances, a family of uncertainty estimators obtained from several theoretical derivations that unify related methods and operate through gradient computations.

If this is right

The same procedure works on any function built by composing neural networks.
No retraining or architectural modification is required.
Special cases match well-known existing uncertainty techniques.
The unified view produces an extension that improves performance on the tested simulator.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The single-gradient property could make the method attractive for very large models where repeated forward passes are prohibitive.
Similar derivations might apply to uncertainty in other gradient-based systems such as physics-informed networks.
The approach could be tested on sequential decision tasks where uncertainty must be estimated inside a simulator loop.

Load-bearing premise

The derivations remain valid when the functions involved are arbitrary compositions of neural networks.

What would settle it

A direct comparison on a new neural-network composition task where the single-gradient Delta Variance estimates are less accurate than standard multi-sample methods.
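A toy version of such a comparison might look like the sketch below. Everything in it is an assumption made for illustration: the step function `b * tanh(a * s)`, the parameter values, the Gaussian parameter distribution, and the use of Monte Carlo sampling as the "multi-sample" reference. It is not the paper's weather simulator or its experimental protocol.

```python
import numpy as np


def rollout(theta, s0, steps):
    """Apply the hypothetical step s -> b * tanh(a * s) `steps` times."""
    a, b = theta
    s = s0
    for _ in range(steps):
        s = b * np.tanh(a * s)
    return s


def rollout_with_grad(theta, s0, steps):
    """Final state and its gradient w.r.t. (a, b), propagated with the chain rule."""
    a, b = theta
    s, ds_da, ds_db = s0, 0.0, 0.0
    for _ in range(steps):
        t = np.tanh(a * s)
        sech2 = 1.0 - t**2
        new_da = b * sech2 * (s + a * ds_da)
        new_db = t + b * sech2 * a * ds_db
        s, ds_da, ds_db = b * t, new_da, new_db
    return s, np.array([ds_da, ds_db])


# Assumed setup: small Gaussian uncertainty around fitted parameters.
theta_hat = np.array([0.7, 1.0])
sigma = np.diag([1e-4, 1e-4])
s0, steps = 1.0, 3

# Single-gradient estimate of the variance of the rolled-out state.
_, grad = rollout_with_grad(theta_hat, s0, steps)
delta_var = float(grad @ sigma @ grad)

# Multi-sample reference: resample parameters and rerun the rollout.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(theta_hat, sigma, size=20000)
mc_var = float(np.var(rollout(samples.T, s0, steps)))

rel_gap = abs(delta_var - mc_var) / mc_var
print(delta_var, mc_var, rel_gap)
```

With parameter uncertainty this small the two estimates are expected to be close; the informative cases for the paper's claim are those where the composition is strongly nonlinear over the range of plausible parameters.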

Figures

Figures reproduced from arXiv: 2502.14698 by Hado van Hasselt, John Shawe-Taylor, Simon Schmitt.

**Figure 2.** Illustrative survival prediction example. Actual epis…

**Figure 3.** Comparison of variance estimators in terms of their…

**Figure 4.** To investigate more intricate quantities of interest,…

Original abstract

Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is however challenging to estimate efficiently for large neural networks. To this extent we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification, that is computationally efficient and convenient to implement. It can be applied to neural networks and more general functions composed of neural networks. As an example we consider a weather simulator with a neural-network-based step function inside -- here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically noting that special cases recover popular techniques and present a unified perspective on multiple related methods. Finally we observe that this general perspective gives rise to a natural extension and empirically show its benefit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Delta Variances unifies some existing uncertainty estimators under one view and adds a cheap extension, but the support for general NN compositions is thin.


The paper presents Delta Variances as a family of epistemic uncertainty methods that recover several known techniques as special cases through multiple derivations and then offers one natural extension. The practical angle is clear: it needs only a single gradient, works on neural networks or functions built from them, and requires no architecture or training changes. The weather-simulator example shows competitive performance in that setting, which is a reasonable test for a neural step inside a larger model. That convenience and the organizing perspective are the parts worth noting. The derivations themselves are not shown in detail here, so it is not possible to check the exact assumptions on differentiability or the outer function. The stress-test point about whether the single-gradient formula holds for arbitrary nonlinear compositions without extra conditions is fair to raise; the abstract does not spell out the scope, and only one empirical case is mentioned. No obvious circularity or fitting issues appear from the given claims. This work is aimed at people already working on uncertainty quantification who might value a compact way to see related methods together. A reader looking for a new paradigm or broad empirical validation will not find it. The unification and the extension are modest but coherent enough that the paper should go to referees so they can examine the derivations and request additional tests on other compositions.

Referee Report

3 major / 2 minor

Summary. The paper introduces Delta Variances, a family of methods for epistemic uncertainty quantification applicable to neural networks and arbitrary compositions of neural networks. It claims that these methods require only a single gradient computation, need no changes to architecture or training, recover known techniques as special cases via multiple theoretical derivations, provide a unified perspective, and yield competitive empirical results on a weather-simulator example with an NN-based step function, plus a beneficial natural extension.

Significance. If the derivations hold under general conditions and the single-example competitiveness generalizes, the approach would offer a convenient, low-cost way to obtain epistemic uncertainty estimates for composite models without retraining or architectural modification, unifying several existing techniques under one framework.

major comments (3)

[Theoretical derivations (multiple sections referenced in abstract)] The central claim of applicability to general compositions f ∘ g with a single gradient rests on unstated assumptions (e.g., linearity of the outer function, bounded higher-order terms, or specific differentiability conditions). No section explicitly enumerates these assumptions or proves the formula holds beyond special cases.
[Empirical evaluation (weather simulator example)] Empirical support is limited to a single weather-simulator case with an NN step function. No additional experiments test nonlinear outer functions, deeper inner networks, or other compositions to substantiate the generalization claim.
[Unified perspective and extension] The unified perspective and natural extension are presented as arising from the general view, but without explicit comparison tables or ablation studies showing how the extension improves over the base Delta Variances or recovered special cases, the benefit remains under-supported.

minor comments (2)

[Introduction / Methods] Notation for Delta Variances and related quantities should be introduced with a dedicated table or equation block early in the manuscript for clarity.
[Abstract / Theoretical sections] The abstract mentions 'multiple ways to derive Delta Variances' but the manuscript would benefit from a short summary table mapping each derivation to its recovered special cases.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.


Referee: [Theoretical derivations (multiple sections referenced in abstract)] The central claim of applicability to general compositions f ∘ g with a single gradient rests on unstated assumptions (e.g., linearity of the outer function, bounded higher-order terms, or specific differentiability conditions). No section explicitly enumerates these assumptions or proves the formula holds beyond special cases.

Authors: The derivations rely on the standard delta-method approximation, which assumes the outer function is differentiable and that higher-order terms are negligible for small perturbations around the mean. These conditions are implicit in the linearization step used throughout the paper. We will add an explicit subsection enumerating the assumptions and the scope under which the general formula applies to compositions. Revision: yes.

Referee: [Empirical evaluation (weather simulator example)] Empirical support is limited to a single weather-simulator case with an NN step function. No additional experiments test nonlinear outer functions, deeper inner networks, or other compositions to substantiate the generalization claim.

Authors: The weather-simulator example was selected to illustrate a practical composite model with an NN-based step function. The results demonstrate competitive epistemic uncertainty estimates at the cost of one gradient computation. We acknowledge the limited scope and will expand the discussion section to address generalization limits and outline conditions under which the approach extends to other compositions, without adding new experiments at this stage. Revision: partial.

Referee: [Unified perspective and extension] The unified perspective and natural extension are presented as arising from the general view, but without explicit comparison tables or ablation studies showing how the extension improves over the base Delta Variances or recovered special cases, the benefit remains under-supported.

Authors: The unified view is obtained by recovering existing methods as special cases via the different derivations. The empirical benefit of the extension is shown on the weather example. We will add a comparison table of recovered special cases and an ablation study quantifying the extension's improvement in the revised manuscript. Revision: yes.
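The linearization that the simulated rebuttal appeals to can be written out as a short worked derivation. This is the textbook delta method, not notation taken from the paper: the symbols $u$, $\hat\theta$, $\Sigma$, and $J_g$ are assumptions introduced here, with $u = f \circ g$ the quantity of interest, $\hat\theta$ the fitted parameters, and $\Sigma$ a covariance describing parameter uncertainty.

```latex
% First-order expansion around the fitted parameters
u(\theta) \approx u(\hat\theta) + \nabla u(\hat\theta)^{\top} (\theta - \hat\theta)

% Variance of the linearized quantity, with \operatorname{Cov}[\theta] = \Sigma
\operatorname{Var}[u(\theta)] \approx \nabla u(\hat\theta)^{\top} \, \Sigma \, \nabla u(\hat\theta)

% Chain rule for a composition u = f \circ g, where J_g is the Jacobian of g
\nabla u(\hat\theta) = J_g(\hat\theta)^{\top} \, \nabla f\bigl(g(\hat\theta)\bigr)
```

Both approximations drop the second-order remainder of the expansion, which is where the referee's concern about nonlinear outer functions enters: the variance formula is exact when $u$ is linear in $\theta$ and degrades as curvature grows relative to the spread implied by $\Sigma$.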

Circularity Check

0 steps flagged

No circularity: derivations presented as independent theoretical unifications


The paper states it derives Delta Variances in multiple ways, with special cases recovering known techniques and a unified perspective on related methods. No quoted equations or text exhibit self-definitional reductions (e.g., a quantity defined in terms of itself), fitted parameters renamed as predictions, or load-bearing self-citations whose justification collapses to the current work. The approach is described as applying to general NN compositions without architectural changes, supported by theoretical discussion and one empirical example, making the derivation chain self-contained against external benchmarks rather than circular by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.




[1] [1]

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2-3): 235--256

work page 2002

[2] [2]

Bernstein, S. N. 1917. The Theory of Probabilities

work page 1917

[3] [3]

Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; and Wierstra, D. 2015. Weight Uncertainty in Neural Network. In Bach, F.; and Blei, D., eds., Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, 1613--1622. Lille, France: PMLR

work page 2015

[4] [4]

D.; and Weisberg, S

Cook, R. D.; and Weisberg, S. 1982. Residuals and Influence in Regression. Monographs on Statistics & Applied Probability. Chapman & Hall. ISBN 9780412242809

work page 1982

[5] [5]

Cotes, R. 1722. Harmonia Mensurarum. Robert Smith

work page

[6] [6]

D.; Schumi, J.; Schweinsberg, J.; and Ungar, L

de Veaux, R. D.; Schumi, J.; Schweinsberg, J.; and Ungar, L. H. 1998. Prediction Intervals for Neural Networks via Nonlinear Regression. Technometrics, 40(4): 273--282

work page 1998

[7] [7]

Denker, J.; and LeCun, Y. 1990. Transforming Neural-Net Output Levels to Probability Distributions. In Lippmann, R.; Moody, J.; and Touretzky, D., eds., Advances in Neural Information Processing Systems, volume 3. Morgan-Kaufmann

work page 1990

[8] [8]

Doob, J. L. 1935. The limiting distributions of certain statistics. Ann. Math. Stat., 6(3): 160--169

work page 1935

[9] [9]

Dorfman, R. 1938. A note on the delta-method for finding variance formulae. Biometric Bulletin

work page 1938

[10] [10]

Duff, M. 2002. Optimal Learning: Computational procedures for Bayes -adaptive Markov decision processes . Ph.D. thesis, University of Massachusetts Amherst

work page 2002

[11] [11]

Huber Sandwich Estimator

Freedman, D. A. 2006. On The So-Called “Huber Sandwich Estimator” and “Robust Standard Errors”. The American Statistician, 60(4): 299--302

work page 2006

[12] [12]

Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Balcan, M. F.; and Weinberger, K. Q., eds., Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, 1050--1059. New York, New York, USA: PMLR

work page 2016

[13] [13]

Gauss, C. 1823. Theoria combinationis observationum erroribus minimis obnoxiae. H. Dieterich

work page

[14] [14]

Gorroochurn, P. 2020. Who invented the delta method, really? Math. Intelligencer, 42(3): 46--49

work page 2020

[15] [15]

Gull, S. F. 1989. Developments in Maximum Entropy Data Analysis, 53--71. Dordrecht: Springer Netherlands. ISBN 978-94-015-7860-8

work page 1989

[16] [16]

Hampel, F. R. 1974. The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association, 69(346): 383--393

work page 1974

[17] [17]

Heger, M. 1994. Consideration of Risk in Reinforcement Learning. In Machine Learning: Proceedings of the 11th International Conference, 105--111. Morgan Kaufmann Publishers, San Francisco, CA

work page 1994

[18] [18]

Hodges, J. L. 1967. Efficiency in normal samples and tolerance of extreme values for some estimates of location. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 163--–186. Berkeley. University of California Press

work page 1967

[19] [19]

Hwang, J. T. G.; and Ding, A. A. 1997. Prediction Intervals for Artificial Neural Networks. Journal of the American Statistical Association, 92(438): 748--757

work page 1997

[20] [20]

Immer, A.; Korzepa, M.; and Bauer, M. 2021. Improving predictions of Bayesian neural nets via local linearization. In Banerjee, A.; and Fukumizu, K., eds., Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, 703--711. PMLR

work page 2021

[21] [21]

Jaeckel, L. 1972. The Infinitesimal Jackknife. Bell Lab. Memorandum, MM72-1215-11

work page 1972

[22] [22]

Kallus, N.; and McInerney, J. 2022. The Implicit Delta Method. In Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; and Oh, A., eds., Advances in Neural Information Processing Systems, volume 35, 37471--37483. Curran Associates, Inc

work page 2022

[23] [23]

Kelley, T. L. 1928. Crossroads in the mind of man; a study of differentiable mental abilities. Palo Alto: Stanford Univ. Press

work page 1928

[24] [24]

Kleijn, B.; and van der Vaart, A. 2012. The Bernstein-Von-Mises theorem under misspecification . Electronic Journal of Statistics, 6(none): 354 -- 381

work page 2012

[25] [25]

W.; and Liang, P

Koh, P. W.; and Liang, P. 2017. Understanding Black-box Predictions via Influence Functions. In Precup, D.; and Teh, Y. W., eds., Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, 1885--1894. PMLR

work page 2017

[26] [26]

Lakshminarayanan, B.; Pritzel, A.; and Blundell, C. 2017. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; and Garnett, R., eds., Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc

work page 2017

[27] [27]

Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; Merose, A.; Hoyer, S.; Holland, G.; Vinyals, O.; Stott, J.; Pritzel, A.; Mohamed, S.; and Battaglia, P. 2023. Learning skillful medium-range global weather forecasting. Science, 382(6677): 1416--1421

work page 2023

[28] [28]

a ge zum Gebrauche der Mathematik und deren Anwendung , volume 1, chapter 13 of Beytr \

Lambert, J. H. 1765. Beytr \"a ge zum Gebrauche der Mathematik und deren Anwendung , volume 1, chapter 13 of Beytr \"a ge zum Gebrauche der Mathematik und deren Anwendung . Verlag des Buchladens der Realschule

work page

[29] [29]

Laplace, P. S. 1774. Mémoire sur la probabilité des causes par les événements. Mémoires de Mathématique et de Physique, 6

work page

[30] [30]

Le Cam, L. 1953. On some asymptotic properties of maximum likelihood estimates and related Baye 's estimates . University of California Press, Berkeley

work page 1953

[31] [31]

MacKay, D. J. C. 1992 a . Information-based objective functions for active data selection. Neural Computation, 4(2): 550--604

work page 1992

[32] [32]

MacKay, D. J. C. 1992 b . A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4: 448--472

work page 1992

[33] [33]

J.; Izmailov, P.; Garipov, T.; Vetrov, D

Maddox, W. J.; Izmailov, P.; Garipov, T.; Vetrov, D. P.; and Wilson, A. G. 2019. A Simple Baseline for Bayesian Uncertainty in Deep Learning. In Wallach, H.; Larochelle, H.; Beygelzimer, A.; d Alch\' e -Buc, F.; Fox, E.; and Garnett, R., eds., Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc

work page 2019

[34] [34]

Magnus, J. R. 1985. On Differentiating Eigenvalues and Eigenvectors. Econometric Theory, 1(2): 179--191

work page 1985

[35] [35]

Mahalanobis, P. C. 1936. On The Generalized Distance in Statistics. Sankhyā: The Indian Journal of Statistics, Series A (2008-), 80: pp. S1--S7

work page 1936

[36] [36]

Martens, J. 2014. New perspectives on the natural gradient method. CoRR, abs/1412.1193

work page arXiv 2014

[37] Martens, J.; and Grosse, R. B. 2015. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, 2408–2417.

[38] Miller, R. G. 1974. The Jackknife--A Review. Biometrika, 61(1): 1–15.

[39] Nilsen, G. K.; Munthe-Kaas, A. Z.; Skaug, H. J.; and Brun, M. 2022. Epistemic uncertainty quantification in deep learning classification by the Delta method. Neural Networks, 145: 164–176.

[40] Osband, I.; Wen, Z.; Asghari, S. M.; Dwaracherla, V.; Ibrahimi, M.; Lu, X.; and Van Roy, B. 2023. Epistemic Neural Networks. In Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; and Levine, S., eds., Advances in Neural Information Processing Systems, volume 36, 2795–2823. Curran Associates, Inc.

[41] Quenouille, M. H. 1949. Approximate Tests of Correlation in Time-Series. Journal of the Royal Statistical Society. Series B (Methodological), 11(1): 68–84.

[42] Ritter, H.; Botev, A.; and Barber, D. 2018. A Scalable Laplace Approximation for Neural Networks. In International Conference on Learning Representations.

[43] Schnaus, D.; Lee, J.; Cremers, D.; and Triebel, R. 2023. Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks. In Krause, A.; Brunskill, E.; Cho, K.; Engelhardt, B.; Sabato, S.; and Scarlett, J., eds., Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning ...

[44] Sun, Y.; Ming, Y.; Zhu, X.; and Li, Y. 2022. Out-of-Distribution Detection with Deep Nearest Neighbors. In Chaudhuri, K.; Jegelka, S.; Song, L.; Szepesvari, C.; Niu, G.; and Sabato, S., eds., Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, 20827–20840. PMLR.

[45] Tibshirani, R. 1996. A Comparison of Some Error Estimates for Neural Network Models. Neural Computation, 8(1): 152–163.

[46] Tishby, N.; Levin, E.; and Solla, S. 1989. Consistent inference of probabilities in layered networks: Predictions and generalization. In IJCNN International Joint Conference on Neural Networks, 403–409. IEEE. Conference date: 18-06-1989 through 22-06-1989.

[47] Tukey, J. W. 1958. Bias and confidence in not-quite large samples (abstract). Annals of Mathematical Statistics, 29(2): 614.

[48] Van Amersfoort, J.; Smith, L.; Teh, Y. W.; and Gal, Y. 2020. Uncertainty estimation using a single deep deterministic neural network. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org.

[49] van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge University Press.

[50] von Mises, R. 1931. Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, volume 1. Franz Deuticke.

[51] Wright, S. 1934. The method of path coefficients. Ann. Math. Stat., 5(3): 161–215.
