Position: The Time for Sampling Is Now! Charting a New Course for Bayesian Deep Learning
Pith reviewed 2026-05-22 09:19 UTC · model grok-4.3
The pith
Sampling-based inference has reached computational parity with optimization-based methods for Bayesian neural networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that SAI has achieved computational parity with optimization-based methods and is on the verge of superseding such methods for effective and efficient inference in BNNs. SAI can yield superior prediction performance through model averaging, serve as the foundation for a plethora of possible downstream tasks, and provide crucial insights into the landscape of BNNs. Overcoming misconceptions is a necessary first step, and the community must focus on sufficient exploration of the posterior landscape and high-fidelity distillation of posterior samples.
What carries the argument
The argument rests on two core problems, sufficient exploration of the posterior landscape and high-fidelity distillation of posterior samples, whose solution is expected to unlock the advantages of sampling-based inference in BNNs.
If this is right
- BNNs become a more widely adopted principled paradigm for uncertainty quantification.
- Superior prediction performance is achieved through model averaging.
- SAI provides the foundation for various downstream tasks.
- Crucial insights into the BNN landscape become available.
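As a rough illustration of the model-averaging point in the list above, the sketch below averages class probabilities over posterior samples and reports the entropy of the averaged prediction. It is a minimal NumPy example under stated assumptions, not the paper's method: the function names `bma_predictive` and `predictive_entropy` and the toy probabilities are hypothetical, and the posterior samples are assumed to be already available from some sampler.

```python
import numpy as np


def bma_predictive(sample_probs):
    """Average class probabilities over posterior samples.

    sample_probs has shape (S, N, C): S posterior samples,
    N inputs, C classes. Returns an (N, C) array.
    """
    sample_probs = np.asarray(sample_probs, dtype=float)
    return sample_probs.mean(axis=0)


def predictive_entropy(probs, eps=1e-12):
    """Entropy of the averaged predictive distribution, one value per input."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + eps), axis=-1)


# Hypothetical toy data: 3 posterior samples, 2 inputs, 2 classes.
toy = np.array([
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.8, 0.2], [0.4, 0.6]],
    [[1.0, 0.0], [0.5, 0.5]],
])
avg = bma_predictive(toy)      # averaged class probabilities per input
ent = predictive_entropy(avg)  # higher entropy means more predictive uncertainty
```

In this toy case the samples agree on the first input and disagree on the second, so the averaged prediction for the second input is closer to uniform and its entropy is higher.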
Where Pith is reading between the lines
- Further research could explore how posterior exploration techniques from statistics integrate with neural network training.
- Efficient sample distillation might enable scalable Bayesian inference in large-scale models.
- This approach could influence uncertainty handling in applications requiring robust decision making under uncertainty.
Load-bearing premise
The primary barriers to wider use of sampling-based inference are persistent misconceptions about its feasibility and efficiency.
What would settle it
Empirical evidence demonstrating that sampling-based inference remains substantially more computationally expensive or produces inferior results compared to optimization-based methods in practical BNN applications would disprove the position.
Original abstract
The practical adoption of sampling-based inference (SAI) in Bayesian neural networks (BNNs) remains limited, partly due to persistent misconceptions about the feasibility and efficiency of sampling. This position paper argues that SAI has achieved computational parity with optimization-based methods and is at the verge of superseding such methods for effective and efficient inference in BNNs. This development should be in the interest of the whole community, promoting BNNs as a principled paradigm with its long-standing yet unfulfilled promise of providing principled uncertainty quantification for neural networks. SAI can even do more -- yielding superior prediction performance through model averaging, serving as the foundation for a plethora of possible downstream tasks, and providing crucial insights into the landscape of BNNs. In order to make such a change happen and unfold the potential of sampling, overcoming current misconceptions is a necessary first step. The next step is to realign research efforts toward addressing remaining challenges in SAI. In particular, the community must focus on two core problems: sufficient exploration of the posterior landscape and high-fidelity distillation of posterior samples for efficient downstream inference. By addressing conceptual and practical obstacles, we can unlock the full potential of SAI and establish it as a central tool in Bayesian deep learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that sampling-based inference (SAI) in Bayesian neural networks (BNNs) has now reached computational parity with optimization-based methods, that misconceptions remain the primary barrier to adoption, and that the community should therefore redirect efforts toward sufficient posterior exploration and high-fidelity sample distillation to realize SAI's advantages in uncertainty quantification, model averaging, and downstream tasks.
Significance. If the parity claim is supported by the cited literature, the paper could usefully reorient Bayesian deep learning research toward sampling methods, potentially delivering the long-promised principled uncertainty estimates and the additional benefits of model averaging. The explicit identification of two concrete research problems (posterior exploration and sample distillation) supplies a clear agenda. The manuscript offers no new algorithms, proofs, or experiments but synthesizes an argument for a shift in priorities.
major comments (1)
- [Abstract] Abstract: the claim that 'SAI has achieved computational parity with optimization-based methods' is presented as an established fact without accompanying benchmarks, derivations, or explicit citations to large-scale comparisons (wall-clock time, memory, or ESS per unit resource) on modern architectures such as ResNets or transformers; this assertion is load-bearing for the subsequent recommendation to realign research efforts.
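For readers unfamiliar with the metric named in this comment, the following sketch shows one simple way to compute effective sample size (ESS) per second of wall-clock time for a one-dimensional chain. It is an illustration only and not taken from the paper: the estimator truncates the autocorrelation sum at the first non-positive lag, which is a crude choice compared with the estimators in established MCMC diagnostics libraries, and the names `effective_sample_size` and `ess_per_second` as well as the synthetic chains are hypothetical.

```python
import numpy as np


def effective_sample_size(chain):
    """Crude ESS estimate: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        raise ValueError("chain has zero variance")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / denom
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ess_per_second(chain, wall_clock_seconds):
    """ESS divided by the measured wall-clock time of the run."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall_clock_seconds must be positive")
    return effective_sample_size(chain) / wall_clock_seconds


# Synthetic comparison: independent draws versus a strongly correlated AR(1) chain.
rng = np.random.default_rng(0)
n = 5000
iid = rng.normal(size=n)
noise = rng.normal(size=n)
ar = np.empty(n)
ar[0] = noise[0]
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + noise[t]

ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
```

Dividing ESS by runtime, rather than reporting the raw number of draws, is what makes chains with different autocorrelation and different per-step cost comparable; a memory-normalized variant would follow the same pattern.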
minor comments (1)
- The abstract would be strengthened by a single sentence or footnote listing the key prior works that are taken to establish parity, so readers can immediately locate the supporting evidence.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our position paper. The feedback highlights a key point about the presentation of our central claim, which we address directly below. We have revised the manuscript to strengthen the supporting evidence while preserving the overall argument.
- Referee: [Abstract] Abstract: the claim that 'SAI has achieved computational parity with optimization-based methods' is presented as an established fact without accompanying benchmarks, derivations, or explicit citations to large-scale comparisons (wall-clock time, memory, or ESS per unit resource) on modern architectures such as ResNets or transformers; this assertion is load-bearing for the subsequent recommendation to realign research efforts.
Authors: We appreciate the referee drawing attention to the need for clearer grounding of this claim. The abstract is intentionally concise, but the full manuscript synthesizes results from recent literature on scalable sampling methods (e.g., variants of HMC, SG-MCMC, and ensemble-based approaches) that report wall-clock times, memory footprints, and effective sample sizes competitive with standard optimization on ResNet-scale models and other modern architectures. These works are cited in Sections 2 and 3. We acknowledge that explicit head-to-head transformer benchmarks remain limited in the cited studies and that the abstract itself did not include direct pointers. In the revised version we have (i) added two key citations to the abstract, (ii) inserted a short paragraph in the introduction that summarizes the relevant efficiency metrics from the literature, and (iii) clarified that the parity claim refers to practical, end-to-end training and inference costs rather than theoretical asymptotic rates. This is a partial revision: we retain the position that parity has been reached in many contemporary settings, but we now make the evidentiary basis more explicit.
revision: partial
Circularity Check
Position paper states external field assessment; no derivation or self-referential reduction present
The document is a position paper whose central claim—that sampling-based inference has reached computational parity with optimization methods—is framed as an observation drawn from the existing literature rather than a quantity constructed or fitted inside the paper. No equations, parameter fits, or predictive steps appear in the provided text or abstract. The call to address posterior exploration and sample distillation follows directly from the stated position without reducing to a tautology, renamed input, or load-bearing self-citation chain. The argument remains self-contained as a synthesis of external developments and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Sampling-based inference has achieved computational parity with optimization-based methods in Bayesian neural networks.
- ad hoc to paper: The main obstacles to adoption are misconceptions together with insufficient posterior exploration and high-fidelity sample distillation.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: This position paper argues that SAI has achieved computational parity with optimization-based methods and is at the verge of superseding such methods for effective and efficient inference in BNNs.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Arvanitis, V., Aslanidis, A., Sommer, E., and Rügamer, D. bde: A Python Package for Bayesian Deep Ensembles via MILE. arXiv preprint arXiv:2605.14146,
- [2] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigl...
- [3] Daxberger, E., Kristiadi, A., Immer, A., Eschenhagen, R., Bauer, M., and Hennig, P. Laplace Redux – Effortless Bayesian Deep Learning. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 2021a.
  Daxberger, E., Nalisnick, E., Allingham, J. U., Antorán, J., and Hernández-Lobato, J. M. Bayesian Deep Learning via Subnetwork Inference. In...
- [4] Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., and Wilson, A. G. Averaging Weights Leads to Wider Optima and Better Generalization. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence 2018,
- [5] Jawla, D. and Kelleher, J. Layer wise scaled gaussian priors for markov chain monte carlo sampled deep bayesian neural networks. Frontiers in Artificial Intelligence, Volume 8 - 2025,
- [6] ISSN 2624-8212. doi: 10.3389/frai.2025.1444891.
  Kaiser, J., Schwethelm, K., Rueckert, D., and Kaissis, G. Laplace sample information: Data informativeness through a bayesian lens. In The Thirteenth International Conference on Learning Representations,
- [7] URL https://arxiv.org/abs/2504.18911.
  Li, G., Lin, G., Zhang, Z., and Zhou, Q. Fast replica exchange stochastic gradient langevin dynamics. arXiv preprint arXiv:2301.01898,
- [9] URL https://arxiv.org/abs/2412.08876.
  Rønning, O., Nalisnick, E., Ley, C., Smyth, P., and Hamelryck, T. ELBOing stein: Variational bayes with stein mixture inference. In The Thirteenth International Conference on Learning Representations,
- [10] Sheinkman, A. and Wade, S. The architecture and evaluation of bayesian neural networks. arXiv preprint arXiv:2503.11808,
- [11] Wang, T., Zhu, J.-Y., Torralba, A., and Efros, A. A. Dataset distillation. arXiv preprint arXiv:1811.10959,
- [12] Willard, B. T. and Louf, R. Efficient guided generation for large language models. arXiv preprint arXiv:2307.09702,
- [13] A. Experimental Details
  Runtime Illustration. To calculate the runtime results shown in Figure 2, we used the code from Kobialka et al. (2026). We adopt the same UCI benchmark setting as in their Table 2, use the airfoil dataset (Dua & Graff,
- [14] and extend their analysis by reporting average runtimes measured on a standard 10-core CPU. To ensure a fair comparison under equal computational budgets, we additionally include a 35-member Deep Ensemble, the most performant competing method.
  B. Clarification on Terminology: SAI vs. SBI
  Throughout this paper, we utilize the acronym SAI to denote Sampling-...