pith. machine review for the scientific record.

arxiv: 2605.11168 · v2 · submitted 2026-05-11 · 📊 stat.ME · stat.CO · stat.ML

Recognition: 2 theorem links


Variational predictive resampling

Chris Holmes, David T. Frazier, Jack Jewson, Laura Battaglia, Stefano Cortinovis

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:47 UTC · model grok-4.3

classification 📊 stat.ME · stat.CO · stat.ML
keywords variational inference · predictive resampling · Bayesian posterior · mean-field approximation · posterior sampling · Gaussian location model · uncertainty quantification · regression models

The pith

Variational predictive resampling converges to the exact Bayesian posterior where mean-field variational inference retains an asymptotic gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian inference supplies principled uncertainty quantification, but MCMC sampling becomes prohibitive for many modern models. Variational inference scales efficiently, yet mean-field families typically produce over-concentrated posteriors that miss dependence. Variational predictive resampling addresses this by repeatedly imputing future observations from the current variational predictive, updating the variational approximation after each imputation, and collecting the implied parameter values. In a Gaussian location model the resulting distribution converges to the exact posterior, while the optimal mean-field variational approximation retains a non-vanishing asymptotic error. Experiments on linear regression, logistic regression, and hierarchical mixed-effects models show that the method recovers missed posterior dependence and improves uncertainty quantification at a cost competitive with MCMC.

Core claim

Given a prior-likelihood pair, variational predictive resampling imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the parameter value implied by the completed sample. Under the stated conditions the law of the returned parameter is well defined, and its finite-horizon approximations converge to that limiting law. In the tractable Gaussian location model with mean-field variational predictives, the limiting distribution equals the exact Bayesian posterior, whereas the optimal mean-field variational approximation retains a non-vanishing asymptotic gap.
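To make the loop concrete, here is a minimal sketch of one VPR path. The callbacks `fit_vi`, `sample_predictive`, and `implied_parameter` are hypothetical stand-ins for the three steps named above; this is an editorial illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def vpr_path(y_obs, fit_vi, sample_predictive, implied_parameter,
             horizon=1000, rng=None):
    """One predictive-resampling path: returns a single posterior draw."""
    rng = rng or np.random.default_rng()
    y = list(y_obs)                          # start from the observed data
    q = fit_vi(np.asarray(y))                # current variational approximation
    for _ in range(horizon):                 # finite-horizon truncation
        y.append(sample_predictive(q, rng))  # impute from the variational predictive
        q = fit_vi(np.asarray(y))            # update after each imputation
    return implied_parameter(np.asarray(y))  # parameter from the completed sample
```

Repeating `vpr_path` over many independent paths yields the collection of parameter values whose law the paper studies.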

What carries the argument

The predictive-resampling loop: imputing future observations from the current variational predictive and updating the variational approximation after each imputation, so that the completed samples define the limiting distribution of the parameter.

If this is right

  • In linear and logistic regression VPR recovers posterior dependence that mean-field variational inference misses.
  • In hierarchical linear mixed-effects models VPR improves posterior uncertainty quantification relative to mean-field variational inference.
  • VPR remains computationally competitive with, and often faster than, MCMC while delivering better posterior approximations (see the efficiency sketch after this list).
  • Finite-horizon truncations of VPR converge to the limiting distribution as the horizon grows.
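A hedged sketch of how the efficiency metrics behind that comparison (and Figure 2 below) are typically computed: minimum effective sample size per second across coefficients for MCMC, against completed paths per second for VPR. The array `mcmc_draws` and the wall-clock timings are assumed inputs; `arviz` supplies the ESS computation.

```python
import arviz as az

def mcmc_min_ess_per_sec(mcmc_draws, seconds):
    # mcmc_draws: array of shape (chains, draws, n_coefficients)
    ess = az.ess(az.convert_to_dataset(mcmc_draws))  # ESS per coefficient
    return float(ess["x"].min()) / seconds

def vpr_paths_per_sec(n_paths, seconds):
    # each completed resampling path contributes one posterior draw
    return n_paths / seconds
```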

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The convergence property could serve as a benchmark for testing when richer variational families are needed beyond mean-field.
  • The imputation mechanism may extend naturally to models where exact conditionals are unavailable, provided a good variational predictive can still be formed.
  • Scaling the number of imputations with data dimension might reveal trade-offs between accuracy and compute in high-dimensional settings.

Load-bearing premise

Imputing future observations from the current variational predictive produces a well-defined limiting distribution for the parameter that equals the true posterior under the given prior-likelihood conditions.

What would settle it

In the Gaussian location model, increase the number of imputations in variational predictive resampling and verify whether the empirical distribution of collected parameter values matches the analytically known exact posterior within Monte Carlo error, while the optimal mean-field variational approximation does not.
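A minimal version of that check in the one-dimensional Gaussian location model with known noise variance, written here with the exact conjugate predictive driving the imputations for concreteness (a setting where the limiting draw is known to recover the posterior); the paper's experiment would substitute its mean-field variational predictive. All names are illustrative.

```python
import numpy as np

def conjugate_posterior(y, mu0=0.0, tau0sq=1.0, sigma2=1.0):
    """Exact posterior N(mean, var) for a N(mu0, tau0sq) prior on the location."""
    prec = 1.0 / tau0sq + len(y) / sigma2
    mean = (mu0 / tau0sq + np.sum(y) / sigma2) / prec
    return mean, 1.0 / prec

def vpr_draw(y_obs, horizon=2000, sigma2=1.0, rng=None):
    """One predictive-resampling path; returns one draw of the location."""
    rng = rng or np.random.default_rng()
    mean, var = conjugate_posterior(y_obs, sigma2=sigma2)
    for _ in range(horizon):
        y_new = rng.normal(mean, np.sqrt(var + sigma2))  # impute from predictive
        prec = 1.0 / var + 1.0 / sigma2                  # one-step conjugate update
        mean = (mean / var + y_new / sigma2) / prec
        var = 1.0 / prec
    return mean  # residual variance vanishes as the horizon grows

rng = np.random.default_rng(0)
y_obs = rng.normal(1.0, 1.0, size=20)
draws = np.array([vpr_draw(y_obs, rng=rng) for _ in range(1000)])
post_mean, post_var = conjugate_posterior(y_obs)
# Both gaps should sit within Monte Carlo error of zero:
print(draws.mean() - post_mean, draws.var() - post_var)
```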

Figures

Figures reproduced from arXiv: 2605.11168 by Chris Holmes, David T. Frazier, Jack Jewson, Laura Battaglia, Stefano Cortinovis.

Figure 1. Logistic regression posterior. The lower triangle, diagonal, and upper triangle show, …
Figure 2. Left: empirical coverage vs. d. Right: sampling efficiency vs. d (log-scale y-axis): min ESS/s over coefficients for MCMC, paths/s for VPR. 95% CI error bars, n = 3d.
read the original abstract

Bayesian inference provides principled uncertainty quantification, but accurate posterior sampling with MCMC can be computationally prohibitive for modern applications. Variational inference (VI) offers a scalable alternative and often yields accurate predictive distributions, but cheap variational families such as mean-field (MF) can produce over-concentrated approximations that miss posterior dependence. We propose variational predictive resampling (VPR), a scalable posterior sampling method that exploits VI's predictive strength within a predictive-resampling framework to better approximate the Bayesian posterior. Given a prior-likelihood pair, VPR repeatedly imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the parameter value implied by the completed sample. We establish conditions under which the law of the parameter returned by VPR is well defined and show that its finite-horizon approximation converges to this limit. In a tractable Gaussian location model, we show that VPR with MF variational predictives converges to the exact Bayesian posterior, whereas the optimal MF-VI approximation retains a non-vanishing asymptotic gap. Experiments on linear regression, logistic regression, and hierarchical linear mixed-effects models demonstrate that VPR substantially improves posterior uncertainty quantification and recovers posterior dependence missed by MF-VI, while remaining computationally competitive with, and often more efficient than, MCMC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes variational predictive resampling (VPR), a method that iteratively imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the implied parameter value to approximate the posterior. It establishes conditions for the limiting law to be well-defined and for finite-horizon approximations to converge to it, proves that in a Gaussian location model VPR with mean-field variational predictives recovers the exact posterior (unlike optimal MF-VI, which retains an asymptotic gap), and reports empirical gains in posterior uncertainty quantification and dependence recovery for linear regression, logistic regression, and hierarchical linear mixed-effects models while remaining competitive with MCMC.

Significance. If the central convergence claims hold, the work provides a scalable bridge between variational inference and sampling that exploits VI's predictive accuracy to correct mean-field deficiencies such as over-concentration and missed dependence. The exact recovery result in the Gaussian case would be a notable theoretical contribution, and the empirical improvements suggest practical value for uncertainty quantification in models where full MCMC is costly.

major comments (2)
  1. [Gaussian location model analysis] Gaussian location model section: the claim that VPR with MF variational predictives converges to the exact posterior (while MF-VI retains a gap) requires an explicit derivation of the stationary distribution induced by the imputation-update process; because the MF family cannot represent posterior dependence, the mechanism by which conjugacy recovers the full posterior law must be shown via the fixed-point equation or limiting measure to rule out implicit bias from the predictive imputation step.
  2. [Theoretical results] Convergence theorem (abstract and theoretical development): the conditions under which the law of the returned parameter is well-defined and finite-horizon approximations converge need to be stated with precise assumptions on the prior-likelihood pair and variational family; without these, it is unclear whether the result extends beyond the Gaussian case or relies on unstated regularity conditions.
minor comments (2)
  1. [Method description] Clarify notation for the variational predictive versus the updated approximation after each imputation to avoid ambiguity in the algorithmic description.
  2. [Experiments] In the experimental sections, report effective sample sizes or convergence diagnostics for the MCMC baselines to make efficiency comparisons more precise.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and positive overall assessment. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and derivations.

read point-by-point responses
  1. Referee: Gaussian location model section: the claim that VPR with MF variational predictives converges to the exact posterior (while MF-VI retains a gap) requires an explicit derivation of the stationary distribution induced by the imputation-update process; because the MF family cannot represent posterior dependence, the mechanism by which conjugacy recovers the full posterior law must be shown via the fixed-point equation or limiting measure to rule out implicit bias from the predictive imputation step.

    Authors: We agree that an explicit derivation of the stationary distribution will strengthen the presentation and clarify the mechanism. In the revised manuscript we will add a detailed derivation of the fixed-point equation for the imputation-update process in the Gaussian location model. This will explicitly show how conjugacy propagates the full posterior law through the limiting measure, even though each variational predictive is mean-field, thereby confirming the absence of implicit bias from the imputation step. (An illustrative sketch of this telescoping calculation appears after this list.) revision: yes

  2. Referee: Convergence theorem (abstract and theoretical development): the conditions under which the law of the returned parameter is well-defined and finite-horizon approximations converge need to be stated with precise assumptions on the prior-likelihood pair and variational family; without these, it is unclear whether the result extends beyond the Gaussian case or relies on unstated regularity conditions.

    Authors: We thank the referee for this observation. While the manuscript states the conditions under which the law is well-defined and finite-horizon approximations converge, we will revise the theoretical development section to list the precise assumptions explicitly (including requirements on the prior-likelihood pair and regularity conditions on the variational family). We will also clarify the scope of the general result and its relation to the Gaussian case. revision: yes
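As an editorial aside on point 1: a sketch of the fixed-point/telescoping calculation in question, written for the exact conjugate predictive in the Gaussian location model (the mean-field variant analysed in the paper would replace the predictive variance accordingly). This illustrates the shape of the argument, not the authors' derivation.

```latex
% Posterior after n observations: \theta \mid y_{1:n} \sim N(\mu_n, \tau_n^2),
% with known noise variance \sigma^2. Impute y_{n+1} \sim N(\mu_n, \tau_n^2 + \sigma^2)
% and update conjugately:
\mu_{n+1} = \frac{\tau_n^{-2}\,\mu_n + \sigma^{-2}\,y_{n+1}}{\tau_n^{-2} + \sigma^{-2}},
\qquad
\tau_{n+1}^2 = \frac{\tau_n^2 \sigma^2}{\tau_n^2 + \sigma^2}.
% The imputed posterior means form a martingale whose variance telescopes:
\mathbb{E}[\mu_{n+1} \mid \mu_n] = \mu_n,
\qquad
\operatorname{Var}(\mu_{n+1} \mid \mu_n) = \tau_n^2 - \tau_{n+1}^2,
% so summing the increments gives \mu_\infty \mid y_{1:n} \sim N(\mu_n, \tau_n^2):
% the exact Bayesian posterior is recovered in the limit.
```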

Circularity Check

0 steps flagged

No significant circularity; convergence derived from process definition and model assumptions

full rationale

The paper defines VPR via repeated imputation from the current variational predictive, variational updates, and recording of the implied parameter. It then establishes conditions for a well-defined limiting law and proves convergence of the finite-horizon version to that limit. In the Gaussian location model the proof shows the MF-VPR limit coincides with the exact posterior (while optimal MF-VI does not). This is a direct consequence of the update rule plus conjugacy; it does not reduce to a fitted quantity renamed as a prediction, nor to a self-citation chain, nor to an ansatz smuggled in. No load-bearing self-citations or self-definitional steps appear. The derivation remains self-contained against the stated prior-likelihood pair and the explicit process definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard Bayesian model assumptions plus the existence of a variational family whose predictive can be sampled and updated iteratively; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: A prior-likelihood pair exists such that the variational predictive is well-defined and the imputation process yields a limiting distribution.
    Invoked when stating conditions under which the law of the returned parameter is well defined.

pith-pipeline@v0.9.0 · 5527 in / 1202 out tokens · 45783 ms · 2026-05-14T20:47:33.395438+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

65 extracted references · 15 canonical work pages · 1 internal anchor

  1. [1] Arthur Asuncion, David Newman, et al. UCI Machine Learning Repository, 2007.
  2. [2] Afonso S. Bandeira, Antoine Maillard, Richard Nickl, and Sven Wang. On free energy barriers in Gaussian priors and failure of cold start MCMC for high-dimensional unimodal distributions. Philosophical Transactions of the Royal Society A, 381(2247):20220150, 2023.
  3. [3] Marco Battiston and Lorenzo Cappello. Bayesian predictive inference beyond martingales. arXiv preprint arXiv:2507.21874, 2025.
  4. [4] Matthew James Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, University College London, 2003.
  5. [5] Patrizia Berti, Luca Pratelli, and Pietro Rigo. Limit theorems for a class of identically distributed random variables. The Annals of Probability, 32(3):2029–2052, 2004.
  6. [6] Christopher M. Bishop and Nasser M. Nasrabadi. Pattern Recognition and Machine Learning, volume 4. Springer, 2006.
  7. [7] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.
  8. [8] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax.
  9. [9] Liqun Chen, Chenyang Tao, Ruiyi Zhang, Ricardo Henao, and Lawrence Carin Duke. Variational inference and model selection with generalized evidence bounds. In International Conference on Machine Learning, pages 893–902. PMLR, 2018.
  10. [10] Bruno De Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l'institut Henri Poincaré, 7(1):1–68, 1937.
  11. [11] DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena … The DeepMind JAX Ecosystem, 2020.
  12. [12] Charita Dellaporta, Jeremias Knoblauch, Theodoros Damoulas, and François-Xavier Briol. Robust Bayesian inference for simulator-based models via the MMD posterior bootstrap. In International Conference on Artificial Intelligence and Statistics, pages 943–970. PMLR, 2022.
  13. [13] Justin Domke and Daniel R. Sheldon. Importance weighting and variational inference. Advances in Neural Information Processing Systems, 31, 2018.
  14. [14] Joseph L. Doob. Application of the theory of martingales. Le calcul des probabilités et ses applications, pages 23–27, 1949.
  15. [15] Bradley Efron. Bootstrap methods: another look at the jackknife. In Breakthroughs in Statistics: Methodology and Distribution, pages 569–593. Springer, 1992.
  16. [16] Fabian Falck, Ziyu Wang, and Chris Holmes. Is in-context learning in large language models Bayesian? A martingale perspective. arXiv preprint arXiv:2406.00793, 2024.
  17. [17] Edwin Fong and Andrew Yiu. Asymptotics for parametric martingale posteriors, October 2024. URL http://arxiv.org/abs/2410.17692. arXiv:2410.17692.
  18. [18] Edwin Fong and Andrew Yiu. Bayesian quantile estimation and regression with martingale posteriors, June 2024. URL http://arxiv.org/abs/2406.03358. arXiv:2406.03358.
  19. [19] Edwin Fong, Simon Lyddon, and Chris Holmes. Scalable nonparametric sampling from multimodal posteriors with the posterior bootstrap. In International Conference on Machine Learning, pages 1952–1962. PMLR, 2019.
  20. [20] Edwin Fong, Chris Holmes, and Stephen G. Walker. Martingale posterior distributions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(5):1357–1391, 2023.
  21. [21] Sandra Fortini and Sonia Petrone. Prediction-based uncertainty quantification for exchangeable sequences. Philosophical Transactions of the Royal Society A, 381(2247):20220142, 2023.
  22. [22] Sandra Fortini and Sonia Petrone. Exchangeability, prediction and predictive modeling in Bayesian statistics. Statistical Science, 40(1):40–67, 2025.
  23. [23] David T. Frazier, Ruben Loaiza-Maya, Gael M. Martin, and Bonsoo Koo. Loss-based variational Bayes prediction. Journal of Computational and Graphical Statistics, 34(1):84–95, 2025.
  24. [24] Alan E. Gelfand and Adrian F. M. Smith. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409, 1990.
  25. [25] Andrew Gelman and Jennifer Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
  26. [26] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012.
  27. [27] Jonathan Heek, Anselm Levskaya, Avital Oliver, Marvin Ritter, Bertrand Rondepierre, Andreas Steiner, and Marc van Zee. Flax: A neural network library and ecosystem for JAX, 2024. URL http://github.com/google/flax. Version 0.12.3.
  28. [28] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations, 2017.
  29. [29] Matthew Hoffman and David Blei. Stochastic structured variational inference. In Artificial Intelligence and Statistics, pages 361–369. PMLR, 2015.
  30. [30] Matthew D. Hoffman, David M. Blei, Chong Wang, and John Paisley. Stochastic variational inference. The Journal of Machine Learning Research, 14(1):1303–1347, 2013.
  31. [31] Matthew D. Hoffman, Andrew Gelman, et al. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
  32. [32] Chris C. Holmes and Stephen G. Walker. Statistical inference with exchangeability and martingales. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 381(2247):20220143, March 2023. doi: 10.1098/rsta.2022.0143.
  33. [33] Tommi S. Jaakkola and Michael I. Jordan. Bayesian parameter estimation via variational methods. Statistics and Computing, 10(1):25–37, 2000.
  34. [34] James E. Johndrow, Natesh S. Pillai, and Aaron Smith. No free lunch for approximate MCMC. arXiv preprint arXiv:2010.12514, 2020.
  35. [35] Matt Jones, Peter Chang, and Kevin Murphy. Bayesian online natural gradient (BONG). Advances in Neural Information Processing Systems, 37:131104–131153, 2024.
  36. [36] Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233, 1999.
  37. [37] Diederik P. Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  38. [38] Michael Komodromos, Marina Evangelou, and Sarah Filippi. Logistic variational Bayes revisited, June 2024. URL http://arxiv.org/abs/2406.00713. arXiv:2406.00713 [stat].
  39. [39] S. P. Lyddon, C. C. Holmes, and S. G. Walker. General Bayesian updating and the loss-likelihood bootstrap. Biometrika, 106(2):465–478, June 2019. doi: 10.1093/biomet/asz006.
  40. [40] Simon Lyddon, Stephen Walker, and Chris C. Holmes. Nonparametric learning from Bayesian models with randomized objective functions. Advances in Neural Information Processing Systems, 31, 2018.
  41. [41] David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
  42. [42] Thomas Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005. URL https://www.microsoft.com/en-us/research/publication/divergence-measures-and-message-passing/.
  43. [43] Kevin P. Murphy. Probabilistic Machine Learning: Advanced Topics. MIT Press, 2023.
  44. [44] Thomas Nagler and David Rügamer. Uncertainty quantification for prior-fitted networks using martingale posteriors. In ICLR 2025 Workshop on Frontiers in Probabilistic Inference, March 2025. URL https://openreview.net/forum?id=iGHWtpVolr.
  46. [46] Michael A. Newton and Adrian E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):3–26, January 1994. doi: 10.1111/j.2517-6161.1994.tb01956.x.
  47. [47] Kenyon Ng, Edwin Fong, David T. Frazier, Jeremias Knoblauch, and Susan Wei. TabMGP: martingale posterior with TabPFN. arXiv preprint arXiv:2510.25154, 2025.
  48. [48] Victor M.-H. Ong, David J. Nott, and Michael S. Smith. Gaussian variational approximation with a factor covariance structure. Journal of Computational and Graphical Statistics, 27(3):465–478, 2018.
  49. [49] Du Phan, Neeraj Pradhan, and Martin Jankowiak. Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554, 2019.
  50. [50] Francesco Pozza and Giacomo Zanella. On the fundamental limitations of multi-proposal Markov chain Monte Carlo algorithms. Biometrika, 112(2):asaf019, 2025.
  51. [51] Kolyan Ray and Botond Szabó. Variational Bayes for high-dimensional linear regression with sparse priors. Journal of the American Statistical Association, 117(539):1270–1281, 2022.
  52. [52] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538. PMLR, 2015.
  53. [53] Donald B. Rubin. The Bayesian bootstrap. The Annals of Statistics, pages 130–134, 1981.
  54. [54] D. M. Titterington and Bo Wang. Convergence properties of a general algorithm for calculating variational Bayesian estimates for a normal mixture model. Bayesian Analysis, 1(3):625–650, 2006.
  55. [55] Richard Eric Turner and Maneesh Sahani. Two problems with variational expectation maximisation for time series models, pages 104–124. Cambridge University Press, 2011.
  56. [56] Aad W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.
  57. [57] Aad W. van der Vaart and Jon A. Wellner. Weak convergence. In Weak Convergence and Empirical Processes: With Applications to Statistics, pages 16–28. Springer, 1996.
  58. [58] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2014.
  59. [59] Bo Wang and D. Michael Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In International Workshop on Artificial Intelligence and Statistics, pages 373–380. PMLR, 2005.
  60. [60] Yixin Wang and David M. Blei. Frequentist consistency of variational Bayes. Journal of the American Statistical Association, 114(527):1147–1161, July 2019. doi: 10.1080/01621459.2018.1473776. URL http://arxiv.org/abs/1705.03439.
  61. [61] Ziyu Wang and Chris Holmes. On uncertainty quantification for near-Bayes optimal algorithms, October 2024. URL http://arxiv.org/abs/2403.19381. arXiv:2403.19381 [stat].
  62. [62] Luhuan Wu and Sinead A. Williamson. Posterior uncertainty quantification in neural networks using data augmentation. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, pages 3376–3384. PMLR, April 2024. URL https://proceedings.mlr.press/v238/wu24e.html.
  63. [63] Yiu Yin Yung, Stephen Lee, and Edwin Fong. Moment martingale posteriors for semiparametric predictive Bayes. arXiv preprint arXiv:2507.18148, 2025.
  64. [64] Fengshuo Zhang and Chao Gao. Convergence rates of variational posterior distributions. The Annals of Statistics, 48(4):2180–2207, 2020.