Piecewise Deterministic Markov Processes for Bayesian Neural Networks
Pith reviewed 2026-05-24 09:54 UTC · model grok-4.3
The pith
A generic adaptive thinning scheme makes PDMP samplers practical for Bayesian neural networks by efficiently sampling their model-specific inhomogeneous Poisson processes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a generic and adaptive thinning scheme for sampling from the inhomogeneous Poisson processes that arise in PDMP samplers for BNNs. This scheme accelerates the application of PDMPs for inference in BNNs, making the methods computationally feasible and yielding improvements in predictive accuracy, MCMC mixing performance, and uncertainty measurements compared to other approximate inference schemes.
What carries the argument
The generic and adaptive thinning scheme for sampling from model-specific inhomogeneous Poisson processes in PDMPs for BNNs.
If this is right
- PDMP-based inference on BNNs becomes computationally feasible at scales where standard MCMC cannot subsample the likelihood.
- Predictive accuracy improves relative to variational inference and other approximate schemes.
- MCMC mixing performance improves over competing methods.
- Uncertainty measurements become more informative than those from methods that impose posterior independence assumptions.
Where Pith is reading between the lines
- The thinning construction may apply directly to PDMP formulations on other models that induce similar inhomogeneous Poisson processes.
- Parallel or distributed implementations of the thinned process could be tested to measure further wall-clock reductions.
- The method could be benchmarked on very large networks to check whether the per-iteration cost remains sublinear in dataset size.
Load-bearing premise
The adaptive thinning scheme can reliably sample from the inhomogeneous Poisson processes without introducing bias or overhead that cancels the subsampling benefit.
What would settle it
Apply the PDMP sampler with the new thinning to a small BNN whose posterior can be computed exactly by enumeration or quadrature, then test whether the generated samples match the true posterior in total variation or in low-order moments.
Figures
read the original abstract
Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to its incompatibility to subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though introduce a model specific inhomogenous Poisson Process (IPPs) which is difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs. Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy, MCMC mixing performance, and provide informative uncertainty measurements when compared against other approximate inference schemes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a new generic and adaptive thinning scheme enables efficient sampling from the model-specific inhomogeneous Poisson processes (IPPs) that arise when applying Piecewise Deterministic Markov Process (PDMP) samplers to Bayesian Neural Networks (BNNs). This makes PDMP-based inference computationally feasible for BNNs (via subsampling), and experiments demonstrate gains in predictive accuracy, MCMC mixing performance, and uncertainty quantification relative to other approximate inference methods such as variational inference.
Significance. If the adaptive thinning scheme is unbiased and the overhead remains low enough to preserve the subsampling advantage, the work would supply a practical route to exact (non-variational) posterior sampling for BNNs that avoids independence assumptions while retaining scalability; this would be a notable addition to the set of MCMC methods usable on modern neural-network models.
major comments (2)
- [Section describing the adaptive thinning algorithm (likely §3 or §4)] The central claim that the adaptive thinning scheme produces exact (unbiased) samples from the BNN-specific IPPs is load-bearing for all reported performance gains. No verification on a low-dimensional proxy (e.g., logistic regression) where the IPP can be sampled exactly is described; without such a check it remains possible that the reported improvements in mixing or accuracy arise from an altered stationary distribution rather than correct PDMP dynamics.
- [Experimental section (likely §5)] The experiments compare predictive accuracy and mixing against other approximate schemes, but do not report diagnostics confirming that the PDMP chain with the new thinning rule has the correct invariant distribution (e.g., via total-variation distance to a gold-standard sampler on a small BNN or via the expected value of a known test function).
minor comments (2)
- [Abstract and experimental results] Clarify whether the reported 'MCMC mixing performance' refers to standard autocorrelation times or to a PDMP-specific metric such as the number of velocity flips per unit time.
- [Method sections] Notation for the dominating intensity and adaptation rule should be introduced once and used consistently; several symbols appear to be redefined between the general PDMP background and the BNN-specific application.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important aspects of validating the correctness of the proposed adaptive thinning scheme. We address each major comment below, indicating planned revisions where appropriate.
read point-by-point responses
-
Referee: [Section describing the adaptive thinning algorithm (likely §3 or §4)] The central claim that the adaptive thinning scheme produces exact (unbiased) samples from the BNN-specific IPPs is load-bearing for all reported performance gains. No verification on a low-dimensional proxy (e.g., logistic regression) where the IPP can be sampled exactly is described; without such a check it remains possible that the reported improvements in mixing or accuracy arise from an altered stationary distribution rather than correct PDMP dynamics.
Authors: The adaptive thinning procedure is constructed to be unbiased by extending the standard thinning algorithm to the model-specific intensity function, with the acceptance probability derived to match the target IPP exactly (see the derivation in Section 3). Nevertheless, we agree that an explicit numerical verification on a low-dimensional proxy such as logistic regression would strengthen the manuscript. In the revision we will add such a check, comparing the empirical distribution of event times obtained via the adaptive scheme against an exact sampler (e.g., via numerical quadrature of the intensity) and confirming that the resulting PDMP chain preserves the known invariant distribution. revision: yes
-
Referee: [Experimental section (likely §5)] The experiments compare predictive accuracy and mixing against other approximate schemes, but do not report diagnostics confirming that the PDMP chain with the new thinning rule has the correct invariant distribution (e.g., via total-variation distance to a gold-standard sampler on a small BNN or via the expected value of a known test function).
Authors: We acknowledge that direct confirmation of the invariant distribution on even modestly sized BNNs is computationally demanding. We will augment the experimental section with additional diagnostics, including the long-run average of simple test functions (e.g., posterior mean of selected weights) on a small fully-connected network where a gold-standard HMC run is feasible, as well as standard MCMC convergence metrics. These additions will be included in the revised manuscript. revision: yes
Circularity Check
No circularity: new adaptive thinning scheme presented as independent contribution with experimental validation
full rationale
The paper introduces a novel generic adaptive thinning scheme for sampling model-specific IPPs arising in PDMPs applied to BNNs. The abstract and description frame this as a new algorithmic contribution whose unbiasedness and efficiency are asserted as properties of the proposed method, then validated through experimentation on predictive accuracy, mixing, and uncertainty. No equations or steps reduce a claimed prediction or result to a fitted parameter or self-citation by construction. No self-definitional, fitted-input-called-prediction, or load-bearing self-citation patterns appear. The derivation chain is self-contained against external benchmarks (comparisons to other inference schemes) and does not rely on renaming known results or smuggling ansatzes via prior self-work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
new generic and adaptive thinning scheme for sampling from these IPPs
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
piecewise deterministic dynamics... event rate... transition kernel
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Adaptive schemes for piecewise deterministic monte carlo algorithms
Andrea Bertazzi and Joris Bierkens. Adaptive schemes for piecewise deterministic monte carlo algorithms. arXiv preprint arXiv:2012.13924,
-
[2]
Joris Bierkens, Sebastiano Grazzi, Kengo Kamatani, and Gareth Roberts. The boomerang sampler. arXiv preprint arXiv:2006.13777,
-
[3]
Joshua V Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, and Rif A Saurous. Ten- sorflow distributions. arXiv preprint arXiv:1711.10604, abs/1711.10604,
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
Geometric ergodicity of the bouncy particle sampler
ALAIN DURMUS, ARNAUD GUILLIN, and PIERRE MONMARCHÉ. Geometric ergodicity of the bouncy particle sampler. The Annals of Applied Probability, 30 (5):2069–2098,
work page 2069
-
[5]
Bayesian infer- ence for large scale image classification
Jonathan Heek and Nal Kalchbrenner. Bayesian infer- ence for large scale image classification. arXiv preprint arXiv:1908.03491,
-
[6]
What are bayesian neu- ral network posteriors really like? arXiv preprint arXiv:2104.14421,
Pavel Izmailov, Sharad Vikram, Matthew D Hoffman, and Andrew Gordon Wilson. What are bayesian neu- ral network posteriors really like? arXiv preprint arXiv:2104.14421,
-
[7]
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Chunyuan Li, Changyou Chen, David Carlson, and Lawrence Carin. Preconditioned stochastic gradient langevin dynamics for deep neural networks. arXiv preprint arXiv:1512.07666,
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
Maddox, Pavel Izmailov, Timur Garipov, Dmitry P
Wesley J. Maddox, Pavel Izmailov, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. A simple baseline for bayesian uncertainty in deep learning. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Gar- nett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neura...
work page 2019
-
[9]
neurips.cc/paper/2019/hash/ 118921efba23fc329e6560b27861f0c2-Abstract
URL https://proceedings. neurips.cc/paper/2019/hash/ 118921efba23fc329e6560b27861f0c2-Abstract. html. Stephan Mandt, Matthew D Hoffman, and David M Blei. Stochastic gradient descent as approximate bayesian in- ference. Journal of Machine Learning Research, 18:1–35,
work page 2019
-
[10]
The True Cost of Stochastic Gradient Langevin Dynamics
Tigran Nagapetyan, Andrew B Duncan, Leonard Hasen- clever, Sebastian J V ollmer, Lukasz Szpruch, and Kon- stantinos Zygalakis. The true cost of stochastic gradient langevin dynamics. arXiv preprint arXiv:1706.02692 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
Reading digits in natural images with unsupervised feature learning
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bis- sacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised fea- ture learning, volume 2011, page 5,
work page 2011
-
[12]
Piecewise-Deterministic Markov Chain Monte Carlo
Paul Vanetti, Alexandre Bouchard-Côté, George Deligianni- dis, and Arnaud Doucet. Piecewise-deterministic markov chain monte carlo. arXiv preprint arXiv:1707.05296 ,
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Generalized Bouncy Particle Sampler
Changye Wu and Christian P Robert. Generalized bouncy particle sampler. arXiv preprint arXiv:1706.04781, art. arXiv:1706.04781, Jun
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion- mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747,
work page internal anchor Pith review Pith/arXiv arXiv
-
[15]
Piecewise Deterministic Markov Processes for Bayesian Neural Networks
with Corrigendum. arXiv:2302.08724v2 [stat.ML] 19 Oct 2023 Table 1: Summary of predictive performance with and timings as the scaling value ofαis increased for the PDMP samplers demonstrated within. All models are fit to the MNIST dataset using the Lenet5 architecture. α Inference ACC NLL ECC Time α= 1.0 BPS 0.9896 0.0536 2.66 71 σBPS 0.9923 0.0227 0.4127...
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[16]
In a similar vein, we would state that any Bayesian neural network user would have a difficult time honestly saying their inference strategy has sufficiently explored the posterior, including the work proposed here. Previous research has investigated gold-standard MCMC methods for larger networks ?, though were unable to obtain a sufficient number of samp...
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.