A Theoretical Comparison of No-U-Turn Sampler Variants: Necessary and Sufficient Convergence Conditions and Mixing Time Analysis under Gaussian Targets
Pith reviewed 2026-05-15 08:58 UTC · model grok-4.3
The pith
NUTS-mul and NUTS-BPS achieve O(d^{1/4}) mixing times on standard Gaussians with strictly smaller constants for NUTS-BPS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive the first necessary conditions for geometric ergodicity for both NUTS-mul and NUTS-BPS. We establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. We obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. When initialized in the typical set of the canonical Gaussian measure, the mixing times of both variants scale as O(d^{1/4}) up to logarithmic factors, with strictly smaller associated constants for NUTS-BPS.
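Read schematically, and with the caveat that the initial law, the norm, and the logarithmic power below are illustrative assumptions rather than quantities quoted from the paper, the headline statement has the form

$\tau_{\mathrm{mix}}(\varepsilon;\mu_0) = \min\{\, n \ge 0 : \|\mu_0 K^n - \pi\|_{\mathrm{TV}} \le \varepsilon \,\} \;\le\; C_{\mathrm{var}}\, d^{1/4} \log^{c} d,$

where $K$ is the NUTS transition kernel, $\pi = \mathcal{N}(0, I_d)$, $\mu_0$ is any initial law supported in the typical set, and the variant-dependent constant satisfies $C_{\mathrm{BPS}} < C_{\mathrm{mul}}$.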
What carries the argument
The index-selection rule inside each NUTS trajectory (multinomial sampling for NUTS-mul, biased progressive sampling for NUTS-BPS), which governs how the next state is drawn from the set of points generated by the leapfrog integrator.
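To make the contrast concrete, the sketch below isolates the two selection rules, assuming a flattened array of Boltzmann weights $e^{-H}$ for the leapfrog points and explicit subtree index ranges; the function names and bookkeeping are illustrative, and a real implementation interleaves this step with tree doubling and U-turn checks.

```python
import numpy as np

rng = np.random.default_rng(0)

def _multinomial(w):
    # One index drawn proportionally to the weights w (here w_i = exp(-H_i),
    # the Boltzmann weight of the i-th leapfrog point).
    w = np.asarray(w, dtype=float)
    return int(rng.choice(len(w), p=w / w.sum()))

def select_mul(weights):
    # NUTS-mul: a single multinomial draw over every point in the trajectory.
    return _multinomial(weights)

def select_bps(weights, subtrees):
    # NUTS-BPS: as each doubling appends a subtree, the new subtree's
    # candidate replaces the current one with probability min(1, W_new/W_old),
    # biasing the final draw toward points generated later in the trajectory.
    # `subtrees` lists (start, stop) index ranges, one per doubling.
    w = np.asarray(weights, dtype=float)
    start, stop = subtrees[0]
    idx = start + _multinomial(w[start:stop])
    w_old = w[start:stop].sum()
    for start, stop in subtrees[1:]:
        w_new = w[start:stop].sum()
        if rng.random() < min(1.0, w_new / w_old):
            idx = start + _multinomial(w[start:stop])
        w_old += w_new
    return idx

# Toy trajectory of 8 leapfrog points with doubling sizes 1, 1, 2, 4.
weights = np.exp(-np.linspace(0.0, 1.0, 8))
print(select_mul(weights), select_bps(weights, [(0, 1), (1, 2), (2, 4), (4, 8)]))
```

The bias in select_bps comes from the min(1, W_new/W_old) replacement rule: later subtrees, which contain the points farthest from the current state, get a boosted chance of supplying the next state, which is plausibly the mechanism behind the smaller constants claimed for NUTS-BPS.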
If this is right
- Geometric ergodicity of both variants requires the target to possess sufficiently light tails.
- Sufficient conditions for geometric ergodicity are now available for the multinomial variant.
- Both variants deliver mixing times of order O(d^{1/4}) on the standard Gaussian when started in the typical set.
- The leading constants in these mixing-time bounds are strictly smaller for the biased progressive variant than for the multinomial variant.
Where Pith is reading between the lines
- The quantitative edge of the biased progressive variant may favor its use in high-dimensional problems where the number of gradient evaluations per effective sample is the dominant cost.
- Because the qualitative convergence behavior is nearly the same, software implementations can select between the two variants primarily on the basis of coding simplicity or numerical stability rather than fundamental ergodicity differences.
- Analogous mixing-time statements may hold for targets whose tails match those of the Gaussian, such as certain elliptical distributions.
Load-bearing premise
The target must satisfy specific tail properties or be exactly the standard Gaussian, and the chain must begin inside the typical set, for the mixing-time bounds to apply.
What would settle it
A direct simulation of either variant on a d-dimensional standard Gaussian, started from the typical set, whose measured mixing time grows faster than C d^{1/4} log^k d for every fixed constant C (and any fixed power k) would falsify the claimed scaling.
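A minimal version of that experiment, sketched with NumPyro's NUTS (its index selection is multinomial; using it as a stand-in for both variants, and using iterations per effective sample of $\|x\|^2$ as a proxy for the total-variation mixing time, are assumptions here):

```python
# Requires: pip install numpyro jax
import jax
import jax.numpy as jnp
import numpy as np
import numpyro
import numpyro.distributions as dist
from numpyro.diagnostics import effective_sample_size
from numpyro.infer import MCMC, NUTS

def gaussian(d):
    numpyro.sample("x", dist.Normal(jnp.zeros(d), 1.0).to_event(1))

for d in (16, 64, 256, 1024):
    mcmc = MCMC(NUTS(gaussian), num_warmup=500, num_samples=2000,
                progress_bar=False)
    # A draw from the target itself lies in the typical set with
    # overwhelming probability, so it serves as a warm start.
    x0 = jax.random.normal(jax.random.PRNGKey(0), (d,))
    mcmc.run(jax.random.PRNGKey(d), d, init_params={"x": x0})
    sq = np.asarray(jnp.sum(mcmc.get_samples()["x"] ** 2, axis=-1))
    ess = effective_sample_size(sq[None, :])
    # Iterations per effective sample of ||x||^2: should grow no faster
    # than ~ C d^{1/4} (log d)^k if the claimed scaling holds.
    print(d, float(2000 / ess))
```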
Original abstract
The No-U-Turn Sampler (NUTS) is the computational workhorse of modern Bayesian software libraries, yet its qualitative and quantitative convergence guarantees were established only recently. A significant gap remains in the theoretical comparison of its two main variants: NUTS-mul and NUTS-BPS, which use multinomial sampling and biased progressive sampling, respectively, for index selection. In this paper, we address this gap in three contributions. First, we derive the first necessary conditions for geometric ergodicity for both variants. Second, we establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. Third, we obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. However, they differ quantitatively in their convergence rates. More precisely, when initialized in the typical set of the canonical Gaussian measure, the mixing times of both NUTS-mul and NUTS-BPS scale as $O(d^{1/4})$ up to logarithmic factors, where $d$ denotes the dimension. Nevertheless, the associated constants are strictly smaller for NUTS-BPS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives the first necessary conditions for geometric ergodicity of both NUTS-mul and NUTS-BPS, the first sufficient conditions for geometric ergodicity and ergodicity of NUTS-mul, and the first mixing-time bounds for NUTS-BPS (and a comparison for NUTS-mul) on standard Gaussian targets. It shows that both variants exhibit similar qualitative behavior governed by the tail properties of the target; quantitatively, when initialized in the typical set, both achieve mixing times of order O(d^{1/4}) up to logarithmic factors, but with strictly smaller constants for NUTS-BPS.
Significance. If the derivations hold, the work supplies the first quantitative mixing-time results for NUTS-BPS together with a direct comparison of the two index-selection mechanisms that are used in all modern NUTS implementations. This fills a notable gap in the theoretical analysis of the most widely deployed MCMC algorithm in Bayesian software.
major comments (1)
- [Abstract and mixing-time analysis] The O(d^{1/4}) mixing-time claim (Abstract and the mixing-time section) is explicitly conditioned on initialization inside the typical set of the canonical Gaussian. No quantitative bound is supplied on the time required to reach this set from an arbitrary starting point; if the tree-construction or acceptance mechanism causes this burn-in phase to scale worse than d^{1/4}, the headline rate does not govern practical performance from cold starts.
minor comments (1)
- Notation for the two variants (NUTS-mul vs. NUTS-BPS) and for the typical-set radius should be introduced once and used consistently; occasional switches between “multinomial” and “mul” slow reading.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our contributions and for highlighting this important point on the scope of the mixing-time results. We address the comment below.
Point-by-point responses
Referee: The O(d^{1/4}) mixing-time claim (Abstract and the mixing-time section) is explicitly conditioned on initialization inside the typical set of the canonical Gaussian. No quantitative bound is supplied on the time required to reach this set from an arbitrary starting point; if the tree-construction or acceptance mechanism causes this burn-in phase to scale worse than d^{1/4}, the headline rate does not govern practical performance from cold starts.
Authors: We agree that the O(d^{1/4}) mixing-time bounds (up to log factors) are derived under the explicit assumption of initialization inside the typical set, as stated in the abstract and the mixing-time section. This conditioning is standard in high-dimensional MCMC theory because the typical set concentrates the target measure (probability approaching 1 as d grows) and represents the regime in which the sampler operates after a short transient. For the standard Gaussian, the probability of lying outside the typical set decays exponentially in d, so the time to enter it is expected to be negligible compared to the mixing time; however, we do not supply a rigorous quantitative bound on this burn-in phase. We will revise the manuscript to add a clarifying paragraph in the introduction and conclusion that (i) reiterates the conditioning, (ii) explains why the typical-set regime is the relevant one for the headline rate, and (iii) flags a complete cold-start analysis as an interesting direction for future work. This constitutes a partial revision: we strengthen the exposition without claiming a new bound that is absent from the current derivations.
Revision: partial
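The concentration step invoked in the response can be made explicit. Assuming the typical set is taken to be a relative band $\{x : |\,\|x\|^2 - d\,| \le \varepsilon d\}$ (the paper's exact definition may differ), a standard sub-exponential tail bound for the $\chi^2_d$ distribution gives

$\mathbb{P}_{X \sim \mathcal{N}(0, I_d)}\bigl(\,\bigl|\|X\|^2 - d\bigr| \ge \varepsilon d\,\bigr) \;\le\; 2\exp\!\bigl(-d\varepsilon^2/8\bigr), \qquad 0 < \varepsilon < 1,$

so at fixed $\varepsilon$ the mass outside the band decays exponentially in $d$, consistent with the authors' claim.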
Circularity Check
No circularity; mixing-time bounds derived from explicit tail assumptions and typical-set initialization
Full rationale
The paper's derivation chain establishes necessary conditions for geometric ergodicity from tail properties of the target, sufficient conditions for NUTS-mul, and quantitative mixing-time bounds O(d^{1/4}) for both variants on the standard Gaussian, all conditioned on initialization in the typical set. These steps rely on standard Markov-chain drift and minorization arguments applied to the NUTS transition kernel; no quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise reduces to a self-citation. The explicit conditioning on the typical set is stated up front rather than smuggled in, so the claimed scaling is a genuine consequence of the stated assumptions rather than a tautology.
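For reference, the drift-and-minorization template invoked above asks for a Lyapunov function $V \ge 1$, constants $\lambda \in (0,1)$ and $b < \infty$, a small set $C$, and a minorizing probability measure $\nu$ with $\epsilon > 0$ such that

$K V(x) \le \lambda V(x) + b\,\mathbf{1}_C(x) \ \text{for all } x, \qquad K(x,\cdot) \ge \epsilon\,\nu(\cdot) \ \text{for all } x \in C,$

which together imply geometric ergodicity of the kernel $K$ (Meyn and Tweedie [34]). The non-circularity point is that $V$ and $C$ are constructed from the stated tail assumptions on the target, not from the ergodicity conclusion itself.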
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the target distribution has a log-density satisfying regularity and tail-decay conditions sufficient for geometric ergodicity.
Reference graph
Works this paper leans on
- [1] Simon Apers, Sander Gribling, and Dániel Szilágyi. Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps. Journal of Machine Learning Research, 25(348):1–30, 2024.
- [2] Cameron Bell, Krzysztof Łatuszyński, and Gareth O. Roberts. Adaptive stereographic MCMC. arXiv preprint arXiv:2408.11780, 2024.
- [3] Michael Betancourt. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434, 2017.
- [4] Eli Bingham, Jonathan P. Chen, Martin Jankowiak, Fritz Obermeyer, Neeraj Pradhan, Theofanis Karaletsos, Rohit Singh, Paul Szerlip, Paul Horsfall, and Noah D. Goodman. Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research, 20(28):1–6, 2019.
- [5] Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, and Sifan Liu. The within-orbit adaptive leapfrog no-U-turn sampler. arXiv preprint arXiv:2506.18746, 2025.
- [6] Nawaf Bou-Rabee, Bob Carpenter, Tore Selland Kleppe, and Milo Marsden. Incorporating local step-size adaptivity into the no-U-turn sampler using Gibbs self-tuning. The Journal of Chemical Physics, 163(8), 2025.
- [7] Nawaf Bou-Rabee, Bob Carpenter, and Milo Marsden. GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo. arXiv preprint arXiv:2404.15253, 2024.
- [8] Nawaf Bou-Rabee and Andreas Eberle. Mixing time guarantees for unadjusted Hamiltonian Monte Carlo. Bernoulli, 29(1):75–104, 2023.
- [9] Nawaf Bou-Rabee, Andreas Eberle, and Raphael Zimmer. Coupling and convergence for Hamiltonian Monte Carlo. The Annals of Applied Probability, 30(3):1209–1250, 2020.
- [10] Nawaf Bou-Rabee and Stefan Oberdörster. Mixing of the no-U-turn sampler and the geometry of Gaussian concentration. arXiv preprint arXiv:2410.06978, 2024.
- [11] Nawaf Bou-Rabee and Jesús María Sanz-Serna. Randomized Hamiltonian Monte Carlo. The Annals of Applied Probability, 27(4):2159–2194, 2017.
- [12] Nawaf Bou-Rabee and Jesús María Sanz-Serna. Geometric integrators and the Hamiltonian Monte Carlo method. Acta Numerica, 27:113–206, 2018.
- [13] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 2017.
- [14] Yuansi Chen, Raaz Dwivedi, Martin J. Wainwright, and Bin Yu. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. Journal of Machine Learning Research, 21(92):1–72, 2020.
- [15] Dennis Conway, Janelle Simpson, Yohannes Didana, Joseph Rugari, and Graham Heinson. Probabilistic magnetotelluric inversion with adaptive regularisation using the no-U-turns sampler. Pure and Applied Geophysics, 175(8):2881–2894, 2018.
- [16] Miguel Fernández de Retana Uribe. A Bayesian Approach to Modeling Tamoxifen Resistance in Breast Cancer Cells Through Adaptive Hamiltonian Monte Carlo Posterior Sampling. PhD thesis, University of Deusto.
- [17] Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov Chains. Springer, 2018.
- [18] Simon Duane, Anthony D. Kennedy, Brian J. Pendleton, and Duncan Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.
- [19] Alain Durmus, Samuel Gruffaz, Miika Kailas, Eero Saksman, and Matti Vihola. On the convergence of dynamic implementations of Hamiltonian Monte Carlo and no U-turn samplers. arXiv preprint arXiv:2307.03460, 2023.
- [20] Alain Durmus, Eric Moulines, and Eero Saksman. On the convergence of Hamiltonian Monte Carlo. The Annals of Statistics, April 2017.
- [21] Hong Ge, Kai Xu, and Zoubin Ghahramani. Turing: a language for flexible probabilistic inference. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9–11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1682–1690, 2018.
- [22] Mark Girolami and Ben Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214, 2011.
- [23] Samuel Gruffaz and Josua Sassen. Riemannian metric learning: Closer to you than you imagine. 2025.
- [24] W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
- [25] Matthew D. Hoffman and Andrew Gelman. The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623, 2014.
- [26] Julian Hofstadler. Solving Poisson's equation for Wasserstein contractive Markov chains. arXiv preprint arXiv:2602.19119, 2026.
- [27] Søren F. Jarner and Richard L. Tweedie. Necessary conditions for geometric and polynomial ergodicity of random-walk-type Markov chains. Bernoulli, 9(4):559–578, 2003.
- [28] Yin Tat Lee, Ruoqi Shen, and Kevin Tian. Lower bounds on Metropolized sampling methods for well-conditioned distributions. Advances in Neural Information Processing Systems, 34:18812–18824, 2021.
- [29] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, volume 75. Springer, 2001.
- [30] Samuel Livingstone and Giacomo Zanella. The Barker proposal: Combining robustness and efficiency in gradient-based MCMC. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):496–523, 2022.
- [31] Oren Mangoubi and Aaron Smith. Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 586–595. PMLR, 2019.
- [32] Oren Mangoubi and Aaron Smith. Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions: Continuous dynamics. The Annals of Applied Probability, 31(5):2019–2045, 2021.
- [33] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
- [34] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Springer Science & Business Media, 2012.
- [35] Radford Neal. Bayesian learning via stochastic dynamics. Advances in Neural Information Processing Systems, 5, 1992.
- [36] Radford M. Neal. An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111(1):194–203, 1994.
- [37] Radford M. Neal. MCMC using Hamiltonian dynamics. In Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng, editors, Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC Press, 2011.
- [38] Stefan Oberdörster. On accelerated mixing of the no-U-turn sampler. arXiv preprint arXiv:2507.13259, 2025.
- [39] Gareth O. Roberts and Richard L. Tweedie. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika, 83(1):95–110, 1996.
- [40] John Salvatier, Thomas V. Wiecki, and Christopher Fonnesbeck. Probabilistic programming in Python using PyMC3. PeerJ Computer Science, 2:e55, April 2016.
- [41] Samuel Livingstone, Michael Betancourt, Simon Byrne, and Mark Girolami. On the geometric ergodicity of Hamiltonian Monte Carlo. Bernoulli, 25(4A):3109–3138, 2019.
- [42] Christof Seiler, Simon Rubinstein-Salzedo, and Susan Holmes. Positive curvature and Hamiltonian Monte Carlo. Advances in Neural Information Processing Systems, 27, 2014.