Hessian-informed, Coordinate Friendly Hamiltonian Monte Carlo in Linear Time

Alexandre Bouchard-C\^ot\'e; Nikola Surjanovic; Son Luu; Trevor Campbell; Zuheng Xu

arxiv: 2606.05935 · v1 · pith:6CX3TIEBnew · submitted 2026-06-04 · 📊 stat.CO · stat.ME

Hessian-informed, Coordinate Friendly Hamiltonian Monte Carlo in Linear Time

Son Luu , Nikola Surjanovic , Zuheng Xu , Trevor Campbell , Alexandre Bouchard-C\^ot\'e This is my paper

Pith reviewed 2026-06-27 22:42 UTC · model grok-4.3

classification 📊 stat.CO stat.ME

keywords Hamiltonian Monte CarloRiemannian HMCcoordinate friendlylinear timeMCMCpreconditioninggeneralized linear modelsgraphical models

0 comments

The pith

RHMC fixed-point iterations with diagonal preconditioners reduce from O(d^2) to O(d) time for targets with coordinate-friendly structure via compute-graph manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a computational rewrite that makes Riemannian Hamiltonian Monte Carlo feasible in high dimensions. For targets whose structure allows coordinate-wise operations to be isolated, the method performs the fixed-point iterations needed for diagonal Hessian preconditioning in linear rather than quadratic time per step. This covers generalized linear models and both dense and sparse graphical models. The rewrite is expressed as a graph transformation, so it can be applied automatically even when the target is treated as a black box. Experiments indicate that the resulting sampler produces higher-quality samples per unit of compute time than standard HMC and NUTS variants that use either fixed or position-dependent preconditioners.

Core claim

For targets possessing coordinate-friendly structure, the fixed-point iterations of diagonal-preconditioned RHMC can be executed in O(d) rather than O(d^2) time by manipulating the compute graph so that only linear-cost operations are performed while the mathematical correctness of each iteration is preserved. The class includes generalized linear models together with certain dense and sparse graphical models, and the same graph manipulation can be automated for black-box targets.

What carries the argument

Compute-graph manipulation of the fixed-point iterations required by diagonal-preconditioned RHMC, which isolates coordinate-wise contributions to achieve linear scaling.

If this is right

RHMC becomes practical for high-dimensional generalized linear models without incurring quadratic cost per iteration.
The same linear-time property holds for both dense and sparse graphical models that satisfy the coordinate-friendly condition.
Automation through graph manipulation extends the method to black-box targets that internally possess the required structure.
Empirical sample quality per unit compute time exceeds that of position-independent and position-dependent HMC baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The graph-rewrite technique could be adapted to lower the cost of other Hessian-informed MCMC algorithms that currently scale quadratically.
In regimes with thousands of dimensions the linear scaling may make previously intractable models accessible to position-dependent preconditioning.
The breadth of coordinate-friendly structures could be mapped by applying the method to additional families of graphical models beyond those tested.

Load-bearing premise

The target must possess a coordinate-friendly structure that permits the graph manipulation to deliver true linear cost while leaving the fixed-point iteration mathematically unchanged.

What would settle it

A wall-clock timing experiment on a logistic-regression target that shows iteration cost growing faster than linearly with dimension, or that yields samples whose distribution deviates from the known posterior.

Figures

Figures reproduced from arXiv: 2606.05935 by Alexandre Bouchard-C\^ot\'e, Nikola Surjanovic, Son Luu, Trevor Campbell, Zuheng Xu.

**Figure 2.** Figure 2: Trajectory comparison between CG-RHMC and standard RHMC for an 8- [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗

**Figure 3.** Figure 3: Wall-clock time (in seconds) per Hamiltonian gradient call versus dimension [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 4.** Figure 4: First coordinate 2-Wasserstein distance convergence over time for various HMC [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗

**Figure 5.** Figure 5: (Left) KS test statistics of each parameter between the NRPT chain (reference posterior) and the two competing algorithms NUTS and CG-RNUTS in log-log scale (lower is better) for the horseshoe logistic regression problem on the prostate cancer data with similar time budgets where each point represents a parameter. (Right) Marginal density comparison for the circled point. (200 times that of CG-RNUTS for th… view at source ↗

**Figure 6.** Figure 6: Box plot of maximum marginal KS statistics of CG-RNUTS and NUTS compared [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: Different graph notions for logistic regression model in Example [PITH_FULL_IMAGE:figures/full_fig_p043_7.png] view at source ↗

**Figure 8.** Figure 8: Different graph notions for autoregression model in Example [PITH_FULL_IMAGE:figures/full_fig_p045_8.png] view at source ↗

**Figure 9.** Figure 9: Bivariate marginals obtained by CG-RNUTS and NUTS with similar time bud [PITH_FULL_IMAGE:figures/full_fig_p054_9.png] view at source ↗

**Figure 10.** Figure 10: Bivariate marginals obtained by CG-RNUTS and NUTS with similar time [PITH_FULL_IMAGE:figures/full_fig_p055_10.png] view at source ↗

**Figure 11.** Figure 11: Box plot of median (top), mean (middle) and minimum (bottom) marginal KS [PITH_FULL_IMAGE:figures/full_fig_p056_11.png] view at source ↗

**Figure 12.** Figure 12: Box plot of sampling time (left) and total time (right) for CG-RNUTS and [PITH_FULL_IMAGE:figures/full_fig_p057_12.png] view at source ↗

**Figure 13.** Figure 13: Fixed-point iteration scaling for funnel targets with increasing dimensions. [PITH_FULL_IMAGE:figures/full_fig_p057_13.png] view at source ↗

**Figure 14.** Figure 14: Fixed-point iteration scaling for synthetic horseshoe logistic regression targets [PITH_FULL_IMAGE:figures/full_fig_p058_14.png] view at source ↗

**Figure 15.** Figure 15: Density of samples from RNUTS chains using different metric regularizations [PITH_FULL_IMAGE:figures/full_fig_p058_15.png] view at source ↗

**Figure 16.** Figure 16: Trace plots of RNUTS chains using different mass matrix regularizations on a [PITH_FULL_IMAGE:figures/full_fig_p059_16.png] view at source ↗

**Figure 17.** Figure 17: Running quartile plots of different parameters obtained from the NRPT chain [PITH_FULL_IMAGE:figures/full_fig_p059_17.png] view at source ↗

**Figure 18.** Figure 18: Restart rate over adaptation rounds of the NRPT chain for the horseshoe logistic [PITH_FULL_IMAGE:figures/full_fig_p060_18.png] view at source ↗

read the original abstract

Riemannian Hamiltonian Monte Carlo (RHMC) is a promising MCMC methodology thanks to its ability to accommodate position-dependent preconditioning and multi-step proposals. While RHMC performs well in low dimensions, it becomes infeasible in high dimensions due to its $O(d^3)$ cost per fixed-point iteration, where $d$ is the dimension of the target density. Even when the position-dependent preconditioner is based on the diagonal of the Hessian, the cost is still $O(d^2)$ per fixed-point iteration. In this paper, we propose a computational method to reduce the computational complexity of RHMC fixed-point iterations with diagonal preconditioners from $O(d^2)$ to $O(d)$ for targets with ``coordinate friendly'' structures. This distribution class includes generalized linear models as well as other dense and sparse graphical models. The method is expressed as manipulating the compute graph and can therefore be automated to work on black box targets. Finally, we show empirically that our implementation of RHMC results in better sample quality per unit of compute time for various target distributions compared to state-of-the-art HMC NUTS algorithms with both position-independent and position-dependent preconditioners.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper delivers a graph-rewrite trick that drops diagonal RHMC fixed-point cost to linear time on GLMs and similar models, but the claim that the rewritten iteration stays exactly equivalent to the original needs explicit checking.

read the letter

The central point is that the authors show how to reorder operations on the compute graph for diagonal-preconditioned RHMC so that each fixed-point iteration costs O(d) instead of O(d^{2}) when the target has coordinate-friendly structure. That reduction is the actual new piece; prior RHMC work already handled diagonal metrics but left the quadratic cost untouched.

The approach is practical. It targets a genuine bottleneck in high-dimensional sampling for GLMs and certain graphical models, and the empirical section reports better effective samples per unit compute than both standard NUTS and other position-dependent baselines. The automation via graph manipulation is also a plus for black-box use.

The soft spot sits in the correctness argument. The method rewrites the iteration while asserting that the numerical solution remains identical, which is required for the leapfrog step to stay symplectic and reversible. The abstract presents this equivalence as automatic once the structure is coordinate-friendly, yet the stress-test concern is fair: without an inductive proof or algebraic identity that the reordering leaves the fixed-point equation unchanged for general cases, a small discrepancy could invalidate the acceptance probability. If the full paper supplies that identity, the issue disappears; if it only shows empirical agreement on a few models, the claim is weaker than stated.

The work is aimed at researchers building or tuning MCMC for structured statistical models. Anyone already running HMC on GLMs or sparse graphs will find the implementation details and timing results useful. It is solid enough on the algorithmic idea and the empirical side to merit referee time, even if the equivalence proof needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that for targets with coordinate-friendly structure (including GLMs and certain graphical models), the fixed-point iteration inside diagonal-preconditioned RHMC can be rewritten via compute-graph manipulation to achieve O(d) cost per iteration while remaining numerically identical to the original O(d²) version; the resulting method is automatable for black-box targets and empirically yields higher effective samples per unit compute time than NUTS with both static and position-dependent preconditioners.

Significance. If the rewritten iteration is provably equivalent, the work would supply a practical route to position-dependent HMC at linear cost for an important model class, with the automation angle offering a path toward integration in probabilistic programming frameworks. The empirical comparisons provide concrete evidence of wall-clock gains on the tested targets.

major comments (2)

[§3 (compute-graph manipulation)] The central claim that the compute-graph rewrite leaves the fixed-point equation unchanged (and therefore preserves symplecticity and reversibility) is load-bearing yet unsupported by an explicit algebraic identity or inductive argument. No section derives that the reordered operations return exactly the same numerical solution for arbitrary coordinate-friendly targets; the abstract and §3 treat equivalence as automatic once the structure is present.
[§3, Algorithm 1] The O(d) complexity claim rests on the assumption that cross-coordinate dependencies introduced by the diagonal metric in the momentum update are fully captured by the reordered graph; if any dependency is inadvertently dropped, the integrator ceases to be measure-preserving. The manuscript supplies no general proof that this does not occur for the stated class of models.

minor comments (2)

[§2] Notation for the fixed-point map and the diagonal preconditioner is introduced without a compact reference equation; adding a single displayed equation early in §2 would improve readability.
[§5] The experimental section reports effective sample size per second but does not tabulate the raw iteration counts or the observed per-iteration flop counts; including these would allow direct verification of the claimed linear scaling.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the importance of rigorously establishing equivalence and measure preservation. We address the two major comments below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [§3 (compute-graph manipulation)] The central claim that the compute-graph rewrite leaves the fixed-point equation unchanged (and therefore preserves symplecticity and reversibility) is load-bearing yet unsupported by an explicit algebraic identity or inductive argument. No section derives that the reordered operations return exactly the same numerical solution for arbitrary coordinate-friendly targets; the abstract and §3 treat equivalence as automatic once the structure is present.

Authors: We agree that an explicit derivation is needed. The reordering operates only on independent sub-expressions permitted by the coordinate-friendly structure (i.e., the Hessian diagonal and gradient factorizations that admit per-coordinate updates). In the revision we will insert a short inductive argument in §3 showing that, for any target whose log-density and Hessian diagonal admit the stated factorization, the fixed-point map computed on the reordered graph is algebraically identical to the original map. This identity immediately implies that symplecticity and reversibility are unchanged. revision: yes
Referee: [§3, Algorithm 1] The O(d) complexity claim rests on the assumption that cross-coordinate dependencies introduced by the diagonal metric in the momentum update are fully captured by the reordered graph; if any dependency is inadvertently dropped, the integrator ceases to be measure-preserving. The manuscript supplies no general proof that this does not occur for the stated class of models.

Authors: The coordinate-friendly class is defined precisely so that the only cross-coordinate dependencies arise through the shared diagonal metric; the graph rewrite is constructed to retain every such dependency. We will augment the revision with a general argument (again inductive) that every momentum-update term present in the original O(d²) iteration appears unchanged in the reordered O(d) version, thereby preserving the measure. The same argument covers the models listed in the paper (GLMs and the indicated graphical models). revision: yes

Circularity Check

0 steps flagged

No circularity; algorithmic graph rewrite is an independent computational technique.

full rationale

The paper's core contribution is a compute-graph manipulation that rewrites fixed-point iterations for diagonal-preconditioned RHMC to achieve O(d) cost on coordinate-friendly targets. The abstract and claims present this as an explicit algorithmic transformation that preserves the original iteration's output for the stated distribution class, without any self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the claimed equivalence to prior author work. No derivation chain collapses by construction; the method is offered as a verifiable rewrite applicable to black-box targets, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of Hamiltonian dynamics and fixed-point solvers plus the existence of coordinate-friendly structure in the target class; no free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math Standard properties of Hamiltonian dynamics and fixed-point iterations hold for the RHMC proposal.
Invoked implicitly when claiming the fixed-point iteration remains valid after graph manipulation.
domain assumption Targets belong to the coordinate-friendly class (GLMs, dense/sparse graphical models) for which the linear-time reduction applies.
The O(d) claim is conditioned on this structural property of the target.

pith-pipeline@v0.9.1-grok · 5750 in / 1263 out tokens · 23285 ms · 2026-06-27T22:42:18.697794+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

59 extracted references · 2 canonical work pages

[1]

and Hazan, E

Agarwal, N., Bullins, B. and Hazan, E. [2017], ‘Second-order stochastic optimization for machine learning in linear time’, Journal of Machine Learning Research 18(116), 1– 40

2017
[2]

[2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp

Betancourt, M. [2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp. 327–334
[3]

[2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

Betancourt, M. [2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

Pith/arXiv arXiv 2017
[4]

Betancourt, M. J. [2013 b], ‘Generalizing the No-U-Turn sampler to Riemannian man- ifolds’, arXiv:1304.1920

Pith/arXiv arXiv 2013
[5]

and Roberts, G

Bierkens, J., Fearnhead, P. and Roberts, G. [2019], ‘The Zig-zag process and super-eﬀicient sampling for Bayesian analysis of big data’, The Annals of Statistics 47(3), 1288–1320

2019
[6]

and Bouchard-Côté, A

Biron-Lattes, M., Surjanovic, N., Syed, S., Campbell, T. and Bouchard-Côté, A. [2024], AutoMALA: Locally adaptive Metropolis-adjusted Langevin algorithm, in ‘Proceed- ings of The 27th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4600–4608

2024
[7]

Bou-Rabee, N., Carpenter, B., Kleppe, T. S. and Liu, S. [2025], ‘The within-orbit adaptive leapfrog no-u-turn sampler’, arXiv preprint arXiv:2506.18746

arXiv 2025
[8]

Bouchard-Côté, A., Vollmer, S. J. and Doucet, A. [2018], ‘The Bouncy Particle sam- pler: A nonreversible rejection-free Markov chain Monte Carlo method’, Journal of the American Statistical Association 113(522), 855–867

2018
[9]

Brofos, J. A. and Lederman, R. R. [2021 a], ‘On numerical considerations for Rieman- nian manifold Hamiltonian Monte Carlo’, arXiv:2111.09995

arXiv 2021
[10]

and Lederman, R

Brofos, J. and Lederman, R. R. [2021 b], Evaluating the implicit midpoint integrator for Riemannian Hamiltonian Monte Carlo, in ‘Proceedings of The 38th International Conference on Machine Learning’, PMLR, pp. 1072–1081. 30

2021
[11]

M., Polson, N

Carvalho, C. M., Polson, N. G. and Scott, J. G. [2009], Handling sparsity via the horse- shoe, in ‘Proceedings of The 12th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 73–80

2009
[12]

and Vempala, S

Chen, Z. and Vempala, S. S. [2022], ‘Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions’, Theory of Computing 18, 1–18

2022
[13]

and Sutton, M

Chevallier, A., Power, S. and Sutton, M. [2025], ‘Towards practical pdmp sampling: Metropolis adjustments, locally adaptive step-sizes, and nuts-based time lengths’, arXiv preprint arXiv:2503.11479

Pith/arXiv arXiv 2025
[14]

[1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

Creutz, M. [1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

1988
[15]

Dalalyan, A. S. [2017], ‘Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities’, Journal of the Royal Statistical Society Series B: Statistical Methodology 79(3), 651–676

2017
[16]

D., Pendleton, B

Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. [1987], ‘Hybrid Monte Carlo’, Physics letters B 195(2), 216–222

1987
[17]

and Vempala, S

Gatmiry, K., Kelner, J. and Vempala, S. S. [2024], Sampling polytopes with Rieman- nian HMC: Faster mixing via the Lewis weights barrier, in ‘Proceedings of The 37th Annual Conference on Learning Theory’, PMLR, pp. 1796–1881

2024
[18]

and Calderhead, B

Girolami, M. and Calderhead, B. [2011], ‘Riemann manifold Langevin and Hamilto- nian Monte Carlo methods’, Journal of the Royal Statistical Society Series B: Statis- tical Methodology 73(2), 123–214

2011
[19]

and Walther, A

Griewank, A. and Walther, A. [2008], Evaluating derivatives: Principles and techniques of algorithmic differentiation , SIAM. 31

2008
[20]

and Lubich, C

Hairer, E., Hochbruck, M., Iserles, A. and Lubich, C. [2006], ‘Geometric numerical integration’, Oberwolfach Reports 3(1), 805–882

2006
[21]

and Klami, A

Hartmann, M., Girolami, M. and Klami, A. [2022], Lagrangian manifold Monte Carlo on Monge patches, in ‘Proceedings of The 25th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4764–4781

2022
[22]

and Livingstone, S

Hird, M. and Livingstone, S. [2023], ‘Quantifying the effectiveness of linear precondi- tioning in Markov chain Monte Carlo’, arXiv:2312.04898

arXiv 2023
[23]

and Livingstone, S

Hird, M. and Livingstone, S. [2026], ‘High-dimensional Adaptive MCMC with Reduced Computational Complexity’, arXiv:2604.09286

Pith/arXiv arXiv 2026
[24]

Hoffman, M. D. and Gelman, A. [2014], ‘The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo’, Journal of Machine Learning Research 15(47), 1593–1623

2014
[25]

and Shahbaba, B

Holbrook, A., Lan, S., Vandenberg-Rodes, A. and Shahbaba, B. [2018], ‘Geodesic Lagrangian Monte Carlo over the space of positive definite matrices: with applica- tion to Bayesian spectral density estimation’, Journal of Statistical Computation and Simulation 88(5), 982–1002

2018
[26]

[2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

Innes, M. [2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

Pith/arXiv arXiv 2018
[27]

Jordan, M. I. [2004], ‘Graphical Models’, Statistical Science 19(1), 140–155

2004
[28]

and Wallin, J

Kailas, M., Vihola, M. and Wallin, J. [2026], ‘Hierarchical Riemannian manifold Hamil- tonian Monte Carlo algorithms’, arXiv:2604.09832

Pith/arXiv arXiv 2026
[29]

Kleppe, T. S. [2018], ‘Modified Cholesky Riemann manifold Hamiltonian Monte Carlo: 32 exploiting sparsity for fast sampling of high-dimensional targets’, Statistics and Com- puting 28(4), 795–817

2018
[30]

Kleppe, T. S. [2022], ‘Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-dependent Event Rates’, Journal of Computational and Graph- ical Statistics 31(4), 1238–1253

2022
[31]

Kleppe, T. S. [2024], ‘Log-density gradient covariance and automatic metric ten- sors for Riemann manifold Monte Carlo methods’, Scandinavian Journal of Statistics 51(3), 1206–1229

2024
[32]

and Vempala, S

Kook, Y., Lee, Y.-T., Shen, R. and Vempala, S. [2022], Sampling with Riemannian Hamiltonian Monte Carlo in a constrained space, in ‘Advances in Neural Information Processing Systems’, Vol. 35, Curran Associates, Inc., pp. 31684–31696

2022
[33]

T., Shen, R

Kook, Y., Lee, Y. T., Shen, R. and Vempala, S. [2023], Condition-number-independent Convergence Rate of Riemannian Hamiltonian Monte Carlo with Numerical Integra- tors, in ‘Proceedings of The 36th Conference on Learning Theory’, PMLR, pp. 4504– 4569

2023
[34]

and Von Behren, R

Langmore, I., Dikovsky, M., Geraedts, S., Norgaard, P. and Von Behren, R. [2019], ‘A condition number for Hamiltonian Monte Carlo’, arXiv:1905.09813

arXiv 2019
[35]

and Reich, S

Leimkuhler, B. and Reich, S. [2005], Simulating Hamiltonian dynamics , Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press

2005
[36]

[2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341

Livingstone, S. [2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341. 33

2021
[37]

and Wang, L

Lu, J. and Wang, L. [2022], ‘On explicit 𝐿2-convergence rate estimate for piecewise de- terministic Markov processes in MCMC algorithms’, The Annals of Applied Probability 32(2), 1333–1361

2022
[38]

and Bouchard-Cote, A

Luu, S., Xu, Z., Surjanovic, N., Biron-Lattes, M., Campbell, T. and Bouchard-Cote, A. [2025], Is Gibbs sampling faster than Hamiltonian Monte Carlo on GLMs?, in ‘Pro- ceedings of The 28th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 2881–2889

2025
[39]

Mackenze, P. B. [1989], ‘An improved hybrid Monte Carlo method’, Physics Letters B 226(3), 369–371

1989
[40]

and Swersky, K

Martens, J., Sutskever, I. and Swersky, K. [2012], Estimating the Hessian by Back- propagating Curvature, in ‘Proceedings of The 29th International Conference on Ma- chine Learning’, PMLR, pp. 963–970

2012
[41]

and Churavy, V

Moses, W. and Churavy, V. [2020], Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients, in ‘Advances in Neural Informa- tion Processing Systems’, Vol. 33, Curran Associates, Inc., pp. 12472–12485

2020
[42]

Neal, R. M. [2003], ‘Slice sampling’, The Annals of Statistics 31(3), 705–767

2003
[43]

Neal, R. M. [2011], MCMC using Hamiltonian dynamics, in ‘Handbook of Markov Chain Monte Carlo’, Chapman and Hall/CRC, pp. 113–162

2011
[44]

[2012], ‘Eﬀiciency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

Nesterov, Y. [2012], ‘Eﬀiciency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

2012
[45]

and Fraccaro, M

Paquet, U. and Fraccaro, M. [2018], ‘An eﬀicient implementation of Riemannian man- ifold Hamiltonian Monte Carlo for Gaussian process models’, arXiv:1810.11893 . 34

Pith/arXiv arXiv 2018
[46]

and Yin, W

Peng, Z., Wu, T., Xu, Y., Yan, M. and Yin, W. [2016], ‘Coordinate friendly structures, algorithms and applications’, arXiv:1601.00863

Pith/arXiv arXiv 2016
[47]

and Gorman, R

Sejnowski, T. and Gorman, R. [1988], ‘Connectionist bench (sonar, mines vs. rocks)’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5T01Q

work page doi:10.24432/c5t01q 1988
[48]

and Baker, K

Sigillito, V., Wing, S., Hutton, L. and Baker, K. [1989], ‘Ionosphere’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5W01B

work page doi:10.24432/c5w01b 1989
[49]

G., Ross, K., Jackson, D

Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D’Amico, A. V., Richie, J. P. et al. [2002], ‘Gene expression correlates of clinical prostate cancer behavior’, Cancer Cell 1(2), 203–209

2002
[50]

and Bouchard- Cote, A

Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T. and Bouchard- Cote, A. [2025], ‘Pigeons.jl: Distributed sampling from intractable distributions’, Pro- ceedings of The JuliaCon Conferences 7(69), 139

2025
[51]

and Doucet, A

Syed, S., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2022], ‘Non-reversible parallel tempering: A scalable highly parallel MCMC scheme’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 321–350

2022
[52]

[2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

Tao, M. [2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

2016
[53]

[1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

Tierney, L. [1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

1998
[54]

and Doucet, A

Vanetti, P., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2017], ‘Piecewise- deterministic Markov chain Monte Carlo’, arXiv:1707.05296 . 35

Pith/arXiv arXiv 2017
[55]

and Klami, A

Williams, B., Yu, H., Hartmann, M. and Klami, A. [2024], Geometric No-U-Turn sam- plers: Concepts and evaluation, in ‘Proceedings of The 12th International Conference on Probabilistic Graphical Models (PGM)’, PMLR, pp. 327–347

2024
[56]

Wright, S. J. [2015], ‘Coordinate descent algorithms’, Mathematical programming 151(1), 3–34

2015
[57]

and Ge, H

Xu, K. and Ge, H. [2024], Practical Hamiltonian Monte Carlo on Riemannian man- ifolds via relativity theory, in ‘Proceedings of The 41st International Conference on Machine Learning’, PMLR

2024
[58]

and Ghahramani, Z

Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M. and Ghahramani, Z. [2020], Ad- vancedHMC. jl: A robust, modular and eﬀicient implementation of advanced HMC algorithms, in ‘Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference’, PMLR, pp. 1–10

2020
[59]

and Mahoney, M

Yao, Z., Gholami, A., Shen, S., Mustafa, M., Keutzer, K. and Mahoney, M. [2021], Adahessian: An adaptive second order optimizer for machine learning, in ‘Proceedings of The AAAI Conference on Artificial Intelligence’, Vol. 35, pp. 10665–10673. 36 SUPPLEMENTARY MATERIAL A Implementation of coordinate friendly methods via compute graphs A.1 Compute graph Th...

2021

[1] [1]

and Hazan, E

Agarwal, N., Bullins, B. and Hazan, E. [2017], ‘Second-order stochastic optimization for machine learning in linear time’, Journal of Machine Learning Research 18(116), 1– 40

2017

[2] [2]

[2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp

Betancourt, M. [2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp. 327–334

[3] [3]

[2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

Betancourt, M. [2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

Pith/arXiv arXiv 2017

[4] [4]

Betancourt, M. J. [2013 b], ‘Generalizing the No-U-Turn sampler to Riemannian man- ifolds’, arXiv:1304.1920

Pith/arXiv arXiv 2013

[5] [5]

and Roberts, G

Bierkens, J., Fearnhead, P. and Roberts, G. [2019], ‘The Zig-zag process and super-eﬀicient sampling for Bayesian analysis of big data’, The Annals of Statistics 47(3), 1288–1320

2019

[6] [6]

and Bouchard-Côté, A

Biron-Lattes, M., Surjanovic, N., Syed, S., Campbell, T. and Bouchard-Côté, A. [2024], AutoMALA: Locally adaptive Metropolis-adjusted Langevin algorithm, in ‘Proceed- ings of The 27th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4600–4608

2024

[7] [7]

Bou-Rabee, N., Carpenter, B., Kleppe, T. S. and Liu, S. [2025], ‘The within-orbit adaptive leapfrog no-u-turn sampler’, arXiv preprint arXiv:2506.18746

arXiv 2025

[8] [8]

Bouchard-Côté, A., Vollmer, S. J. and Doucet, A. [2018], ‘The Bouncy Particle sam- pler: A nonreversible rejection-free Markov chain Monte Carlo method’, Journal of the American Statistical Association 113(522), 855–867

2018

[9] [9]

Brofos, J. A. and Lederman, R. R. [2021 a], ‘On numerical considerations for Rieman- nian manifold Hamiltonian Monte Carlo’, arXiv:2111.09995

arXiv 2021

[10] [10]

and Lederman, R

Brofos, J. and Lederman, R. R. [2021 b], Evaluating the implicit midpoint integrator for Riemannian Hamiltonian Monte Carlo, in ‘Proceedings of The 38th International Conference on Machine Learning’, PMLR, pp. 1072–1081. 30

2021

[11] [11]

M., Polson, N

Carvalho, C. M., Polson, N. G. and Scott, J. G. [2009], Handling sparsity via the horse- shoe, in ‘Proceedings of The 12th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 73–80

2009

[12] [12]

and Vempala, S

Chen, Z. and Vempala, S. S. [2022], ‘Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions’, Theory of Computing 18, 1–18

2022

[13] [13]

and Sutton, M

Chevallier, A., Power, S. and Sutton, M. [2025], ‘Towards practical pdmp sampling: Metropolis adjustments, locally adaptive step-sizes, and nuts-based time lengths’, arXiv preprint arXiv:2503.11479

Pith/arXiv arXiv 2025

[14] [14]

[1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

Creutz, M. [1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

1988

[15] [15]

Dalalyan, A. S. [2017], ‘Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities’, Journal of the Royal Statistical Society Series B: Statistical Methodology 79(3), 651–676

2017

[16] [16]

D., Pendleton, B

Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. [1987], ‘Hybrid Monte Carlo’, Physics letters B 195(2), 216–222

1987

[17] [17]

and Vempala, S

Gatmiry, K., Kelner, J. and Vempala, S. S. [2024], Sampling polytopes with Rieman- nian HMC: Faster mixing via the Lewis weights barrier, in ‘Proceedings of The 37th Annual Conference on Learning Theory’, PMLR, pp. 1796–1881

2024

[18] [18]

and Calderhead, B

Girolami, M. and Calderhead, B. [2011], ‘Riemann manifold Langevin and Hamilto- nian Monte Carlo methods’, Journal of the Royal Statistical Society Series B: Statis- tical Methodology 73(2), 123–214

2011

[19] [19]

and Walther, A

Griewank, A. and Walther, A. [2008], Evaluating derivatives: Principles and techniques of algorithmic differentiation , SIAM. 31

2008

[20] [20]

and Lubich, C

Hairer, E., Hochbruck, M., Iserles, A. and Lubich, C. [2006], ‘Geometric numerical integration’, Oberwolfach Reports 3(1), 805–882

2006

[21] [21]

and Klami, A

Hartmann, M., Girolami, M. and Klami, A. [2022], Lagrangian manifold Monte Carlo on Monge patches, in ‘Proceedings of The 25th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4764–4781

2022

[22] [22]

and Livingstone, S

Hird, M. and Livingstone, S. [2023], ‘Quantifying the effectiveness of linear precondi- tioning in Markov chain Monte Carlo’, arXiv:2312.04898

arXiv 2023

[23] [23]

and Livingstone, S

Hird, M. and Livingstone, S. [2026], ‘High-dimensional Adaptive MCMC with Reduced Computational Complexity’, arXiv:2604.09286

Pith/arXiv arXiv 2026

[24] [24]

Hoffman, M. D. and Gelman, A. [2014], ‘The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo’, Journal of Machine Learning Research 15(47), 1593–1623

2014

[25] [25]

and Shahbaba, B

Holbrook, A., Lan, S., Vandenberg-Rodes, A. and Shahbaba, B. [2018], ‘Geodesic Lagrangian Monte Carlo over the space of positive definite matrices: with applica- tion to Bayesian spectral density estimation’, Journal of Statistical Computation and Simulation 88(5), 982–1002

2018

[26] [26]

[2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

Innes, M. [2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

Pith/arXiv arXiv 2018

[27] [27]

Jordan, M. I. [2004], ‘Graphical Models’, Statistical Science 19(1), 140–155

2004

[28] [28]

and Wallin, J

Kailas, M., Vihola, M. and Wallin, J. [2026], ‘Hierarchical Riemannian manifold Hamil- tonian Monte Carlo algorithms’, arXiv:2604.09832

Pith/arXiv arXiv 2026

[29] [29]

Kleppe, T. S. [2018], ‘Modified Cholesky Riemann manifold Hamiltonian Monte Carlo: 32 exploiting sparsity for fast sampling of high-dimensional targets’, Statistics and Com- puting 28(4), 795–817

2018

[30] [30]

Kleppe, T. S. [2022], ‘Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-dependent Event Rates’, Journal of Computational and Graph- ical Statistics 31(4), 1238–1253

2022

[31] [31]

Kleppe, T. S. [2024], ‘Log-density gradient covariance and automatic metric ten- sors for Riemann manifold Monte Carlo methods’, Scandinavian Journal of Statistics 51(3), 1206–1229

2024

[32] [32]

and Vempala, S

Kook, Y., Lee, Y.-T., Shen, R. and Vempala, S. [2022], Sampling with Riemannian Hamiltonian Monte Carlo in a constrained space, in ‘Advances in Neural Information Processing Systems’, Vol. 35, Curran Associates, Inc., pp. 31684–31696

2022

[33] [33]

T., Shen, R

Kook, Y., Lee, Y. T., Shen, R. and Vempala, S. [2023], Condition-number-independent Convergence Rate of Riemannian Hamiltonian Monte Carlo with Numerical Integra- tors, in ‘Proceedings of The 36th Conference on Learning Theory’, PMLR, pp. 4504– 4569

2023

[34] [34]

and Von Behren, R

Langmore, I., Dikovsky, M., Geraedts, S., Norgaard, P. and Von Behren, R. [2019], ‘A condition number for Hamiltonian Monte Carlo’, arXiv:1905.09813

arXiv 2019

[35] [35]

and Reich, S

Leimkuhler, B. and Reich, S. [2005], Simulating Hamiltonian dynamics , Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press

2005

[36] [36]

[2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341

Livingstone, S. [2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341. 33

2021

[37] [37]

and Wang, L

Lu, J. and Wang, L. [2022], ‘On explicit 𝐿2-convergence rate estimate for piecewise de- terministic Markov processes in MCMC algorithms’, The Annals of Applied Probability 32(2), 1333–1361

2022

[38] [38]

and Bouchard-Cote, A

Luu, S., Xu, Z., Surjanovic, N., Biron-Lattes, M., Campbell, T. and Bouchard-Cote, A. [2025], Is Gibbs sampling faster than Hamiltonian Monte Carlo on GLMs?, in ‘Pro- ceedings of The 28th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 2881–2889

2025

[39] [39]

Mackenze, P. B. [1989], ‘An improved hybrid Monte Carlo method’, Physics Letters B 226(3), 369–371

1989

[40] [40]

and Swersky, K

Martens, J., Sutskever, I. and Swersky, K. [2012], Estimating the Hessian by Back- propagating Curvature, in ‘Proceedings of The 29th International Conference on Ma- chine Learning’, PMLR, pp. 963–970

2012

[41] [41]

and Churavy, V

Moses, W. and Churavy, V. [2020], Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients, in ‘Advances in Neural Informa- tion Processing Systems’, Vol. 33, Curran Associates, Inc., pp. 12472–12485

2020

[42] [42]

Neal, R. M. [2003], ‘Slice sampling’, The Annals of Statistics 31(3), 705–767

2003

[43] [43]

Neal, R. M. [2011], MCMC using Hamiltonian dynamics, in ‘Handbook of Markov Chain Monte Carlo’, Chapman and Hall/CRC, pp. 113–162

2011

[44] [44]

[2012], ‘Eﬀiciency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

Nesterov, Y. [2012], ‘Eﬀiciency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

2012

[45] [45]

and Fraccaro, M

Paquet, U. and Fraccaro, M. [2018], ‘An eﬀicient implementation of Riemannian man- ifold Hamiltonian Monte Carlo for Gaussian process models’, arXiv:1810.11893 . 34

Pith/arXiv arXiv 2018

[46] [46]

and Yin, W

Peng, Z., Wu, T., Xu, Y., Yan, M. and Yin, W. [2016], ‘Coordinate friendly structures, algorithms and applications’, arXiv:1601.00863

Pith/arXiv arXiv 2016

[47] [47]

and Gorman, R

Sejnowski, T. and Gorman, R. [1988], ‘Connectionist bench (sonar, mines vs. rocks)’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5T01Q

work page doi:10.24432/c5t01q 1988

[48] [48]

and Baker, K

Sigillito, V., Wing, S., Hutton, L. and Baker, K. [1989], ‘Ionosphere’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5W01B

work page doi:10.24432/c5w01b 1989

[49] [49]

G., Ross, K., Jackson, D

Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D’Amico, A. V., Richie, J. P. et al. [2002], ‘Gene expression correlates of clinical prostate cancer behavior’, Cancer Cell 1(2), 203–209

2002

[50] [50]

and Bouchard- Cote, A

Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T. and Bouchard- Cote, A. [2025], ‘Pigeons.jl: Distributed sampling from intractable distributions’, Pro- ceedings of The JuliaCon Conferences 7(69), 139

2025

[51] [51]

and Doucet, A

Syed, S., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2022], ‘Non-reversible parallel tempering: A scalable highly parallel MCMC scheme’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 321–350

2022

[52] [52]

[2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

Tao, M. [2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

2016

[53] [53]

[1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

Tierney, L. [1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

1998

[54] [54]

and Doucet, A

Vanetti, P., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2017], ‘Piecewise- deterministic Markov chain Monte Carlo’, arXiv:1707.05296 . 35

Pith/arXiv arXiv 2017

[55] [55]

and Klami, A

Williams, B., Yu, H., Hartmann, M. and Klami, A. [2024], Geometric No-U-Turn sam- plers: Concepts and evaluation, in ‘Proceedings of The 12th International Conference on Probabilistic Graphical Models (PGM)’, PMLR, pp. 327–347

2024

[56] [56]

Wright, S. J. [2015], ‘Coordinate descent algorithms’, Mathematical programming 151(1), 3–34

2015

[57] [57]

and Ge, H

Xu, K. and Ge, H. [2024], Practical Hamiltonian Monte Carlo on Riemannian man- ifolds via relativity theory, in ‘Proceedings of The 41st International Conference on Machine Learning’, PMLR

2024

[58] [58]

and Ghahramani, Z

Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M. and Ghahramani, Z. [2020], Ad- vancedHMC. jl: A robust, modular and eﬀicient implementation of advanced HMC algorithms, in ‘Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference’, PMLR, pp. 1–10

2020

[59] [59]

and Mahoney, M

Yao, Z., Gholami, A., Shen, S., Mustafa, M., Keutzer, K. and Mahoney, M. [2021], Adahessian: An adaptive second order optimizer for machine learning, in ‘Proceedings of The AAAI Conference on Artificial Intelligence’, Vol. 35, pp. 10665–10673. 36 SUPPLEMENTARY MATERIAL A Implementation of coordinate friendly methods via compute graphs A.1 Compute graph Th...

2021