pith. sign in

arxiv: 2606.05935 · v1 · pith:6CX3TIEBnew · submitted 2026-06-04 · 📊 stat.CO · stat.ME

Hessian-informed, Coordinate Friendly Hamiltonian Monte Carlo in Linear Time

Pith reviewed 2026-06-27 22:42 UTC · model grok-4.3

classification 📊 stat.CO stat.ME
keywords Hamiltonian Monte CarloRiemannian HMCcoordinate friendlylinear timeMCMCpreconditioninggeneralized linear modelsgraphical models
0
0 comments X

The pith

RHMC fixed-point iterations with diagonal preconditioners reduce from O(d^2) to O(d) time for targets with coordinate-friendly structure via compute-graph manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a computational rewrite that makes Riemannian Hamiltonian Monte Carlo feasible in high dimensions. For targets whose structure allows coordinate-wise operations to be isolated, the method performs the fixed-point iterations needed for diagonal Hessian preconditioning in linear rather than quadratic time per step. This covers generalized linear models and both dense and sparse graphical models. The rewrite is expressed as a graph transformation, so it can be applied automatically even when the target is treated as a black box. Experiments indicate that the resulting sampler produces higher-quality samples per unit of compute time than standard HMC and NUTS variants that use either fixed or position-dependent preconditioners.

Core claim

For targets possessing coordinate-friendly structure, the fixed-point iterations of diagonal-preconditioned RHMC can be executed in O(d) rather than O(d^2) time by manipulating the compute graph so that only linear-cost operations are performed while the mathematical correctness of each iteration is preserved. The class includes generalized linear models together with certain dense and sparse graphical models, and the same graph manipulation can be automated for black-box targets.

What carries the argument

Compute-graph manipulation of the fixed-point iterations required by diagonal-preconditioned RHMC, which isolates coordinate-wise contributions to achieve linear scaling.

If this is right

  • RHMC becomes practical for high-dimensional generalized linear models without incurring quadratic cost per iteration.
  • The same linear-time property holds for both dense and sparse graphical models that satisfy the coordinate-friendly condition.
  • Automation through graph manipulation extends the method to black-box targets that internally possess the required structure.
  • Empirical sample quality per unit compute time exceeds that of position-independent and position-dependent HMC baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph-rewrite technique could be adapted to lower the cost of other Hessian-informed MCMC algorithms that currently scale quadratically.
  • In regimes with thousands of dimensions the linear scaling may make previously intractable models accessible to position-dependent preconditioning.
  • The breadth of coordinate-friendly structures could be mapped by applying the method to additional families of graphical models beyond those tested.

Load-bearing premise

The target must possess a coordinate-friendly structure that permits the graph manipulation to deliver true linear cost while leaving the fixed-point iteration mathematically unchanged.

What would settle it

A wall-clock timing experiment on a logistic-regression target that shows iteration cost growing faster than linearly with dimension, or that yields samples whose distribution deviates from the known posterior.

Figures

Figures reproduced from arXiv: 2606.05935 by Alexandre Bouchard-C\^ot\'e, Nikola Surjanovic, Son Luu, Trevor Campbell, Zuheng Xu.

Figure 1
Figure 1. Figure 1: Visualization of samples obtained by NUTS and our CG-RNUTS method with [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Trajectory comparison between CG-RHMC and standard RHMC for an 8- [PITH_FULL_IMAGE:figures/full_fig_p022_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Wall-clock time (in seconds) per Hamiltonian gradient call versus dimension [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: First coordinate 2-Wasserstein distance convergence over time for various HMC [PITH_FULL_IMAGE:figures/full_fig_p026_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: (Left) KS test statistics of each parameter between the NRPT chain (reference posterior) and the two competing algorithms NUTS and CG-RNUTS in log-log scale (lower is better) for the horseshoe logistic regression problem on the prostate cancer data with similar time budgets where each point represents a parameter. (Right) Marginal density comparison for the circled point. (200 times that of CG-RNUTS for th… view at source ↗
Figure 6
Figure 6. Figure 6: Box plot of maximum marginal KS statistics of CG-RNUTS and NUTS compared [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Different graph notions for logistic regression model in Example [PITH_FULL_IMAGE:figures/full_fig_p043_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Different graph notions for autoregression model in Example [PITH_FULL_IMAGE:figures/full_fig_p045_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Bivariate marginals obtained by CG-RNUTS and NUTS with similar time bud [PITH_FULL_IMAGE:figures/full_fig_p054_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Bivariate marginals obtained by CG-RNUTS and NUTS with similar time [PITH_FULL_IMAGE:figures/full_fig_p055_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Box plot of median (top), mean (middle) and minimum (bottom) marginal KS [PITH_FULL_IMAGE:figures/full_fig_p056_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Box plot of sampling time (left) and total time (right) for CG-RNUTS and [PITH_FULL_IMAGE:figures/full_fig_p057_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Fixed-point iteration scaling for funnel targets with increasing dimensions. [PITH_FULL_IMAGE:figures/full_fig_p057_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Fixed-point iteration scaling for synthetic horseshoe logistic regression targets [PITH_FULL_IMAGE:figures/full_fig_p058_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Density of samples from RNUTS chains using different metric regularizations [PITH_FULL_IMAGE:figures/full_fig_p058_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Trace plots of RNUTS chains using different mass matrix regularizations on a [PITH_FULL_IMAGE:figures/full_fig_p059_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Running quartile plots of different parameters obtained from the NRPT chain [PITH_FULL_IMAGE:figures/full_fig_p059_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Restart rate over adaptation rounds of the NRPT chain for the horseshoe logistic [PITH_FULL_IMAGE:figures/full_fig_p060_18.png] view at source ↗
read the original abstract

Riemannian Hamiltonian Monte Carlo (RHMC) is a promising MCMC methodology thanks to its ability to accommodate position-dependent preconditioning and multi-step proposals. While RHMC performs well in low dimensions, it becomes infeasible in high dimensions due to its $O(d^3)$ cost per fixed-point iteration, where $d$ is the dimension of the target density. Even when the position-dependent preconditioner is based on the diagonal of the Hessian, the cost is still $O(d^2)$ per fixed-point iteration. In this paper, we propose a computational method to reduce the computational complexity of RHMC fixed-point iterations with diagonal preconditioners from $O(d^2)$ to $O(d)$ for targets with ``coordinate friendly'' structures. This distribution class includes generalized linear models as well as other dense and sparse graphical models. The method is expressed as manipulating the compute graph and can therefore be automated to work on black box targets. Finally, we show empirically that our implementation of RHMC results in better sample quality per unit of compute time for various target distributions compared to state-of-the-art HMC NUTS algorithms with both position-independent and position-dependent preconditioners.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that for targets with coordinate-friendly structure (including GLMs and certain graphical models), the fixed-point iteration inside diagonal-preconditioned RHMC can be rewritten via compute-graph manipulation to achieve O(d) cost per iteration while remaining numerically identical to the original O(d²) version; the resulting method is automatable for black-box targets and empirically yields higher effective samples per unit compute time than NUTS with both static and position-dependent preconditioners.

Significance. If the rewritten iteration is provably equivalent, the work would supply a practical route to position-dependent HMC at linear cost for an important model class, with the automation angle offering a path toward integration in probabilistic programming frameworks. The empirical comparisons provide concrete evidence of wall-clock gains on the tested targets.

major comments (2)
  1. [§3 (compute-graph manipulation)] The central claim that the compute-graph rewrite leaves the fixed-point equation unchanged (and therefore preserves symplecticity and reversibility) is load-bearing yet unsupported by an explicit algebraic identity or inductive argument. No section derives that the reordered operations return exactly the same numerical solution for arbitrary coordinate-friendly targets; the abstract and §3 treat equivalence as automatic once the structure is present.
  2. [§3, Algorithm 1] The O(d) complexity claim rests on the assumption that cross-coordinate dependencies introduced by the diagonal metric in the momentum update are fully captured by the reordered graph; if any dependency is inadvertently dropped, the integrator ceases to be measure-preserving. The manuscript supplies no general proof that this does not occur for the stated class of models.
minor comments (2)
  1. [§2] Notation for the fixed-point map and the diagonal preconditioner is introduced without a compact reference equation; adding a single displayed equation early in §2 would improve readability.
  2. [§5] The experimental section reports effective sample size per second but does not tabulate the raw iteration counts or the observed per-iteration flop counts; including these would allow direct verification of the claimed linear scaling.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the importance of rigorously establishing equivalence and measure preservation. We address the two major comments below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§3 (compute-graph manipulation)] The central claim that the compute-graph rewrite leaves the fixed-point equation unchanged (and therefore preserves symplecticity and reversibility) is load-bearing yet unsupported by an explicit algebraic identity or inductive argument. No section derives that the reordered operations return exactly the same numerical solution for arbitrary coordinate-friendly targets; the abstract and §3 treat equivalence as automatic once the structure is present.

    Authors: We agree that an explicit derivation is needed. The reordering operates only on independent sub-expressions permitted by the coordinate-friendly structure (i.e., the Hessian diagonal and gradient factorizations that admit per-coordinate updates). In the revision we will insert a short inductive argument in §3 showing that, for any target whose log-density and Hessian diagonal admit the stated factorization, the fixed-point map computed on the reordered graph is algebraically identical to the original map. This identity immediately implies that symplecticity and reversibility are unchanged. revision: yes

  2. Referee: [§3, Algorithm 1] The O(d) complexity claim rests on the assumption that cross-coordinate dependencies introduced by the diagonal metric in the momentum update are fully captured by the reordered graph; if any dependency is inadvertently dropped, the integrator ceases to be measure-preserving. The manuscript supplies no general proof that this does not occur for the stated class of models.

    Authors: The coordinate-friendly class is defined precisely so that the only cross-coordinate dependencies arise through the shared diagonal metric; the graph rewrite is constructed to retain every such dependency. We will augment the revision with a general argument (again inductive) that every momentum-update term present in the original O(d²) iteration appears unchanged in the reordered O(d) version, thereby preserving the measure. The same argument covers the models listed in the paper (GLMs and the indicated graphical models). revision: yes

Circularity Check

0 steps flagged

No circularity; algorithmic graph rewrite is an independent computational technique.

full rationale

The paper's core contribution is a compute-graph manipulation that rewrites fixed-point iterations for diagonal-preconditioned RHMC to achieve O(d) cost on coordinate-friendly targets. The abstract and claims present this as an explicit algorithmic transformation that preserves the original iteration's output for the stated distribution class, without any self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the claimed equivalence to prior author work. No derivation chain collapses by construction; the method is offered as a verifiable rewrite applicable to black-box targets, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of Hamiltonian dynamics and fixed-point solvers plus the existence of coordinate-friendly structure in the target class; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math Standard properties of Hamiltonian dynamics and fixed-point iterations hold for the RHMC proposal.
    Invoked implicitly when claiming the fixed-point iteration remains valid after graph manipulation.
  • domain assumption Targets belong to the coordinate-friendly class (GLMs, dense/sparse graphical models) for which the linear-time reduction applies.
    The O(d) claim is conditioned on this structural property of the target.

pith-pipeline@v0.9.1-grok · 5750 in / 1263 out tokens · 23285 ms · 2026-06-27T22:42:18.697794+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 2 canonical work pages

  1. [1]

    and Hazan, E

    Agarwal, N., Bullins, B. and Hazan, E. [2017], ‘Second-order stochastic optimization for machine learning in linear time’, Journal of Machine Learning Research 18(116), 1– 40

  2. [2]

    [2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp

    Betancourt, M. [2013a], A general metric for Riemannian manifold Hamiltonian Monte 29 Carlo, in ‘International Conference on Geometric Science of Information’, Springer, pp. 327–334

  3. [3]

    [2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

    Betancourt, M. [2017], ‘A conceptual introduction to Hamiltonian Monte Carlo’, arXiv:1701.02434

  4. [4]

    Betancourt, M. J. [2013 b], ‘Generalizing the No-U-Turn sampler to Riemannian man- ifolds’, arXiv:1304.1920

  5. [5]

    and Roberts, G

    Bierkens, J., Fearnhead, P. and Roberts, G. [2019], ‘The Zig-zag process and super-efficient sampling for Bayesian analysis of big data’, The Annals of Statistics 47(3), 1288–1320

  6. [6]

    and Bouchard-Côté, A

    Biron-Lattes, M., Surjanovic, N., Syed, S., Campbell, T. and Bouchard-Côté, A. [2024], AutoMALA: Locally adaptive Metropolis-adjusted Langevin algorithm, in ‘Proceed- ings of The 27th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4600–4608

  7. [7]

    Bou-Rabee, N., Carpenter, B., Kleppe, T. S. and Liu, S. [2025], ‘The within-orbit adaptive leapfrog no-u-turn sampler’, arXiv preprint arXiv:2506.18746

  8. [8]

    Bouchard-Côté, A., Vollmer, S. J. and Doucet, A. [2018], ‘The Bouncy Particle sam- pler: A nonreversible rejection-free Markov chain Monte Carlo method’, Journal of the American Statistical Association 113(522), 855–867

  9. [9]

    Brofos, J. A. and Lederman, R. R. [2021 a], ‘On numerical considerations for Rieman- nian manifold Hamiltonian Monte Carlo’, arXiv:2111.09995

  10. [10]

    and Lederman, R

    Brofos, J. and Lederman, R. R. [2021 b], Evaluating the implicit midpoint integrator for Riemannian Hamiltonian Monte Carlo, in ‘Proceedings of The 38th International Conference on Machine Learning’, PMLR, pp. 1072–1081. 30

  11. [11]

    M., Polson, N

    Carvalho, C. M., Polson, N. G. and Scott, J. G. [2009], Handling sparsity via the horse- shoe, in ‘Proceedings of The 12th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 73–80

  12. [12]

    and Vempala, S

    Chen, Z. and Vempala, S. S. [2022], ‘Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions’, Theory of Computing 18, 1–18

  13. [13]

    and Sutton, M

    Chevallier, A., Power, S. and Sutton, M. [2025], ‘Towards practical pdmp sampling: Metropolis adjustments, locally adaptive step-sizes, and nuts-based time lengths’, arXiv preprint arXiv:2503.11479

  14. [14]

    [1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

    Creutz, M. [1988], ‘Global Monte Carlo algorithms for many-fermion systems’, Physi- cal Review D 38(4), 1228–1238

  15. [15]

    Dalalyan, A. S. [2017], ‘Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities’, Journal of the Royal Statistical Society Series B: Statistical Methodology 79(3), 651–676

  16. [16]

    D., Pendleton, B

    Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. [1987], ‘Hybrid Monte Carlo’, Physics letters B 195(2), 216–222

  17. [17]

    and Vempala, S

    Gatmiry, K., Kelner, J. and Vempala, S. S. [2024], Sampling polytopes with Rieman- nian HMC: Faster mixing via the Lewis weights barrier, in ‘Proceedings of The 37th Annual Conference on Learning Theory’, PMLR, pp. 1796–1881

  18. [18]

    and Calderhead, B

    Girolami, M. and Calderhead, B. [2011], ‘Riemann manifold Langevin and Hamilto- nian Monte Carlo methods’, Journal of the Royal Statistical Society Series B: Statis- tical Methodology 73(2), 123–214

  19. [19]

    and Walther, A

    Griewank, A. and Walther, A. [2008], Evaluating derivatives: Principles and techniques of algorithmic differentiation , SIAM. 31

  20. [20]

    and Lubich, C

    Hairer, E., Hochbruck, M., Iserles, A. and Lubich, C. [2006], ‘Geometric numerical integration’, Oberwolfach Reports 3(1), 805–882

  21. [21]

    and Klami, A

    Hartmann, M., Girolami, M. and Klami, A. [2022], Lagrangian manifold Monte Carlo on Monge patches, in ‘Proceedings of The 25th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 4764–4781

  22. [22]

    and Livingstone, S

    Hird, M. and Livingstone, S. [2023], ‘Quantifying the effectiveness of linear precondi- tioning in Markov chain Monte Carlo’, arXiv:2312.04898

  23. [23]

    and Livingstone, S

    Hird, M. and Livingstone, S. [2026], ‘High-dimensional Adaptive MCMC with Reduced Computational Complexity’, arXiv:2604.09286

  24. [24]

    Hoffman, M. D. and Gelman, A. [2014], ‘The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo’, Journal of Machine Learning Research 15(47), 1593–1623

  25. [25]

    and Shahbaba, B

    Holbrook, A., Lan, S., Vandenberg-Rodes, A. and Shahbaba, B. [2018], ‘Geodesic Lagrangian Monte Carlo over the space of positive definite matrices: with applica- tion to Bayesian spectral density estimation’, Journal of Statistical Computation and Simulation 88(5), 982–1002

  26. [26]

    [2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

    Innes, M. [2018], ‘Don’t unroll adjoint: Differentiating SSA-form programs’, arXiv:1810.07951

  27. [27]

    Jordan, M. I. [2004], ‘Graphical Models’, Statistical Science 19(1), 140–155

  28. [28]

    and Wallin, J

    Kailas, M., Vihola, M. and Wallin, J. [2026], ‘Hierarchical Riemannian manifold Hamil- tonian Monte Carlo algorithms’, arXiv:2604.09832

  29. [29]

    Kleppe, T. S. [2018], ‘Modified Cholesky Riemann manifold Hamiltonian Monte Carlo: 32 exploiting sparsity for fast sampling of high-dimensional targets’, Statistics and Com- puting 28(4), 795–817

  30. [30]

    Kleppe, T. S. [2022], ‘Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-dependent Event Rates’, Journal of Computational and Graph- ical Statistics 31(4), 1238–1253

  31. [31]

    Kleppe, T. S. [2024], ‘Log-density gradient covariance and automatic metric ten- sors for Riemann manifold Monte Carlo methods’, Scandinavian Journal of Statistics 51(3), 1206–1229

  32. [32]

    and Vempala, S

    Kook, Y., Lee, Y.-T., Shen, R. and Vempala, S. [2022], Sampling with Riemannian Hamiltonian Monte Carlo in a constrained space, in ‘Advances in Neural Information Processing Systems’, Vol. 35, Curran Associates, Inc., pp. 31684–31696

  33. [33]

    T., Shen, R

    Kook, Y., Lee, Y. T., Shen, R. and Vempala, S. [2023], Condition-number-independent Convergence Rate of Riemannian Hamiltonian Monte Carlo with Numerical Integra- tors, in ‘Proceedings of The 36th Conference on Learning Theory’, PMLR, pp. 4504– 4569

  34. [34]

    and Von Behren, R

    Langmore, I., Dikovsky, M., Geraedts, S., Norgaard, P. and Von Behren, R. [2019], ‘A condition number for Hamiltonian Monte Carlo’, arXiv:1905.09813

  35. [35]

    and Reich, S

    Leimkuhler, B. and Reich, S. [2005], Simulating Hamiltonian dynamics , Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press

  36. [36]

    [2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341

    Livingstone, S. [2021], ‘Geometric ergodicity of the random walk Metropolis with position-dependent proposal covariance’, Mathematics 9(4), 341. 33

  37. [37]

    and Wang, L

    Lu, J. and Wang, L. [2022], ‘On explicit 𝐿2-convergence rate estimate for piecewise de- terministic Markov processes in MCMC algorithms’, The Annals of Applied Probability 32(2), 1333–1361

  38. [38]

    and Bouchard-Cote, A

    Luu, S., Xu, Z., Surjanovic, N., Biron-Lattes, M., Campbell, T. and Bouchard-Cote, A. [2025], Is Gibbs sampling faster than Hamiltonian Monte Carlo on GLMs?, in ‘Pro- ceedings of The 28th International Conference on Artificial Intelligence and Statistics’, PMLR, pp. 2881–2889

  39. [39]

    Mackenze, P. B. [1989], ‘An improved hybrid Monte Carlo method’, Physics Letters B 226(3), 369–371

  40. [40]

    and Swersky, K

    Martens, J., Sutskever, I. and Swersky, K. [2012], Estimating the Hessian by Back- propagating Curvature, in ‘Proceedings of The 29th International Conference on Ma- chine Learning’, PMLR, pp. 963–970

  41. [41]

    and Churavy, V

    Moses, W. and Churavy, V. [2020], Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients, in ‘Advances in Neural Informa- tion Processing Systems’, Vol. 33, Curran Associates, Inc., pp. 12472–12485

  42. [42]

    Neal, R. M. [2003], ‘Slice sampling’, The Annals of Statistics 31(3), 705–767

  43. [43]

    Neal, R. M. [2011], MCMC using Hamiltonian dynamics, in ‘Handbook of Markov Chain Monte Carlo’, Chapman and Hall/CRC, pp. 113–162

  44. [44]

    [2012], ‘Efficiency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

    Nesterov, Y. [2012], ‘Efficiency of coordinate descent methods on huge-scale optimiza- tion problems’, SIAM Journal on Optimization 22(2), 341–362

  45. [45]

    and Fraccaro, M

    Paquet, U. and Fraccaro, M. [2018], ‘An efficient implementation of Riemannian man- ifold Hamiltonian Monte Carlo for Gaussian process models’, arXiv:1810.11893 . 34

  46. [46]

    and Yin, W

    Peng, Z., Wu, T., Xu, Y., Yan, M. and Yin, W. [2016], ‘Coordinate friendly structures, algorithms and applications’, arXiv:1601.00863

  47. [47]

    and Gorman, R

    Sejnowski, T. and Gorman, R. [1988], ‘Connectionist bench (sonar, mines vs. rocks)’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5T01Q

  48. [48]

    and Baker, K

    Sigillito, V., Wing, S., Hutton, L. and Baker, K. [1989], ‘Ionosphere’, UCI Machine Learning Repository. URL: https://doi.org/10.24432/C5W01B

  49. [49]

    G., Ross, K., Jackson, D

    Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D’Amico, A. V., Richie, J. P. et al. [2002], ‘Gene expression correlates of clinical prostate cancer behavior’, Cancer Cell 1(2), 203–209

  50. [50]

    and Bouchard- Cote, A

    Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T. and Bouchard- Cote, A. [2025], ‘Pigeons.jl: Distributed sampling from intractable distributions’, Pro- ceedings of The JuliaCon Conferences 7(69), 139

  51. [51]

    and Doucet, A

    Syed, S., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2022], ‘Non-reversible parallel tempering: A scalable highly parallel MCMC scheme’, Journal of the Royal Statistical Society Series B: Statistical Methodology 84(2), 321–350

  52. [52]

    [2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

    Tao, M. [2016], ‘Explicit symplectic approximation of nonseparable Hamiltonians: Al- gorithm and long time performance’, Physical Review E 94(4), 043303

  53. [53]

    [1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

    Tierney, L. [1998], ‘A note on Metropolis-Hastings kernels for general state spaces’, The Annals of Applied Probability 8(1), 1–9

  54. [54]

    and Doucet, A

    Vanetti, P., Bouchard-Côté, A., Deligiannidis, G. and Doucet, A. [2017], ‘Piecewise- deterministic Markov chain Monte Carlo’, arXiv:1707.05296 . 35

  55. [55]

    and Klami, A

    Williams, B., Yu, H., Hartmann, M. and Klami, A. [2024], Geometric No-U-Turn sam- plers: Concepts and evaluation, in ‘Proceedings of The 12th International Conference on Probabilistic Graphical Models (PGM)’, PMLR, pp. 327–347

  56. [56]

    Wright, S. J. [2015], ‘Coordinate descent algorithms’, Mathematical programming 151(1), 3–34

  57. [57]

    and Ge, H

    Xu, K. and Ge, H. [2024], Practical Hamiltonian Monte Carlo on Riemannian man- ifolds via relativity theory, in ‘Proceedings of The 41st International Conference on Machine Learning’, PMLR

  58. [58]

    and Ghahramani, Z

    Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M. and Ghahramani, Z. [2020], Ad- vancedHMC. jl: A robust, modular and efficient implementation of advanced HMC algorithms, in ‘Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference’, PMLR, pp. 1–10

  59. [59]

    and Mahoney, M

    Yao, Z., Gholami, A., Shen, S., Mustafa, M., Keutzer, K. and Mahoney, M. [2021], Adahessian: An adaptive second order optimizer for machine learning, in ‘Proceedings of The AAAI Conference on Artificial Intelligence’, Vol. 35, pp. 10665–10673. 36 SUPPLEMENTARY MATERIAL A Implementation of coordinate friendly methods via compute graphs A.1 Compute graph Th...