
arxiv: 2509.09162 · v3 · submitted 2025-09-11 · 📊 stat.CO · math.PR

Divide, Interact, Sample: The Two-System Paradigm

Pith reviewed 2026-05-18 18:26 UTC · model grok-4.3

classification 📊 stat.CO math.PR
keywords two-system sampling · Monte Carlo methods · ensemble samplers · mean-field approximation · adaptive MCMC · Langevin dynamics · invariant distribution · parallel MCMC

The pith

Splitting particle ensembles into two interacting subsystems unifies mean-field, ensemble-chain, and adaptive Monte Carlo sampling while preserving exact product invariant distributions for finite sizes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-system framework that divides an ensemble of particles into two subsystems which propose updates for each other in a symmetric alternating manner. This cross-system interaction ensures that memoryless versions keep the finite ensemble exactly invariant under the product target distribution. The construction unifies three historically separate approaches by interpreting ensemble-chain methods as finite approximations to mean-field ideals and supplying a discretization route from mean-field Langevin dynamics into practical parallel MCMC. It also recovers adaptive single-chain behavior through time averages and presents concrete overdamped and underdamped two-system Langevin samplers, for which the paper reports efficiency improvements over NUTS baselines on benchmarks and real posteriors.

Core claim

By splitting the particle ensemble into two subsystems that propose updates for each other symmetrically and alternately, the finite ensemble maintains the product distribution as its invariant for memoryless two-system samplers, with exact stationarity holding after adaptation freezes in finite-adaptive variants. This unifies mean-field, ensemble-chain, and adaptive samplers, reveals ensemble methods as finite-N approximations to ideal mean-field samplers, and provides a principled recipe for discretizing mean-field Langevin dynamics into tractable parallel MCMC algorithms. The same logic connects naturally to adaptive single-chain methods by swapping particle statistics for time averages.

What carries the argument

The two-system construction: an ensemble divided into two interacting subsystems that propose updates to each other in a symmetric alternating fashion to enforce the exact invariant distribution.

If this is right

  • Ensemble-chain samplers can be interpreted as finite-N approximations to an ideal mean-field sampler.
  • Mean-field Langevin dynamics can be discretized into tractable parallel MCMC algorithms using the two-system recipe.
  • Adaptive single-chain methods are recovered in the long-time limit by replacing particle-based statistics with time-averaged statistics from one chain.
  • Novel two-system overdamped and underdamped Langevin MCMC samplers achieve higher effective sample sizes per gradient evaluation than NUTS.
  • The resulting samplers deliver markedly higher wall-clock throughput on higher-dimensional posteriors.
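The per-gradient efficiency metric in the list above can be made concrete with a small sketch. The estimator below uses Geyer's initial positive sequence truncation, which is one common choice and an assumption here; the paper may compute ESS differently. The function names `autocorr_fft`, `ess_geyer`, and `ess_per_grad` are hypothetical.

```python
import numpy as np

def autocorr_fft(x):
    """Normalized autocorrelation of a 1-D series, computed via FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    f = np.fft.rfft(x, n=2 * n)  # zero-pad to avoid circular wrap-around
    acov = np.fft.irfft(f * np.conjugate(f))[:n] / n
    return acov / acov[0]

def ess_geyer(x):
    """ESS = n / tau, with tau truncated by Geyer's initial positive sequence."""
    rho = autocorr_fft(x)
    n = rho.size
    # tau = 1 + 2 * sum_{k>=1} rho_k = -1 + 2 * sum of pairs (rho_2m + rho_2m+1)
    tau = -1.0
    for k in range(0, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair < 0:  # stop at the first negative pair
            break
        tau += 2.0 * pair
    return n / max(tau, 1e-3)

def ess_per_grad(x, n_grad_evals):
    """Effective samples gained per gradient evaluation."""
    return ess_geyer(x) / n_grad_evals
```

For a sampler that spends a fixed number of gradient evaluations per stored draw, `n_grad_evals` would be that number times the chain length; for NUTS it would be the total number of leapfrog steps, which varies per iteration.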

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The exactness guarantee for finite ensembles might extend to hybrid algorithms that mix two-system updates with other variance-reduction techniques.
  • Similar split-and-interact logic could be tested in non-MCMC settings such as particle-based optimization or variational methods.
  • Further experiments on very high-dimensional or multimodal targets would reveal whether the reported per-gradient gains hold when ensemble size is scaled up.

Load-bearing premise

Symmetric alternating cross-system proposals between the two subsystems preserve the exact invariant product distribution for any finite ensemble size without further restrictions on the proposal kernels or adaptation rules.
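On a toy finite state space this premise can be checked exactly rather than statistically. The sketch below is an illustration under stated assumptions, not the paper's construction: one particle per subsystem, five states, a symmetric random-walk proposal whose step length is read from the other particle, and a Metropolis accept/reject. For contrast it also builds a variant where the step length depends on the particle's own state with no Hastings correction. All names (`half_sweep`, `sweep`, `residual`) are hypothetical.

```python
import numpy as np

K = 5
rho = np.array([0.1, 0.2, 0.4, 0.2, 0.1])  # toy target on {0, ..., 4}
pi = np.outer(rho, rho).ravel()            # product target; index = x * K + y

def half_sweep(update_first, cross):
    """Transition matrix on pairs (x, y) for updating one of the two particles."""
    P = np.zeros((K * K, K * K))
    for x in range(K):
        for y in range(K):
            cur, other = (x, y) if update_first else (y, x)
            # cross=True: step length depends only on the frozen particle.
            # cross=False: step length depends on the moving particle itself.
            s = 1 + (other % 2) if cross else 1 + (cur % 2)
            stay = 1.0
            for prop in ((cur + s) % K, (cur - s) % K):
                acc = 0.5 * min(1.0, rho[prop] / rho[cur])
                new = (prop, y) if update_first else (x, prop)
                P[x * K + y, new[0] * K + new[1]] += acc
                stay -= acc
            P[x * K + y, x * K + y] += stay
    return P

def sweep(cross):
    """Update x given y, then y given the new x."""
    return half_sweep(True, cross) @ half_sweep(False, cross)

def residual(P):
    """Largest violation of pi P = pi."""
    return np.max(np.abs(pi @ P - pi))

res_cross = residual(sweep(cross=True))
res_self = residual(sweep(cross=False))
```

In the cross-system variant the proposal is symmetric once the other particle is held fixed, so each half-sweep is an ordinary Metropolis kernel and `res_cross` sits at floating-point round-off. In the self-dependent variant the proposal is no longer symmetric, and `res_self` is clearly nonzero, which is the failure mode the referee's first major comment points at.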

What would settle it

Implement a simple memoryless two-system sampler targeting a multivariate normal, run many iterations on a modest ensemble size, and check whether the joint empirical distribution of all 2N particles matches the product measure without detectable finite-size bias in moments or marginals.
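A minimal version of this experiment might look as follows. It is a sketch under assumptions, not one of the paper's MAKLA samplers: the proposal is a random-walk Metropolis step whose covariance is the empirical covariance of the other subsystem, the target is a correlated bivariate normal, and the ensemble is initialized from the target so that every sweep should be stationary. The names `log_rho`, `half_sweep`, and `run` are hypothetical.

```python
import numpy as np

def log_rho(x, prec):
    """Unnormalized log density of N(0, Sigma) for each row of x."""
    return -0.5 * np.einsum("ni,ij,nj->n", x, prec, x)

def half_sweep(active, frozen, prec, h, rng):
    """Metropolis update of `active`; proposal shape comes from `frozen` only."""
    d = active.shape[1]
    cov = np.cov(frozen, rowvar=False) + 1e-9 * np.eye(d)  # small jitter
    chol = np.linalg.cholesky(cov)
    prop = active + h * rng.standard_normal(active.shape) @ chol.T
    log_acc = log_rho(prop, prec) - log_rho(active, prec)  # symmetric proposal
    accept = np.log(rng.uniform(size=len(active))) < log_acc
    return np.where(accept[:, None], prop, active)

def run(n_per_system=20, n_sweeps=3000, seed=0):
    rng = np.random.default_rng(seed)
    sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
    prec = np.linalg.inv(sigma)
    d = sigma.shape[0]
    h = 2.38 / np.sqrt(d)  # common random-walk scaling heuristic
    target_chol = np.linalg.cholesky(sigma)
    # Start both subsystems from the target itself.
    a = rng.standard_normal((n_per_system, d)) @ target_chol.T
    b = rng.standard_normal((n_per_system, d)) @ target_chol.T
    draws = np.empty((n_sweeps, 2 * n_per_system, d))
    for t in range(n_sweeps):
        a = half_sweep(a, b, prec, h, rng)  # A moves, B frozen
        b = half_sweep(b, a, prec, h, rng)  # B moves, A frozen
        draws[t, :n_per_system] = a
        draws[t, n_per_system:] = b
    flat = draws.reshape(-1, d)
    return flat.mean(axis=0), np.cov(flat, rowvar=False), sigma
```

Calling `run()` returns the pooled mean, the pooled covariance, and the true covariance for comparison. This checks only pooled first and second moments; a fuller test of the product structure would also look at cross-particle correlations within a sweep.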

Figures

Figures reproduced from arXiv: 2509.09162 by Daniel Paulin, Geoffrey M. Vasil, James Chok, Myung Won Lee.

Figure 1
Figure 1: Visualization of the randomized step size distribution. The step size h = γ·h_max is drawn from a mixture of a point mass at γ = 1 with weight β ∈ (0, 1), and a continuous component f(x) = 3(1 − x)² supported on (0, 1), with weight 1 − β. This construction encourages frequent large proposals while allowing occasional small, exploratory steps, improving robustness across varying curvature scales.
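The step-size mixture in the Figure 1 caption can be sampled by inverse transform. The continuous component f(x) = 3(1 − x)² has CDF F(x) = 1 − (1 − x)³, so x = 1 − (1 − u)^(1/3) for uniform u. The sketch below follows only what the caption states; the function name and argument order are hypothetical.

```python
import numpy as np

def sample_step_sizes(h_max, beta, size, rng):
    """Draw h = gamma * h_max with gamma from the Figure 1 mixture."""
    # Continuous part: invert F(x) = 1 - (1 - x)^3 on (0, 1).
    u = rng.uniform(size=size)
    gamma_cont = 1.0 - (1.0 - u) ** (1.0 / 3.0)
    # Point mass at gamma = 1, chosen with probability beta.
    at_max = rng.uniform(size=size) < beta
    gamma = np.where(at_max, 1.0, gamma_cont)
    return gamma * h_max
```

Since the continuous component has mean 1/4, the expected step size under this reading is h_max · (β + (1 − β)/4).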
Figure 2
Figure 2: Median ESS/Grad vs. dimension on 45 posteriordb. Each dot is one posterior; indices (1–45) map to Appendix…
Figure 3
Figure 3: Posterior-mean accuracy vs. dimension on 45 posteriordb. For each of the 45 posteriors, we plot the maximum coordinate-wise absolute relative error (MCARE) in the posterior-mean estimate, max_j |μ̂_j − μ⋆_j| / std(μ⋆_j), against the dimension; the y-axis is on a log scale. Reference means μ⋆ and standard deviations std(μ⋆_j) are computed from the gold-standard reference draws distributed with posteriordb…
Figure 4
Figure 4: Histogram of parameter-wise R̂ (Gelman–Rubin statistic) values across 45 posteriordb. For each of the 45 models and for every scalar parameter, we compute R̂ after warmup and pool all values into a single distribution for three samplers: Coupled MAKLA (purple), 1sys-Adaptive MAKLA (green), and 2sys-Adaptive MAKLA (red). All methods concentrate extremely close to the ideal R̂ = 1 (note the tight axis range…
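The two diagnostics named in the Figure 3 and Figure 4 captions can be sketched in a few lines. Two assumptions are made here: std(μ⋆_j) is read as the coordinate-wise standard deviation of the reference draws, and R̂ is the classic (non-split) Gelman–Rubin statistic. The paper may use split or rank-normalized variants instead. The function names are hypothetical.

```python
import numpy as np

def mcare(draws, ref_draws):
    """Maximum coordinate-wise absolute relative error of the posterior mean.

    draws, ref_draws: arrays of shape (n_draws, dim).
    """
    mu_hat = draws.mean(axis=0)
    mu_ref = ref_draws.mean(axis=0)
    sd_ref = ref_draws.std(axis=0, ddof=1)  # assumed meaning of std(mu*_j)
    return np.max(np.abs(mu_hat - mu_ref) / sd_ref)

def gelman_rubin(chains):
    """Classic R-hat for one scalar parameter; chains has shape (m, n)."""
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W
    between = n * chains.mean(axis=1).var(ddof=1)    # B
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)
```

Values of `gelman_rubin` near 1 indicate that between-chain and within-chain variances agree; a chain stuck away from the others pushes the value well above 1.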
Original abstract

Mean-field, ensemble-chain, and adaptive samplers have historically been viewed as distinct approaches to Monte Carlo sampling. In this paper, we present a unifying two-system framework that brings all three under one roof. In our approach, an ensemble of particles is split into two interacting subsystems that propose updates for each other in a symmetric, alternating fashion. For the memoryless two-system samplers, this cross-system interaction ensures that the finite ensemble has $\rho^{\otimes 2N}$ as its invariant distribution; for finite-adaptive variants, exact stationarity applies after the adaptation phase is frozen. The two-system construction reveals that ensemble-chain samplers can be interpreted as finite-$N$ approximations to an ideal mean-field sampler; conversely, it provides a principled recipe for discretizing mean-field Langevin dynamics into tractable parallel MCMC algorithms. The framework also connects naturally to adaptive single-chain methods: by replacing particle-based statistics with time-averaged statistics from a single chain, one recovers analogous adaptive dynamics in the long-time limit without requiring a large ensemble. We derive novel two-system versions of both overdamped and underdamped Langevin MCMC samplers within this paradigm. Across synthetic benchmarks and real-world posterior inference tasks, these two-system samplers (which use a single BCSS-2 integrator step per Metropolis–Hastings accept/reject, in contrast to the long-trajectory style of HMC/NUTS) exhibit substantial performance gains over No-U-Turn Sampler baselines, achieving higher effective sample sizes per gradient evaluation and markedly higher wall-clock throughput. On higher-dimensional posteriors, the adaptive MAKLA-BCSS-2 methods remain stable and achieve substantially better per-gradient efficiency and wall-clock throughput than the NUTS variants in our benchmark suite.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a unifying two-system paradigm for Monte Carlo sampling in which an ensemble of particles is partitioned into two interacting subsystems that alternately propose updates for each other in a symmetric manner. It claims that memoryless two-system samplers have exactly ρ⊗2N as the invariant distribution of the finite ensemble, that finite-adaptive variants achieve exact stationarity after freezing adaptation, that ensemble methods approximate mean-field limits and vice versa, and that novel overdamped and underdamped Langevin two-system samplers (using a single BCSS-2 step per MH accept/reject) outperform NUTS baselines in ESS per gradient and wall-clock time on synthetic and real posterior tasks.

Significance. If the invariance claims hold under the stated conditions, the framework provides a coherent unification of mean-field, ensemble-chain, and adaptive MCMC approaches together with a concrete discretization recipe for mean-field Langevin dynamics. The reported efficiency gains on higher-dimensional posteriors would be of practical interest for parallel sampling, though the absence of explicit kernel reversibility conditions in the abstract raises a question about the scope of the exact-stationarity result.

major comments (2)
  1. [Abstract (memoryless two-system samplers paragraph)] Abstract, paragraph on memoryless two-system samplers: the claim that 'symmetric alternating cross-system interaction ensures that the finite ensemble has ρ⊗2N as its invariant distribution' for arbitrary proposal kernels is not automatically true. Preservation of the product measure requires that the joint transition kernel satisfy global balance (or detailed balance) with respect to ρ⊗2N; alternation symmetry alone does not guarantee the necessary cancellation of the proposal density ratio unless each kernel satisfies a reversibility condition with respect to ρ and the Metropolis-Hastings acceptance probability is written explicitly. The manuscript must state the precise form of the acceptance ratio and any required kernel properties.
  2. [Abstract (finite-adaptive variants)] Abstract (finite-adaptive variants): the statement that 'exact stationarity applies after the adaptation phase is frozen' needs an explicit argument showing that the frozen adaptation rule leaves the two-system transition kernel reversible with respect to ρ⊗2N; without this, the claim that stationarity holds for any finite ensemble size remains unsubstantiated.
minor comments (2)
  1. [Abstract] The abstract reports 'substantial performance gains' and 'higher effective sample sizes per gradient evaluation' without mentioning the number of independent runs, standard errors, or benchmark exclusion criteria; these details belong in the main text but their absence makes the strength of the empirical claims difficult to gauge from the summary alone.
  2. [Abstract] Notation ρ⊗2N is introduced without an immediate reminder that it denotes the product measure on the 2N-particle space; a brief parenthetical definition would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the invariance properties claimed in the abstract. We address each point below and have revised the manuscript to provide the requested clarifications and explicit arguments.

Point-by-point responses
  1. Referee: Abstract, paragraph on memoryless two-system samplers: the claim that 'symmetric alternating cross-system interaction ensures that the finite ensemble has ρ⊗2N as its invariant distribution' for arbitrary proposal kernels is not automatically true. Preservation of the product measure requires that the joint transition kernel satisfy global balance (or detailed balance) with respect to ρ⊗2N; alternation symmetry alone does not guarantee the necessary cancellation of the proposal density ratio unless each kernel satisfies a reversibility condition with respect to ρ and the Metropolis-Hastings acceptance probability is written explicitly. The manuscript must state the precise form of the acceptance ratio and any required kernel properties.

    Authors: We agree that the original abstract phrasing was too brief and that alternation symmetry alone is insufficient without additional conditions. The joint kernel preserves ρ⊗2N when each cross-system proposal kernel is reversible with respect to ρ and the acceptance probability is the standard Metropolis-Hastings ratio min(1, [ρ(x')/ρ(x)] ⋅ [q(y|x')/q(y|x)]), where q denotes the proposal density from the other subsystem. In the revised manuscript we have updated the abstract to reference these conditions and added an explicit derivation in Section 2.2 showing that the symmetric alternation produces the required cancellations for global balance. This does not change the scope of the result but makes the statement rigorous. revision: yes

  2. Referee: Abstract (finite-adaptive variants): the statement that 'exact stationarity applies after the adaptation phase is frozen' needs an explicit argument showing that the frozen adaptation rule leaves the two-system transition kernel reversible with respect to ρ⊗2N; without this, the claim that stationarity holds for any finite ensemble size remains unsubstantiated.

    Authors: We accept that an explicit argument was missing from the abstract and main text. Once adaptation is frozen the parameters become fixed constants, reducing the kernel to the memoryless two-system case already shown to satisfy detailed balance with respect to ρ⊗2N. Because the proof relies only on pairwise cross-system interactions and not on the value of N, exact stationarity holds for any finite ensemble size. We have inserted a short proof sketch in the revised Section 3.3 and added a clarifying clause to the abstract. revision: yes
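The invariance argument discussed in both responses can be written out in a few lines. The following is a hedged sketch in our own notation, not the derivation from the paper's Section 2.2 or 3.3: $Y$ denotes the frozen subsystem, $q(\cdot \mid x, Y)$ is an assumed proposal density for a particle $x$ in the active subsystem, and any adaptation parameters are taken as fixed constants absorbed into $q$.

```latex
% Metropolis--Hastings acceptance for one active particle, frozen subsystem Y held fixed
\alpha(x \to x' \mid Y)
  = \min\left\{ 1,\;
      \frac{\rho(x')\, q(x \mid x', Y)}{\rho(x)\, q(x' \mid x, Y)} \right\}

% Detailed balance for that particle, conditional on Y
\rho(x)\, q(x' \mid x, Y)\, \alpha(x \to x' \mid Y)
  = \rho(x')\, q(x \mid x', Y)\, \alpha(x' \to x \mid Y)

% Y is unchanged during the half-sweep and the N active particles move
% independently given Y, so multiplying over particles gives invariance of
\rho^{\otimes 2N}(X, Y) = \prod_{i=1}^{N} \rho(x_i) \prod_{j=1}^{N} \rho(y_j)
```

Each half-sweep therefore leaves $\rho^{\otimes 2N}$ invariant, and so does their composition. The full sweep, being a composition of two reversible kernels, satisfies global balance but need not itself be reversible.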

Circularity Check

0 steps flagged

No significant circularity; derivation rests on interaction symmetry and Markov properties

full rationale

The paper presents the two-system framework as derived from the symmetry of alternating cross-system proposals between subsystems, which by construction and standard Markov chain theory yields ρ⊗2N as the invariant for memoryless samplers. This is not obtained by fitting parameters to the target outputs, self-defining the result, or relying on load-bearing self-citations. The unification of mean-field, ensemble, and adaptive methods follows from reinterpreting existing samplers via the split, without the central invariance claim reducing to its own inputs. The abstract explicitly ties stationarity to the interaction mechanism rather than assuming it. No equations or steps in the provided description exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review prevents exhaustive audit; the framework appears to rest on standard Markov-chain invariant-distribution theory and the novel two-system interaction rule.

axioms (1)
  • [standard math] Standard Markov chain theory guarantees that symmetric proposals yield the target as invariant distribution when detailed balance holds.
    Invoked to claim ρ⊗2N invariance for memoryless two-system samplers.
invented entities (1)
  • Two-system paradigm [no independent evidence]
    purpose: Unifying mean-field, ensemble-chain, and adaptive samplers via symmetric subsystem interaction
    New construction introduced to organize and derive the samplers; no independent falsifiable evidence supplied in abstract.


