arxiv: 2605.21253 · v1 · pith:EILRE75Vnew · submitted 2026-05-20 · 📊 stat.ML · cs.LG

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

Pith reviewed 2026-05-21 03:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords annealed Langevin · simulation-based inference · compositional score-based inference · Wasserstein bounds · posterior sampling · hyperparameter tuning · bridging densities

The pith

Wasserstein bounds provide explicit rules to tune annealed Langevin for accurate sampling in compositional SBI

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives Wasserstein bounds for annealed Langevin dynamics that use approximate composite scores in simulation-based inference. These bounds are translated into concrete decision rules for the step size, the number of steps per level, and the number of annealing levels needed to reach a prescribed accuracy. In the Gaussian setting, closed-form expressions show that the bridging densities of Linhart et al. (2026) allow larger step sizes and fewer total Langevin steps than those of Geffner et al. (2023). The rules are shown empirically to generalize to more complex problems, offering a theoretically grounded alternative to empirical hyperparameter choice.

Core claim

We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023).

What carries the argument

Wasserstein bounds on the sampling error of annealed Langevin dynamics run with an approximate score. The bounds rest on treating the composite score as the exact score of a sequence of bridging densities, which is what makes the bias controllable.
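As a rough illustration of the sampler these bounds apply to, the sketch below runs one unadjusted Langevin (ULA) chain per annealing level, in Python with NumPy. The 1-D Gaussian bridging densities, the step sizes, and the step counts are toy assumptions chosen for illustration, not the paper's setup or its tuning rules; `ula` and `annealed_langevin` are hypothetical helper names.

```python
import numpy as np

def ula(score, theta, step, n_steps, rng):
    """Unadjusted Langevin: theta <- theta + h * score(theta) + sqrt(2h) * noise."""
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta + step * score(theta) + np.sqrt(2.0 * step) * noise
    return theta

def annealed_langevin(scores, steps, n_steps, theta0, rng):
    """Run one ULA chain per level, from the most diffuse level down to the target."""
    theta = theta0
    for score, h, k in zip(scores, steps, n_steps):
        theta = ula(score, theta, h, k, rng)
    return theta

# Toy assumption: bridging densities N(mean, var); the last entry is the target.
levels = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0)]
scores = [lambda th, m=m, v=v: -(th - m) / v for m, v in levels]
steps = [0.1 * v for _, v in levels]      # illustrative step sizes, one per level
n_steps = [200] * len(levels)             # illustrative step counts, one per level

rng = np.random.default_rng(0)
theta0 = 2.0 * rng.standard_normal(5000)  # 5000 independent chains
samples = annealed_langevin(scores, steps, n_steps, theta0, rng)
```

For a Gaussian of variance $v$ and step size $h$, the ULA stationary variance is $v / (1 - h/(2v))$, so the final samples here have variance close to 1.05 rather than 1: a small bias governed by the step size, which is the kind of error the bounds are meant to quantify.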

If this is right

  • Hyperparameters can be selected to ensure a target sampling accuracy rather than chosen empirically.
  • The two composite score formulations differ in the step sizes they allow and the total number of Langevin steps required.
  • Tuning rules obtained from the Gaussian case provide a reliable starting point that generalizes to non-Gaussian problems.
  • Annealed Langevin dynamics can achieve controllable bias where reverse SDE sampling produces irreducible bias.
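To make the first bullet concrete, here is a minimal sketch of how a Wasserstein bound turns into a step-size and step-count rule at a single annealing level. It is not the paper's rule: it assumes the standard exact-score ULA bound for an $m$-strongly log-concave, $L$-smooth target, of the form $W_2 \le (1 - mh)^k W_0 + C (L/m) \sqrt{hd}$ for $h \le 2/(m+L)$, with $C = 1.65$ as recalled from Dalalyan and Karagulyan (2019) (reference [7]); the score-approximation term handled in the paper is omitted. The function name `ula_level_rule` and the default `bias_const` are assumptions.

```python
import math

def ula_level_rule(m, L, dim, w0, eps, bias_const=1.65):
    """Pick (h, k) so that (1 - m*h)**k * w0 + bias_const*(L/m)*sqrt(h*dim) <= eps.

    Assumes an exact score; bias_const is taken from memory of the cited bound.
    """
    # Step size small enough that the discretization term is at most eps / 2.
    h_bias = (eps * m / (2.0 * bias_const * L)) ** 2 / dim
    # Stay in the range where the (1 - m*h) contraction factor applies.
    h = min(h_bias, 2.0 / (m + L))
    if w0 <= eps / 2.0:
        return h, 0                      # already within tolerance
    rho = 1.0 - m * h
    if rho <= 0.0:
        return h, 1                      # a single step contracts fully
    # Number of steps so that the contraction term is at most eps / 2.
    k = math.ceil(math.log(2.0 * w0 / eps) / -math.log(rho))
    return h, k

h, k = ula_level_rule(m=1.0, L=4.0, dim=5, w0=10.0, eps=0.1)
print(f"step size h = {h:.3e}, number of steps k = {k}")
```

Splitting the error budget evenly between the contraction and discretization terms is a design choice made here for simplicity; other splits trade a larger step size against more steps.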

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The analysis could be extended to derive bounds for other approximate sampling schemes in score-based models.
  • Improved composite score methods might be designed to maximize the allowable step size under these bounds.
  • In practice, the bounds could be used to adaptively choose the number of annealing levels based on the observed score quality.

Load-bearing premise

That the composite score from the chosen formulation can be viewed as the genuine score of tractable bridging densities, making the sampling bias controllable.

What would settle it

Measuring the actual Wasserstein distance achieved when applying the proposed step-size and step-count rules in a Gaussian model and verifying whether it stays within the prescribed accuracy bound.
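In a Gaussian model this check can be done in closed form, since the 2-Wasserstein distance between two Gaussians has an explicit expression (the Bures-Wasserstein formula, cf. reference [2]). The sketch below is one way to compute it with NumPy; the helper names `sqrtm_psd` and `gaussian_w2` are hypothetical, and which estimator the paper itself uses is not stated on this page.

```python
import numpy as np

def sqrtm_psd(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)           # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_w2(mean1, cov1, mean2, cov2):
    """W2 distance between N(mean1, cov1) and N(mean2, cov2)."""
    root1 = sqrtm_psd(cov1)
    cross = sqrtm_psd(root1 @ cov2 @ root1)
    bures_sq = np.trace(cov1 + cov2 - 2.0 * cross)
    w2_sq = np.sum((mean1 - mean2) ** 2) + max(bures_sq, 0.0)
    return float(np.sqrt(w2_sq))

# Same covariance, means 5 apart: the distance is the distance between the means.
shifted = gaussian_w2(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))
# Same mean, covariances I and 4I in dimension 2: the squared distance is 2.
scaled = gaussian_w2(np.zeros(2), np.eye(2), np.zeros(2), 4.0 * np.eye(2))
```

To compare sampler output against a Gaussian target, one option is to plug the empirical mean and covariance of the samples into `gaussian_w2`; this plug-in value is only exact when the sampled distribution is itself Gaussian.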

Figures

Figures reproduced from arXiv: 2605.21253 by Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues.

Figure 1
Figure 1: The annealed Langevin algorithm consists of $T$ ULA runs in succession. It samples from the bridging densities $\{\pi_{t_p}\}_{p=0}^{T-1}$, represented in blue: the first ULA yields samples (in red) from $\pi_{t_{T-1}}$, the second from $\pi_{t_{T-2}}$, and so on, and the last one from the target density $\pi_{t_0}$. Green dashed arrows stand for the $T$ ULA runs: each has its own constant step size $h_{t_p}$ and number of steps $k_{t_p}$ (si…
Figure 2
Figure 2: Evolution of the number of Langevin steps in the 5-dimensional Gaussian setting for different numbers $n$ of conditional observations. Blue (resp. red) curves represent the mean number of steps $k^G_t$ (resp. $k^L_t$) at each annealing level when using Geffner’s (resp. Linhart’s) compositional score. We choose $T = 10$ annealing levels uniformly spaced over $[0, 1]$. The mean (and std) is computed over 5 different …
Figure 3
Figure 3: Evolution of the mean total number of Langevin steps $\sum_{p=0}^{T} k_{t_p}$ for different numbers of conditional observations $n$. These choices achieve a prescribed low final Wasserstein error for both compositional score formulations (see Appendix B.4). Blue curves stand for Geffner’s complexity while red curves represent that of Linhart. Mean and std are computed over 5 different seeds.
Figure 4
Figure 4: Mean final Wasserstein errors obtained in the Gaussian setting with annealed Langevin dynamics and prescribed hyperparameters from Section 3. Blue (resp. red) curves stand for the Geffner (resp. Linhart) compositional score used in the sampling algorithm. We use $T = 10$ annealing levels, $\gamma = 0.5$ and $\omega = 0.5$ (resp. $0.8$) for $d = 2, 5$ (resp. $d = 10$). The empirical Wasserstein error indeed stays below the fixed thr…
Figure 5
Figure 5: Evolution of $k^G_t$ and $k^L_t$ over time for the Gaussian case. The switch point $\tilde{t}$ (where blue and red curves cross) is unique and often appears from the second annealing level near $t = 0$, whatever the dimension of the space $d$ or the number of conditional observations $n$. Mean and std are computed over 5 seeds; each model differs in its likelihood covariance matrix. The following figure shows one example in d…
Figure 6
Figure 6: Evolution of $k^G_t$ and $k^L_t$ over time: example of a switch point appearing later when $d = 2$. The colored figures on the right of each subplot correspond to the total number of steps $\sum_{p=0}^{T} k_{t_p}$. In this specific case, blue and red curves cross later than the second annealing level. However, the complexity (in terms of total number of steps) remains lower when choosing Linhart’s composite score than that of …
Figure 7
Figure 7: The top two subplots show the mean final Wasserstein error (left) and total number of steps (right) for a strongly non-Gaussian GMM prior model (scales $\sigma_1$ and $\sigma_2$ are very small). The bottom two subplots present the same results (mean Wasserstein error on the left and total number of steps on the right) for a less strongly non-Gaussian model (with greater prior scales). We use $T = 10$ annealing levels, $\gamma = 0.5$ and $\omega$ = …
Figure 8
Figure 8 shows that hyperparameters chosen according to our decision rule indeed allow one to control the Wasserstein error of the annealed Langevin algorithm for the different inference tasks. [Plot: four panels (GMM prior, d = 2; GMM likelihood, d = 10; SIR, d = 2; Lotka Volterra, d = 4); x-axis: n from 0 to 50; curves: Geffner, Linhart.]
Original abstract

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
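For readers new to the setting, the aggregation mentioned in the abstract can be written out at the undiffused level. The identity below follows from Bayes' rule when the $n$ observations are conditionally independent given $\theta$; the diffused composite scores of Geffner et al. (2023) and Linhart et al. (2026) build on it in different ways that are not reproduced here. The symbols $s$, $h$, and $\xi_k$ in the Langevin update are notation introduced for this illustration.

$$
p(\theta \mid x_{1:n}) \;\propto\; p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta) \;\propto\; p(\theta)^{1-n} \prod_{i=1}^{n} p(\theta \mid x_i)
\quad\Longrightarrow\quad
\nabla_\theta \log p(\theta \mid x_{1:n}) = (1-n)\, \nabla_\theta \log p(\theta) + \sum_{i=1}^{n} \nabla_\theta \log p(\theta \mid x_i).
$$

Given a score $s$ at one annealing level and a step size $h$, one unadjusted Langevin step reads

$$
\theta_{k+1} = \theta_k + h\, s(\theta_k) + \sqrt{2h}\, \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I_d).
$$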

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper derives Wasserstein bounds for annealed Langevin dynamics applied to approximate composite scores arising in compositional score-based simulation-based inference. These bounds are translated into explicit rules for choosing step sizes, steps per level, and number of annealing levels to achieve a target sampling accuracy. Closed-form expressions are obtained in the Gaussian setting, where it is proven that the bridging densities of Linhart et al. (2026) admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Empirical experiments indicate that the Gaussian-derived tuning generalizes to more complex problems.

Significance. If the central assumptions hold, the work supplies a theoretically grounded alternative to empirical hyperparameter selection for annealed Langevin in compositional SBI and clarifies theoretical distinctions between the two composite-score formulations. The closed-form Gaussian analysis and the explicit proof of relative advantage constitute clear strengths, as does the provision of concrete decision rules that guarantee a prescribed accuracy under the stated conditions.

major comments (1)
  1. [§2 and introduction] The Wasserstein bounds and the claim of controllable bias rest on the premise that each composite score equals the exact score of a sequence of tractable bridging densities (see the contrast with reverse-SDE bias in the introduction and the setup in §2). The manuscript does not establish that the composite scores are conservative (curl-free) or explicitly construct the corresponding densities at each annealing level. If the composite score has non-zero curl, no such density exists and the discretization/approximation error bounds cannot guarantee controllable bias, undermining both the hyperparameter rules and the Gaussian comparison.
minor comments (2)
  1. [§3] Notation for the composite scores and the annealing schedule could be made more uniform across sections to ease comparison between the Geffner and Linhart formulations.
  2. [§5] The empirical section would benefit from reporting the precise Wasserstein or other distance metrics used to verify generalization beyond the Gaussian case.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying this foundational point about the existence of bridging densities. We respond to the major comment below and indicate the planned revisions.

Point-by-point responses
  1. Referee: [§2 and introduction] The Wasserstein bounds and the claim of controllable bias rest on the premise that each composite score equals the exact score of a sequence of tractable bridging densities (see the contrast with reverse-SDE bias in the introduction and the setup in §2). The manuscript does not establish that the composite scores are conservative (curl-free) or explicitly construct the corresponding densities at each annealing level. If the composite score has non-zero curl, no such density exists and the discretization/approximation error bounds cannot guarantee controllable bias, undermining both the hyperparameter rules and the Gaussian comparison.

    Authors: We appreciate the referee highlighting this assumption. The manuscript positions annealed Langevin dynamics as an alternative that treats the composite score as the score of tractable bridging densities, thereby allowing controllable bias via the derived Wasserstein bounds (in contrast to the irreducible bias of the reverse SDE). We acknowledge that a general proof that arbitrary composite scores are curl-free is absent. In the Gaussian setting, however, the closed-form bridging densities are explicitly constructed as Gaussians whose scores coincide with the composites; these are necessarily conservative, and the explicit step-size and step-count comparisons follow directly from this construction. For the general case the composites are approximations to the joint score, and our bounds already account for score approximation error. We will revise §2 to state the assumption explicitly, supply the Gaussian density construction in full, and add a short discussion of the consequences should the composite exhibit non-zero curl (additional bias outside the current bounds). These changes will clarify the scope of the hyperparameter rules and the Gaussian comparison without altering the technical results.
    Revision: yes
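The curl-free condition at stake in this exchange can be probed numerically: a smooth vector field on $\mathbb{R}^d$ is the gradient of some potential exactly when its Jacobian is symmetric everywhere. The sketch below measures the asymmetry of a finite-difference Jacobian at one point. It is an illustrative diagnostic, not something the paper or the rebuttal proposes; the two example fields and the names `jacobian_fd` and `asymmetry` are assumptions.

```python
import numpy as np

def jacobian_fd(field, theta, eps=1e-5):
    """Central finite-difference Jacobian of a vector field at theta."""
    d = theta.size
    jac = np.empty((d, d))
    for j in range(d):
        shift = np.zeros(d)
        shift[j] = eps
        jac[:, j] = (field(theta + shift) - field(theta - shift)) / (2.0 * eps)
    return jac

def asymmetry(field, theta, eps=1e-5):
    """Frobenius norm of J - J^T; zero for a conservative (curl-free) field."""
    jac = jacobian_fd(field, theta, eps)
    return float(np.linalg.norm(jac - jac.T))

# A Gaussian score -P @ theta with symmetric P is conservative.
precision = np.array([[2.0, 0.5], [0.5, 1.0]])
gaussian_score = lambda th: -precision @ th

# Adding a rotation term breaks symmetry, so no density has this field as its score.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
rotated_field = lambda th: -th + rotation @ th

theta = np.array([0.3, -1.2])
conservative_gap = asymmetry(gaussian_score, theta)
rotational_gap = asymmetry(rotated_field, theta)
```

A small gap at sampled points is necessary but not sufficient evidence of conservativeness; for a learned score, one would evaluate it at many points along the annealing path.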

Circularity Check

0 steps flagged

No significant circularity: Wasserstein bounds derived from standard properties applied to explicit composite scores.

full rationale

The paper's core derivation applies known Wasserstein convergence results for Langevin dynamics to the composite scores treated as exact scores of bridging densities. Closed-form Gaussian expressions follow directly from the model assumptions without fitting or redefinition. Hyperparameter rules and comparisons between Geffner and Linhart formulations are obtained by explicit calculation of the resulting bounds. No load-bearing step reduces to a self-citation, fitted input renamed as prediction, or self-definitional equivalence; the assumption that composite scores define tractable bridging densities is stated explicitly as a modeling choice rather than derived from the outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard mathematical properties of Wasserstein distances and Langevin dynamics plus the domain assumption that composite scores define valid bridging densities.

axioms (2)
  • [standard math] Standard convergence properties of Langevin dynamics and Wasserstein distance bounds hold for approximate scores.
    Invoked to translate error bounds into hyperparameter rules that guarantee prescribed accuracy.
  • [domain assumption] Composite scores from the two cited formulations can serve as scores of tractable bridging densities.
    Allows annealed Langevin to replace reverse SDE sampling and produce controllable bias.

pith-pipeline@v0.9.0 · 5827 in / 1563 out tokens · 58515 ms · 2026-05-21T03:56:33.976292+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 2 internal anchors

  1. [1]

    Arruda, J., Pandey, V., Sherry, C., Barroso, M., Intes, X., Hasenauer, J., and Radev, S. T. (2025). Compositional amortized inference for large-scale hierarchical Bayesian models. arXiv preprint arXiv:2505.14429.

  2. [2]

    Bhatia, R., Jain, T., and Lim, Y. (2017). On the Bures-Wasserstein distance between positive definite matrices. Expositiones Mathematicae, 37.

  3. [3]

    Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.

  4. [4]

    Bradbury, J., Frostig, R., Hawkins, P., et al. (2018). JAX: composable transformations of Python+NumPy programs.

  5. [5]

    Brosse, N., Durmus, A., Moulines, É., and Sabanis, S. (2019). The tamed unadjusted Langevin algorithm. Stochastic Processes and their Applications, 129(10):3638–3663.

  6. [6]

    Cranmer, K., Brehmer, J., and Louppe, G. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062.

  7. [7]

    Dalalyan, A. S. and Karagulyan, A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311.

  8. [8]

    Deistler, M., Boelts, J., Steinbach, P., Moss, G., Moreau, T., Gloeckler, M., Rodrigues, P. L., Linhart, J., Lappalainen, J. K., Miller, B. K., et al. (2025). Simulation-based inference: A practical guide. arXiv preprint arXiv:2508.12939.

  9. [9]

    Du, Y., Durkan, C., Strudel, R., Tenenbaum, J. B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W. (2023). Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC. In International Conference on Machine Learning.

  10. [10]

    Durmus, A. and Moulines, E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27(3):1551–1587.

  11. [11]

    Flamary, R., Courty, N., Gramfort, A., et al. (2021). Python Optimal Transport (POT): A Python library for optimal transport analysis. Journal of Machine Learning Research, 22(78):1–8.

  12. [12]

    Gao, Y., Huang, J., and Jiao, Y. (2024). Gaussian interpolation flows. Journal of Machine Learning Research, 25(253):1–52.

  13. [13]

    Geffner, T., Papamakarios, G., and Mnih, A. (2023). Compositional score modeling for simulation-based inference. In International Conference on Machine Learning.

  14. [14]

    Gloeckler, M., Toyota, S., Fukumizu, K., and Macke, J. H. (2025). Compositional simulation-based inference for time series. In International Conference on Learning Representations.

  15. [15]

    Ho, J., Jain, A., and Abbeel, P. (2020). Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems.

  16. [16]

    Linhart, J., Cardoso, G. V., Gramfort, A., Corff, S. L., and Rodrigues, P. L. (2026). Diffusion posterior sampling for simulation-based inference in tall data settings. TMLR.

  17. [17]

    Lueckmann, J.-M., Boelts, J., Greenberg, D., Goncalves, P., and Macke, J. (2021). Benchmarking simulation-based inference. In Banerjee, A. and Fukumizu, K., editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 343–351. PMLR.

  18. [18]

    Phan, D., Pradhan, N., and Jankowiak, M. (2020). Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554.

  19. [19]

    Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., and Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4):1452–1466.

  20. [20]

    Rodrigues, P., Moreau, T., Louppe, G., and Gramfort, A. (2021). HNPE: leveraging global parameters for neural posterior estimation. Advances in Neural Information Processing Systems.

  21. [21]

    Saumard, A. and Wellner, J. A. (2014). Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45–114.

  22. [22]

    Sharrock, L., Simons, J., Liu, S., and Beaumont, M. (2024). Sequential neural score estimation: Likelihood-free inference with conditional score based diffusion models. In International Conference on Machine Learning.

  23. [23]

    Silveri, M. G. and Ocello, A. (2025). Beyond log-concavity and score regularity: Improved convergence bounds for score-based generative models in W2-distance. In International Conference on Machine Learning.

  24. [24]

    Song, Y. and Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. In 33rd Conference on Neural Information Processing Systems (NeurIPS).

  25. [25]

    Song, Y. and Ermon, S. (2020). Improved techniques for training score-based generative models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA. Curran Associates Inc.

  26. [26]

    Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations.

  27. [27]

    Touron, C., Victorino Cardoso, G., Arbel, J., and Coelho Rodrigues, P. L. (2025). Error analysis of a compositional score-based algorithm for simulation-based inference. In Workshop on Principles of Generative Modeling at EurIPS 2025, Copenhagen, Denmark.

  28. [28]

    Vaart, A. W. v. d. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.

  29. [29]

    Vempala, S. and Wibisono, A. (2019). Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In Advances in Neural Information Processing Systems.

  30. [30]

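    with their coefficients given by $\beta_t = r_t \ge 0$ and $\alpha_t = \sqrt{1 - r_t^2} \in [0, 1]$. As $f$ is assumed to be $L$-Lipschitz, we get $(-R_t - L^2 r_t^2)\, I_d \preceq \nabla^2 \log Q_t f(\theta_t) \preceq R_t I_d$, where $R_t := 5 L r_t^3 \left( L + \frac{1}{\sqrt{\log(1/r_t)}} \right)$. Then we note that $-\nabla^2_\theta \log p_t(\theta_t) = -\nabla^2_\theta \log \phi(\theta_t) - \nabla^2_\theta \log Q_t f(\theta_t) = I_d - \nabla^2_\theta \log Q_t f(\theta_t)$. Using the previous bounds, we get: $(1 - R_t)\, I_d \preceq -\nabla^2_\theta \log \ldots$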