Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference
Pith reviewed 2026-05-21 03:56 UTC · model grok-4.3
The pith
Wasserstein bounds provide explicit rules to tune annealed Langevin for accurate sampling in compositional SBI
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023).
What carries the argument
Wasserstein bounds for the sampling error incurred by annealed Langevin dynamics when the score is only approximate; the bounds justify treating the composite score as the exact score of a sequence of bridging densities and yield controllable bias.
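As a rough illustration of the sampler these bounds concern, the Python sketch below runs unadjusted Langevin steps level by level. The function name `annealed_langevin`, the toy Gaussian bridging densities, and every numerical value are assumptions made for this example; they are not the paper's construction or its tuning rules.

```python
import numpy as np


def annealed_langevin(score, levels, step_sizes, steps_per_level, theta0, rng):
    """Sample the bridging densities in succession with unadjusted Langevin steps.

    score(theta, t) returns the (possibly approximate) score at level t.
    """
    theta = np.array(theta0, dtype=float)
    for t, h in zip(levels, step_sizes):
        for _ in range(steps_per_level):
            noise = rng.standard_normal(theta.shape)
            theta = theta + h * score(theta, t) + np.sqrt(2.0 * h) * noise
    return theta


# Hypothetical toy problem: level t is the standard deviation of a centred
# Gaussian bridging density N(0, t^2), annealed from wide (t = 3) to the target (t = 1).
def toy_score(theta, t):
    return -theta / t**2


rng = np.random.default_rng(0)
levels = [3.0, 2.0, 1.0]
step_sizes = [0.05 * t**2 for t in levels]  # illustrative choice, not a tuned rule
theta0 = 3.0 * rng.standard_normal(2000)    # 2000 independent chains
samples = annealed_langevin(toy_score, levels, step_sizes, 200, theta0, rng)
print(samples.mean(), samples.var())
```

With a fixed step size the final samples carry a small discretization bias, so the sample variance settles slightly above the target value of 1. The three hyperparameters named in the abstract appear here as `step_sizes`, `steps_per_level`, and the length of `levels`.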
If this is right
- Hyperparameters can be selected to ensure a target sampling accuracy rather than chosen empirically.
- The two composite score formulations differ in the step sizes they allow and the total number of Langevin steps required.
- Tuning rules obtained from the Gaussian case provide a starting point that, in the paper's experiments, generalizes to non-Gaussian problems.
- When properly tuned, annealed Langevin dynamics can achieve a controllable bias, whereas reverse SDE sampling incurs an irreducible bias.
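To make the idea of choosing a step size for a target accuracy concrete, here is a minimal sketch for the simplest possible case: unadjusted Langevin on a one-dimensional centred Gaussian with an exact score. In that case the chain is a linear recursion whose stationary variance is known exactly, so the W2 bias can be inverted for the step size. The function names are hypothetical and this closed form is a textbook calculation, not the decision rule derived in the paper.

```python
import math


def ula_stationary_std(sigma, h):
    """Stationary std of theta <- (1 - h / sigma^2) * theta + sqrt(2 h) * xi."""
    a = 1.0 - h / sigma**2
    if abs(a) >= 1.0:
        raise ValueError("step size too large: the recursion does not contract")
    # Stationary variance of an AR(1) recursion with noise variance 2h.
    return math.sqrt(2.0 * h / (1.0 - a * a))


def w2_bias(sigma, h):
    # W2 between two centred 1D Gaussians is the absolute difference of their stds.
    return abs(ula_stationary_std(sigma, h) - sigma)


def step_for_accuracy(sigma, eps):
    """Step size whose stationary W2 bias equals eps (inverts the formula above)."""
    return 2.0 * sigma**2 * (1.0 - (sigma / (sigma + eps)) ** 2)


h = step_for_accuracy(sigma=1.0, eps=0.01)
print(h, w2_bias(1.0, h))
```

The bias grows with the step size, so any step below the returned value also meets the target in this toy setting. The paper's rules additionally account for score approximation error and for the transitions between annealing levels, which this sketch ignores.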
Where Pith is reading between the lines
- The analysis could be extended to derive bounds for other approximate sampling schemes in score-based models.
- Improved composite score methods might be designed to maximize the allowable step size under these bounds.
- In practice, the bounds could be used to adaptively choose the number of annealing levels based on the observed score quality.
Load-bearing premise
That the composite score from the chosen formulation can be viewed as the genuine score of tractable bridging densities, making the sampling bias controllable.
What would settle it
Measuring the actual Wasserstein distance achieved when applying the proposed step-size and step-count rules in a Gaussian model and verifying whether it stays within the prescribed accuracy bound.
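One way such a check could be set up, assuming the sampler output is summarized by a Gaussian fit, is to use the closed-form W2 distance between Gaussians (the Bures-Wasserstein formula). The helper names below are hypothetical and the sketch relies only on NumPy.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(m1, S1, m2, S2):
    """W2 distance between N(m1, S1) and N(m2, S2)."""
    root = psd_sqrt(S1)
    cross = psd_sqrt(root @ S2 @ root)
    sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))


# Example: compare a Gaussian fitted to samples against the known target.
rng = np.random.default_rng(0)
target_mean, target_cov = np.zeros(2), np.eye(2)
samples = rng.multivariate_normal(target_mean, target_cov, size=5000)
fitted_mean = samples.mean(axis=0)
fitted_cov = np.cov(samples, rowvar=False)
print(gaussian_w2(fitted_mean, fitted_cov, target_mean, target_cov))
```

The printed value would then be compared with the prescribed accuracy. Fitting a Gaussian to the samples is itself an assumption that is only exact in the Gaussian model, and the estimate carries Monte Carlo error from the finite sample.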
Original abstract
Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
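The aggregation idea can be checked numerically in the undiffused case, where Bayes' rule with independent observations gives $p(\theta \mid x_{1:n}) \propto p(\theta)^{1-n} \prod_i p(\theta \mid x_i)$, so the score of the full posterior is $(1-n)$ times the prior score plus the sum of the individual posterior scores. The sketch below verifies this in a conjugate Gaussian model with made-up values. It covers only the undiffused level and is not either paper's composite-score formulation at diffused levels, where, as the abstract notes, the composite no longer matches the forward-diffused posterior.

```python
import numpy as np

tau2, s2 = 4.0, 1.0                  # prior and likelihood variances (toy values)
x = np.array([0.3, -1.2, 2.1, 0.8])  # n = 4 made-up observations
n = x.size


def prior_score(theta):
    return -theta / tau2


def single_posterior_score(theta, xi):
    v1 = 1.0 / (1.0 / tau2 + 1.0 / s2)  # variance of p(theta | x_i)
    mu_i = v1 * xi / s2                 # mean of p(theta | x_i)
    return -(theta - mu_i) / v1


def composite_score(theta):
    # (1 - n) * prior score + sum of the individual posterior scores
    return (1 - n) * prior_score(theta) + sum(
        single_posterior_score(theta, xi) for xi in x
    )


def full_posterior_score(theta):
    vn = 1.0 / (1.0 / tau2 + n / s2)    # variance of p(theta | x_1, ..., x_n)
    mu_n = vn * x.sum() / s2
    return -(theta - mu_n) / vn


thetas = np.linspace(-3.0, 3.0, 7)
gap = np.max(np.abs(composite_score(thetas) - full_posterior_score(thetas)))
print(gap)
```

The gap is zero up to floating-point rounding, which is the exact identity that individually learned posterior scores are meant to exploit.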
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives Wasserstein bounds for annealed Langevin dynamics applied to approximate composite scores arising in compositional score-based simulation-based inference. These bounds are translated into explicit rules for choosing step sizes, steps per level, and number of annealing levels to achieve a target sampling accuracy. Closed-form expressions are obtained in the Gaussian setting, where it is proven that the bridging densities of Linhart et al. (2026) admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Empirical experiments indicate that the Gaussian-derived tuning generalizes to more complex problems.
Significance. If the central assumptions hold, the work supplies a theoretically grounded alternative to empirical hyperparameter selection for annealed Langevin in compositional SBI and clarifies theoretical distinctions between the two composite-score formulations. The closed-form Gaussian analysis and the explicit proof of relative advantage constitute clear strengths, as does the provision of concrete decision rules that guarantee a prescribed accuracy under the stated conditions.
major comments (1)
- [§2 and introduction] The Wasserstein bounds and the claim of controllable bias rest on the premise that each composite score equals the exact score of a sequence of tractable bridging densities (see the contrast with reverse-SDE bias in the introduction and the setup in §2). The manuscript does not establish that the composite scores are conservative (curl-free) or explicitly construct the corresponding densities at each annealing level. If the composite score has non-zero curl, no such density exists and the discretization/approximation error bounds cannot guarantee controllable bias, undermining both the hyperparameter rules and the Gaussian comparison.
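The conservativeness concern can be probed numerically: a smooth vector field on $\mathbb{R}^d$ is a gradient field exactly when its Jacobian is symmetric everywhere. The sketch below estimates the Jacobian by central differences and reports its largest asymmetry at a point. The function names and both test fields are hypothetical; a check at a few points can reveal non-zero curl but cannot by itself prove that a learned score is conservative.

```python
import numpy as np


def jacobian_fd(field, theta, eps=1e-5):
    """Central finite-difference Jacobian of a vector field at theta."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    jac = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        jac[:, j] = (field(theta + e) - field(theta - e)) / (2.0 * eps)
    return jac


def asymmetry(field, theta):
    """Largest entry of |J - J^T|; zero for a conservative (curl-free) field."""
    jac = jacobian_fd(field, theta)
    return float(np.max(np.abs(jac - jac.T)))


precision = np.array([[2.0, 0.3], [0.3, 1.0]])  # symmetric: a genuine Gaussian score
rotation = np.array([[-1.0, 0.5], [-0.5, -1.0]])  # non-symmetric: not a gradient


def gaussian_score(theta):
    return -precision @ theta


def rotational_field(theta):
    return rotation @ theta


point = np.array([0.4, -0.7])
print(asymmetry(gaussian_score, point), asymmetry(rotational_field, point))
```

The Gaussian score shows no asymmetry beyond rounding error, while the second field has a constant off-diagonal mismatch, which is the situation the referee warns about.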
minor comments (2)
- [§3] Notation for the composite scores and the annealing schedule could be made more uniform across sections to ease comparison between the Geffner and Linhart formulations.
- [§5] The empirical section would benefit from reporting the precise Wasserstein or other distance metrics used to verify generalization beyond the Gaussian case.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying this foundational point about the existence of bridging densities. We respond to the major comment below and indicate the planned revisions.
- Referee: [§2 and introduction] The Wasserstein bounds and the claim of controllable bias rest on the premise that each composite score equals the exact score of a sequence of tractable bridging densities (see the contrast with reverse-SDE bias in the introduction and the setup in §2). The manuscript does not establish that the composite scores are conservative (curl-free) or explicitly construct the corresponding densities at each annealing level. If the composite score has non-zero curl, no such density exists and the discretization/approximation error bounds cannot guarantee controllable bias, undermining both the hyperparameter rules and the Gaussian comparison.
  Authors: We appreciate the referee highlighting this assumption. The manuscript positions annealed Langevin dynamics as an alternative that treats the composite score as the score of tractable bridging densities, thereby allowing controllable bias via the derived Wasserstein bounds (in contrast to the irreducible bias of the reverse SDE). We acknowledge that a general proof that arbitrary composite scores are curl-free is absent. In the Gaussian setting, however, the closed-form bridging densities are explicitly constructed as Gaussians whose scores coincide with the composites; these are necessarily conservative, and the explicit step-size and step-count comparisons follow directly from this construction. For the general case, the composites are approximations to the joint score, and our bounds already account for score approximation error. We will revise §2 to state the assumption explicitly, supply the Gaussian density construction in full, and add a short discussion of the consequences should the composite exhibit non-zero curl (additional bias outside the current bounds). These changes will clarify the scope of the hyperparameter rules and the Gaussian comparison without altering the technical results.
  Revision: yes
Circularity Check
No significant circularity: Wasserstein bounds derived from standard properties applied to explicit composite scores.
The paper's core derivation applies known Wasserstein convergence results for Langevin dynamics to the composite scores treated as exact scores of bridging densities. Closed-form Gaussian expressions follow directly from the model assumptions without fitting or redefinition. Hyperparameter rules and comparisons between Geffner and Linhart formulations are obtained by explicit calculation of the resulting bounds. No load-bearing step reduces to a self-citation, fitted input renamed as prediction, or self-definitional equivalence; the assumption that composite scores define tractable bridging densities is stated explicitly as a modeling choice rather than derived from the outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Standard convergence properties of Langevin dynamics and Wasserstein distance bounds hold for approximate scores.
- domain assumption: Composite scores from the two cited formulations can serve as scores of tractable bridging densities.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters... In the Gaussian setting, we obtain closed-form expressions for all relevant quantities
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Proposition 2 (Global Wasserstein error bound)... W2(L(θ(k0)_t0), π_t0) ≤ sum ...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Arruda, J., Pandey, V., Sherry, C., Barroso, M., Intes, X., Hasenauer, J., and Radev, S. T. (2025). Compositional amortized inference for large-scale hierarchical Bayesian models. arXiv preprint arXiv:2505.14429.
- [2] Bhatia, R., Jain, T., and Lim, Y. (2017). On the Bures-Wasserstein distance between positive definite matrices. Expositiones Mathematicae, 37.
- [3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg.
- [4] Bradbury, J., Frostig, R., Hawkins, P., et al. (2018). JAX: composable transformations of Python+NumPy programs.
- [5] Brosse, N., Durmus, A., Moulines, É., and Sabanis, S. (2019). The tamed unadjusted Langevin algorithm. Stochastic Processes and their Applications, 129(10):3638–3663.
- [6] Cranmer, K., Brehmer, J., and Louppe, G. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062.
- [7] Dalalyan, A. S. and Karagulyan, A. (2019). User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311.
- [8] Deistler, M., Boelts, J., Steinbach, P., Moss, G., Moreau, T., Gloeckler, M., Rodrigues, P. L., Linhart, J., Lappalainen, J. K., Miller, B. K., et al. (2025). Simulation-based inference: A practical guide. arXiv preprint arXiv:2508.12939.
- [9] Du, Y., Durkan, C., Strudel, R., Tenenbaum, J. B., Dieleman, S., Fergus, R., Sohl-Dickstein, J., Doucet, A., and Grathwohl, W. (2023). Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC. In International Conference on Machine Learning.
- [10] Durmus, A. and Moulines, E. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27(3):1551–1587.
- [11] Flamary, R., Courty, N., Gramfort, A., et al. (2021). Python optimal transport (POT): A Python library for optimal transport analysis. Journal of Machine Learning Research, 22(78):1–8.
- [12] Gao, Y., Huang, J., and Jiao, Y. (2024). Gaussian interpolation flows. Journal of Machine Learning Research, 25(253):1–52.
- [13] Geffner, T., Papamakarios, G., and Mnih, A. (2023). Compositional score modeling for simulation-based inference. In International Conference on Machine Learning.
- [14] Gloeckler, M., Toyota, S., Fukumizu, K., and Macke, J. H. (2025). Compositional simulation-based inference for time series. In International Conference on Learning Representations.
- [15] Ho, J., Jain, A., and Abbeel, P. (2020). Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems.
- [16] Linhart, J., Cardoso, G. V., Gramfort, A., Corff, S. L., and Rodrigues, P. L. (2026). Diffusion posterior sampling for simulation-based inference in tall data settings. TMLR.
- [17] Lueckmann, J.-M., Boelts, J., Greenberg, D., Goncalves, P., and Macke, J. (2021). Benchmarking simulation-based inference. In Banerjee, A. and Fukumizu, K., editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 343–351. PMLR.
- [18] Phan, D., Pradhan, N., and Jankowiak, M. (2020). Composable effects for flexible and accelerated probabilistic programming in NumPyro. arXiv preprint arXiv:1912.11554.
- [19] Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L., and Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Transactions on Neural Networks and Learning Systems, 33(4):1452–1466.
- [20] Rodrigues, P., Moreau, T., Louppe, G., and Gramfort, A. (2021). HNPE: leveraging global parameters for neural posterior estimation. Advances in Neural Information Processing Systems.
- [21] Saumard, A. and Wellner, J. A. (2014). Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45–114.
- [22] Sharrock, L., Simons, J., Liu, S., and Beaumont, M. (2024). Sequential neural score estimation: Likelihood-free inference with conditional score based diffusion models. In International Conference on Machine Learning.
- [23] Silveri, M. G. and Ocello, A. (2025). Beyond log-concavity and score regularity: Improved convergence bounds for score-based generative models in W2-distance. In International Conference on Machine Learning.
- [24] Song, Y. and Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. In 33rd Conference on Neural Information Processing Systems (NeurIPS).
- [25] Song, Y. and Ermon, S. (2020). Improved techniques for training score-based generative models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA. Curran Associates Inc.
- [26] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations.
- [27] Touron, C., Victorino Cardoso, G., Arbel, J., and Coelho Rodrigues, P. L. (2025). Error analysis of a compositional score-based algorithm for simulation-based inference. In Workshop on Principles of Generative Modeling at EurIPS 2025, Copenhagen, Denmark.
- [28] Vaart, A. W. v. d. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
- [29] Vempala, S. and Wibisono, A. (2019). Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In Advances in Neural Information Processing Systems.
  Contents: 1 Introduction; 2 Wasserstein error bounds for annealed Langevin dynamics; 3 Guidelines for fine-tuning annealed Langevin hyperparameters; 4 Theoretical comparison of guid...
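- [30] with their coefficients as per $\beta_t = r_t \ge 0$ and $\alpha_t = \sqrt{1 - r_t^2} \in [0,1]$. As $f$ is assumed to be $L$-Lipschitz, we get
  $$(-R_t - L^2 r_t^2)\, I_d \preceq \nabla^2 \log Q_t f(\theta_t) \preceq R_t\, I_d, \quad \text{where } R_t := 5 L r_t^3 \left( L + \frac{1}{\sqrt{\log(1/r_t)}} \right).$$
  Then we note that
  $$-\nabla^2_\theta \log p_t(\theta_t) = -\nabla^2_\theta \log \phi(\theta_t) - \nabla^2_\theta \log Q_t f(\theta_t) = I_d - \nabla^2_\theta \log Q_t f(\theta_t).$$
  Using the previous bounds, we get: $(1 - R_t)\, I_d \preceq -\nabla^2_\theta \log$ ...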