pith.

arxiv: 2502.04575 · v3 · pith:DPPVBZTNnew · submitted 2025-02-07 · 📊 stat.ML · cs.LG · cs.NA · math.NA · physics.comp-ph · stat.CO

Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

Pith reviewed 2026-05-23 03:54 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · cs.NA · math.NA · physics.comp-ph · stat.CO
keywords: normalizing constant estimation · annealed importance sampling · Jarzynski equality · oracle complexity · Girsanov theorem · optimal transport · reverse diffusion sampler · multimodal distributions

The pith

Annealed importance sampling estimates the normalizing constant Z to relative error ε with Õ(d β² A² / ε⁴) oracle complexity under finite action assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first non-asymptotic complexity bound for annealed importance sampling when estimating the normalizing constant Z of an unnormalized density π ∝ e^{-V}. It shows that Õ(d β² A² / ε⁴) oracle queries suffice to achieve ε-relative error with high probability, where β is the smoothness parameter of V and A is the action of a curve of probability measures interpolating between π and a tractable reference. A sympathetic reader would care because these annealing methods are standard tools for high-dimensional and multimodal problems in statistics and machine learning, yet have lacked rigorous quantitative guarantees. The analysis uses Girsanov's theorem and optimal transport to control the estimator's variance without explicit isoperimetric assumptions on the target.
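Because the bound is stated up to constants and polylogarithmic factors, only ratios of its leading term d β² A² / ε⁴ carry meaning. A small sketch (the function names and baseline values are hypothetical, chosen only for illustration) makes the scalings concrete: halving ε multiplies the leading term by 16, doubling A by 4, and doubling d by 2.

```python
def leading_term(d, beta, action, eps):
    """Leading term d * beta^2 * A^2 / eps^4 of the stated O-tilde bound.

    Constants and hidden polylogarithmic factors are dropped, so only ratios
    of this quantity carry meaning.
    """
    return d * beta**2 * action**2 / eps**4


def cost_ratio(new, old):
    """Factor by which the leading term changes from setting `old` to `new`."""
    return leading_term(**new) / leading_term(**old)


# Hypothetical baseline setting, chosen only for illustration.
base = dict(d=10, beta=1.0, action=5.0, eps=0.1)
halve_eps = cost_ratio(dict(base, eps=0.05), base)         # ≈ 16
double_action = cost_ratio(dict(base, action=10.0), base)  # ≈ 4
double_dim = cost_ratio(dict(base, d=20), base)            # ≈ 2
print(halve_eps, double_action, double_dim)
```

The quartic dependence on 1/ε is what dominates in practice: each extra digit of relative accuracy costs a factor of 10⁴ in the leading term.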

Core claim

We derive an oracle complexity of Õ(d β² A² / ε⁴) for estimating Z within ε relative error with high probability using annealed importance sampling. This holds when there exists a curve of interpolating measures with finite action A between the target and a tractable reference. The analysis leverages Girsanov's theorem and optimal transport and does not require isoperimetric assumptions on the target distribution. To handle the large action of standard geometric interpolation, we introduce a reverse diffusion sampler algorithm, establish its complexity framework, and show empirically that it handles multimodality efficiently.
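The estimator in the core claim can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm or schedule: a standard Gaussian reference N(0, I), a uniform geometric schedule V_λ = (1 − λ) V₀ + λ V, and a few unadjusted Langevin (ULA) steps between levels, which leave each intermediate distribution invariant only up to an O(step) bias. The function name `ais_log_z` and all default parameters are hypothetical.

```python
import numpy as np


def ais_log_z(V, grad_V, d, n_particles=2000, n_levels=100,
              n_langevin=10, step=0.05, seed=0):
    """Estimate log Z, Z = integral of exp(-V(x)) dx, with annealed importance sampling.

    Assumptions of this sketch (not taken from the paper):
      * reference pi_0 = N(0, I), i.e. V0(x) = |x|^2 / 2 and Z0 = (2*pi)^(d/2);
      * geometric path V_lam = (1 - lam) * V0 + lam * V on a uniform lam grid;
      * transitions are a few ULA steps, invariant only up to an O(step) bias.
    V maps an (n, d) array to (n,); grad_V maps (n, d) to (n, d).
    """
    rng = np.random.default_rng(seed)
    lams = np.linspace(0.0, 1.0, n_levels + 1)

    def V0(x):
        return 0.5 * np.sum(x**2, axis=1)

    x = rng.standard_normal((n_particles, d))  # exact draws from pi_0
    log_w = np.zeros(n_particles)
    for k in range(n_levels):
        lam_next = lams[k + 1]
        # AIS increment log[gamma_{k+1}(x) / gamma_k(x)] with gamma_lam = exp(-V_lam)
        log_w -= (lam_next - lams[k]) * (V(x) - V0(x))
        # Move particles towards pi_{k+1}; note grad V0(x) = x
        for _ in range(n_langevin):
            grad = (1.0 - lam_next) * x + lam_next * grad_V(x)
            x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    # log Z = log Z0 + log mean(w), via a shifted log-sum-exp for stability
    m = log_w.max()
    log_mean_w = m + np.log(np.mean(np.exp(log_w - m)))
    return 0.5 * d * np.log(2.0 * np.pi) + log_mean_w


# Usage: Gaussian target V(x) = |x|^2 / (2 sigma2), so Z = (2 pi sigma2)^(d/2) exactly.
d, sigma2 = 2, 4.0
est = ais_log_z(lambda x: 0.5 * np.sum(x**2, axis=1) / sigma2,
                lambda x: x / sigma2, d)
exact = 0.5 * d * np.log(2.0 * np.pi * sigma2)
print(f"AIS log Z = {est:.3f}, exact log Z = {exact:.3f}")
```

Each Langevin step costs one gradient evaluation per particle, so this sketch spends n_levels × n_langevin gradient queries per particle. The paper's bound describes how many such queries suffice as d, β, A, and ε vary; the defaults here are not derived from it.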

What carries the argument

the action A of a curve of probability measures interpolating between the target and reference distribution, which enters the complexity bound by controlling the variance of the estimator through Girsanov's theorem

Load-bearing premise

A curve of interpolating measures with finite action A exists between the target distribution and a tractable reference, allowing Girsanov's theorem to be applied to the underlying stochastic processes.

What would settle it

A controlled experiment on an isotropic Gaussian target with explicitly computable exact action A and exact sample requirements, checking whether the observed oracle calls scale as Õ(d β² A² / ε⁴) when d, β, A, and ε are varied.
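The proposed check needs an exact value of A. A minimal sketch, assuming the action is A = ∫₀¹ |π̇_λ|² dλ with |π̇_λ| the Wasserstein-2 metric derivative (the paper's exact normalization and time parametrization are not stated above), for the geometric path from a N(0, I) reference to an isotropic Gaussian target N(0, σ² I). Along this path π_λ = N(0, v_λ I) with 1/v_λ = 1 − λ + λ/σ², and W2 between centered isotropic Gaussians N(0, aI) and N(0, bI) is √d |√a − √b|, which gives A = d (1 − σ²)² (1 + σ²) / (8σ²). Both function names are hypothetical.

```python
import numpy as np


def geometric_path_action_numeric(d, sigma2, n_grid=10_001):
    """Discretized W2 action of the geometric path N(0, I) -> N(0, sigma2 I).

    pi_lam = N(0, v_lam I) with 1/v_lam = (1 - lam) + lam / sigma2, and
    W2(N(0, aI), N(0, bI))^2 = d (sqrt(a) - sqrt(b))^2, so summing
    W2^2 / dlam over a fine grid approximates the integral of |pi_dot|^2.
    """
    lam = np.linspace(0.0, 1.0, n_grid)
    sqrt_v = 1.0 / np.sqrt((1.0 - lam) + lam / sigma2)
    return d * np.sum(np.diff(sqrt_v) ** 2 / np.diff(lam))


def geometric_path_action_exact(d, sigma2):
    """Closed form of the same integral: d (1 - s)^2 (1 + s) / (8 s), s = sigma2."""
    return d * (1.0 - sigma2) ** 2 * (1.0 + sigma2) / (8.0 * sigma2)


print(geometric_path_action_exact(2, 4.0))  # 2.8125
print(geometric_path_action_numeric(2, 4.0))
```

The closed form grows linearly in d and roughly like d σ⁴/8 for large σ², a simple instance of how the action of a geometric path can become large even between two Gaussians.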

Figures

Figures reproduced from arXiv: 2502.04575 by Molei Tao, Wei Guo, Yongxin Chen.

Figure 1: Illustration of the proof idea for Thm. 4. We present a high-level proof sketch using ...
Figure 2: Visualization of the samples from the modified Müller-Brown distribution. The generated samples are ...
Figure 3: Visualization of the samples from the Gaussian mixture distribution. The generated samples are displayed on ...
Original abstract

Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $\pi$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $\beta$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $\pi$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to provide the first non-asymptotic oracle complexity analysis of annealed importance sampling (AIS) for estimating the normalizing constant Z of an unnormalized density π ∝ e^{-V(x)}. It derives a bound of Õ(d β² A² / ε⁴) for achieving relative error ε with high probability, where β measures smoothness of V and A is the action of an interpolating curve of measures connecting π to a tractable reference; the analysis applies Girsanov's theorem together with optimal transport and avoids explicit isoperimetric assumptions on the target. The paper also introduces a reverse-diffusion-sampler algorithm to mitigate large action under geometric interpolation and reports empirical gains on multimodal targets.

Significance. If the central bound is valid, the result would be significant: it supplies the first quantitative non-asymptotic guarantee for a family of methods (Jarzynski equality, AIS) that are standard in Bayesian statistics, statistical mechanics, and machine learning yet previously lacked complexity statements. The combination of Girsanov and optimal transport to remove isoperimetric hypotheses is technically interesting, and the reverse-diffusion proposal directly addresses a practical bottleneck of geometric paths. The framework is reusable for other annealing schedules.

major comments (2)
  1. [Abstract, §1] Abstract and analysis paragraph (§1): the claimed Õ(d β² A² / ε⁴) bound is obtained by using Girsanov to produce an unbiased estimator from the Radon-Nikodym derivative between forward and reverse processes along the interpolating curve. Girsanov yields a true martingale (hence unbiasedness) only when the exponential local martingale satisfies Novikov's condition E[exp(½ ∫ |u_t|² dt)] < ∞. Finite action A (presumably ∫ E[|u_t|²] dt < ∞) controls the L² norm but does not automatically imply the required exponential moment. The manuscript asserts that no isoperimetric assumptions on the target are needed, yet provides no explicit verification or supplementary integrability condition that would guarantee Novikov for arbitrary curves with only finite A. This step is load-bearing for the unbiasedness claim and therefore for the complexity bound.
  2. [Abstract, §1] Abstract and §1: the stated oracle complexity is expressed in terms of the external quantity A (action of the chosen interpolating curve). The manuscript does not indicate whether a curve with A independent of the target accuracy ε can always be selected, or whether constructing such a curve (and therefore controlling A) may itself depend on ε in a manner that alters the overall complexity. Without this clarification the bound cannot be read as a fully non-asymptotic guarantee in the usual sense.
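A scalar illustration of the gap flagged in the first major comment (ours, not part of the report): Novikov's condition asks for an exponential moment of Y = ∫₀ᵀ |u_t|² dt, while finite action, under the referee's reading, only bounds E[Y]. A nonnegative Y with the first property but not the second is easy to exhibit, for example an exponential random variable with rate 1/2:

```latex
\[
\mathbb{E}[Y] = \int_0^\infty y \,\tfrac12 e^{-y/2}\,\mathrm{d}y = 2 < \infty,
\qquad
\mathbb{E}\!\left[e^{Y/2}\right] = \int_0^\infty e^{y/2}\,\tfrac12 e^{-y/2}\,\mathrm{d}y = \infty .
\]
```

Since Novikov's condition is only sufficient, this shows that finite action does not by itself verify the condition, not that Girsanov's theorem fails for such drifts.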
minor comments (2)
  1. [Abstract] The abstract states that the analysis 'does not explicitly require isoperimetric assumptions,' but the precise regularity conditions placed on the interpolating curve (e.g., moment bounds on the Girsanov kernel) should be stated explicitly in the theorem statement for clarity.
  2. Notation: β is used for the smoothness parameter of V; a brief reminder of its precise definition (e.g., Lipschitz constant of ∇V) would help readers who are not already familiar with the paper's conventions.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting these two technical points on the application of Girsanov's theorem and the interpretation of the oracle complexity. Both comments are addressed point-by-point below. We believe the core claims remain valid once the requested clarifications are supplied.

  1. Referee: Girsanov yields a true martingale (hence unbiasedness) only when the exponential local martingale satisfies Novikov's condition E[exp(½ ∫ |u_t|² dt)] < ∞. Finite action A controls the L² norm but does not automatically imply the required exponential moment. No explicit verification or supplementary integrability condition is provided for arbitrary curves with only finite A.

    Authors: We agree that Novikov's condition is required for the exponential martingale to be a true martingale. Our analysis implicitly assumes the SDEs admit a well-defined Girsanov change of measure; finite A guarantees that the quadratic variation process is integrable in L², but does not by itself guarantee the exponential integrability. Under the standing Lipschitz and linear-growth assumptions already placed on the drift and diffusion coefficients (standard for the Langevin and reverse-diffusion processes considered), standard results in stochastic analysis (e.g., Theorem 5.1 in Karatzas & Shreve or Proposition 3.1 in Øksendal) ensure that Novikov holds locally and can be extended globally on compact time intervals. We will add a short paragraph after the statement of Girsanov's theorem (new Section 2.3) that explicitly invokes these conditions and notes that they are satisfied by the geometric and reverse-diffusion schedules analyzed later. This is a clarification rather than a change to the complexity bound itself. revision: partial

  2. Referee: The stated oracle complexity is expressed in terms of the external quantity A. The manuscript does not indicate whether a curve with A independent of the target accuracy ε can always be selected, or whether constructing such a curve may itself depend on ε in a manner that alters the overall complexity.

    Authors: A is the action of a fixed interpolating curve of measures chosen independently of the accuracy parameter ε; it is a property of the annealing schedule (geometric, arithmetic, or reverse-diffusion) and of the pair (π, reference). For any fixed schedule the value of A is therefore independent of ε, and the Õ(d β² A² / ε⁴) bound is fully non-asymptotic once the schedule is selected. When a practitioner chooses a schedule whose A grows with dimension or with the separation of modes, that growth appears explicitly in the complexity; the reverse-diffusion construction is introduced precisely to produce schedules whose A remains moderate. We will insert one clarifying sentence at the end of the first paragraph of Section 1 and a footnote in the complexity theorem stating that A is schedule-dependent but ε-independent. revision: yes

Circularity Check

0 steps flagged

No circularity: complexity bound parameterized by external action A using standard theorems

The claimed oracle complexity Õ(d β² A² / ε⁴) is expressed directly in terms of the external quantity A (action of an interpolating curve of measures), with the derivation invoking Girsanov's theorem and optimal transport as independent mathematical tools. No step reduces a prediction to a fitted input, renames a known result, or relies on a load-bearing self-citation chain. The bound is not self-definitional; A is an input to the analysis rather than derived from the target result. The absence of isoperimetric assumptions is presented as a feature of the Girsanov+OT approach, with no evidence that the central claim collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The complexity result rests on the existence of an interpolating curve with finite action A and on the applicability of Girsanov's theorem; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math: Girsanov's theorem applies to the stochastic processes along the annealing path
    Invoked to control the Radon-Nikodym derivative between measures in the analysis paragraph of the abstract.
  • domain assumption: An interpolating curve of probability measures with finite action A exists between the target and reference
    Central to the complexity expression; stated without further justification in the abstract.

pith-pipeline@v0.9.0 · 5812 in / 1423 out tokens · 65771 ms · 2026-05-23T03:54:32.282382+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sample-efficient evidence estimation of score based priors for model selection

    cs.LG 2026-02 unverdicted novelty 7.0

    DiME estimates model evidence for diffusion priors by integrating time-marginals from posterior sampling, enabling efficient prior selection and misfit diagnosis in ill-posed inverse problems.

Reference graph

Works this paper leans on

109 extracted references · 109 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    M. S. Albergo and E. Vanden-Eijnden. NETS: A non-equilibrium transport sampler. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=QqGw9StPbQ

  2. [2]

    L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser Basel, 2nd edition, 2008. doi:10.1007/978-3-7643-8722-8

  3. [3]

    L. Ambrosio, E. Bru\'e, and D. Semola. Lectures on optimal transport, volume 130 of UNITEXT. Springer Cham, 2021. doi:10.1007/978-3-030-72162-6. URL https://link.springer.com/book/10.1007/978-3-030-72162-6

  4. [4]

    B. D. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982. ISSN 0304-4149. doi:10.1016/0304-4149(82)90051-5. URL https://www.sciencedirect.com/science/article/pii/0304414982900515

  5. [5]

    C. Andrieu, J. Ridgway, and N. Whiteley. Sampling normalizing constants in high dimensions using inhomogeneous diffusions. arXiv preprint arXiv:1612.07583, 2016

  6. [6]

    M. Arrar, F. M. Boubeta, M. E. Szretter, M. Sued, L. Boechi, and D. Rodriguez. On the accurate estimation of free energies using the Jarzynski equality. Journal of Computational Chemistry, 40(4):688–696, 2019. doi:10.1002/jcc.25754. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.25754

  7. [7]

    E. Aurell, C. Mejía-Monasterio, and P. Muratore-Ginanneschi. Optimal protocols and optimal transport in stochastic thermodynamics. Phys. Rev. Lett., 106:250601, Jun 2011. doi:10.1103/PhysRevLett.106.250601. URL https://link.aps.org/doi/10.1103/PhysRevLett.106.250601

  8. [8]

    E. Aurell, K. Gawędzki, C. Mejía-Monasterio, R. Mohayaee, and P. Muratore-Ginanneschi. Refined second law of thermodynamics for fast random processes. Journal of Statistical Physics, 147:487–505, 2012. doi:10.1007/s10955-012-0478-x

  9. [9]

    D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry of Markov diffusion operators, volume 103 of Grundlehren der mathematischen Wissenschaften. Springer Cham, 1st edition, 2014. doi:10.1007/978-3-319-00227-9

  10. [10]

    K. Balasubramanian, S. Chewi, M. A. Erdogdu, A. Salim, and S. Zhang. Towards a theory of non-log-concave sampling: First-order stationarity guarantees for Langevin Monte Carlo. In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 2896–2923. PMLR, 0...

  11. [11]

    D. Blessing, J. Berner, L. Richter, and G. Neumann. Underdamped diffusion bridges with applications to sampling. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Q1QTxFm0Is

  12. [12]

    N. Brosse, A. Durmus, and E. Moulines. Normalizing constants of log-concave densities. Electronic Journal of Statistics, 12(1):851–889, 2018. doi:10.1214/18-EJS1411. URL https://doi.org/10.1214/18-EJS1411

  13. [13]

    D. Carbone, M. Hua, S. Coste, and E. Vanden-Eijnden. Efficient training of energy-based models using Jarzynski equality. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 52583–52614. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_f...

  14. [14]

    S. Chatterjee and P. Diaconis. The sample size required in importance sampling. The Annals of Applied Probability, 28(2):1099–1135, 2018. doi:10.1214/17-AAP1326. URL https://doi.org/10.1214/17-AAP1326

  15. [15]

    O. Chehab, A. Hyvärinen, and A. Risteski. Provable benefits of annealing for estimating normalizing constants: Importance sampling, noise-contrastive estimation, and beyond. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=iWGC0Nsq9i

  16. [16]

    O. Chehab, A. Korba, A. J. Stromme, and A. Vacher. Provable convergence and limitations of geometric tempering for Langevin dynamics. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=DZcmz9wU0i

  17. [17]

    J. Chemseddine, C. Wald, R. Duong, and G. Steidl. Neural sampling from Boltzmann densities: Fisher-Rao curves in the Wasserstein geometry. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=TUvg5uwdeG

  18. [18]

    H. Chen and L. Ying. Ensemble-based annealed importance sampling. arXiv preprint arXiv:2401.15645, 2024

  19. [19]

    J. Chen, L. Richter, J. Berner, D. Blessing, G. Neumann, and A. Anandkumar. Sequential controlled Langevin diffusions. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=dImD2sgy86

  20. [20]

    M.-H. Chen and Q.-M. Shao. On Monte Carlo methods for estimating ratios of normalizing constants. The Annals of Statistics, 25(4):1563–1594, 1997. doi:10.1214/aos/1031594732. URL https://doi.org/10.1214/aos/1031594732

  21. [21]

    S. Chen, S. Chewi, J. Li, Y. Li, A. Salim, and A. Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=zyLVMgsZ0U_

  22. [22]

    Y. Chen, T. T. Georgiou, and M. Pavon. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169:671–691, 2016. doi:10.1007/s10957-015-0803-z

  23. [23]

    Y. Chen, T. T. Georgiou, and A. Tannenbaum. Stochastic control and nonequilibrium thermodynamics: Fundamental limits. IEEE Transactions on Automatic Control, 65(7):2979–2991, 2020. doi:10.1109/TAC.2019.2939625

  24. [24]

    Y. Chen, T. T. Georgiou, and M. Pavon. Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge. SIAM Review, 63(2):249–313, 2021. doi:10.1137/20M1339982. URL https://doi.org/10.1137/20M1339982

  25. [25]

    X. Cheng, N. S. Chatterji, P. L. Bartlett, and M. I. Jordan. Underdamped Langevin MCMC: A non-asymptotic analysis. In S. Bubeck, V. Perchet, and P. Rigollet, editors, Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 300–323. PMLR, 06–09 Jul 2018. URL https://proceedings.mlr.press/v75/ch...

  26. [26]

    X. Cheng, B. Wang, J. Zhang, and Y. Zhu. Fast conditional mixing of MCMC algorithms for non-log-concave distributions. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 13374–13394. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_fil...

  27. [27]

    S. Chewi. Log-Concave Sampling. Book draft, in preparation, 2022. URL https://chewisinho.github.io

  28. [28]

    S. Chewi, M. A. Erdogdu, M. Li, R. Shen, and S. Zhang. Analysis of Langevin Monte Carlo from Poincaré to log-Sobolev. In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1–2. PMLR, 02–05 Jul 2022. URL https://proceedings.mlr.press/v178/chewi22a.html

  29. [29]

    C. Chipot and A. Pohorille, editors. Free Energy Calculations: Theory and Applications in Chemistry and Biology. Springer Series in Chemical Physics. Springer Berlin, Heidelberg, 2007. doi:10.1007/978-3-540-38448-9

  30. [30]

    G. Conforti and L. Tamanini. A formula for the time derivative of the entropic cost and applications. Journal of Functional Analysis, 280(11):108964, 2021. ISSN 0022-1236. doi:10.1016/j.jfa.2021.108964. URL https://www.sciencedirect.com/science/article/pii/S002212362100046X

  31. [31]

    B. Cousins and S. Vempala. Gaussian cooling and O*(n³) algorithms for volume and Gaussian volume. SIAM Journal on Computing, 47(3):1237–1273, 2018. doi:10.1137/15M1054250. URL https://doi.org/10.1137/15M1054250

  32. [32]

    G. E. Crooks. Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. Journal of Statistical Physics, 90:1481–1487, 1998. doi:10.1023/A:1023208217925

  33. [33]

    G. E. Crooks. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E, 60:2721–2726, Sep 1999. doi:10.1103/PhysRevE.60.2721. URL https://link.aps.org/doi/10.1103/PhysRevE.60.2721

  34. [34]

    P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(3):411–436, May 2006

  35. [35]

    A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000. doi:10.1023/A:1008935410038

  36. [36]

    A. Doucet, W. Grathwohl, A. G. Matthews, and H. Strathmann. Score-based diffusion meets annealed importance sampling. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 21482–21494. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/...

  37. [37]

    M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM, 38(1):1–17, Jan. 1991. ISSN 0004-5411. doi:10.1145/102782.102783. URL https://doi.org/10.1145/102782.102783

  38. [38]

    I. Echeverria and L. M. Amzel. Estimation of free-energy differences from computed work distributions: An application of Jarzynski's equality. The Journal of Physical Chemistry B, 116(36):10977–11396, 2012. doi:10.1021/jp300527q

  39. [39]

    R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, and T. Vayer. POT: Python optimal transport. Journal of Machine Learning Research, 22(78):...

  40. [40]

    H. Ge and D.-Q. Jiang. Generalized Jarzynski's equality of inhomogeneous multidimensional diffusion processes. Journal of Statistical Physics, 131:675–689, Mar 2008. ISSN 1572-9613. doi:10.1007/s10955-008-9520-4

  41. [41]

    R. Ge, H. Lee, and A. Risteski. Beyond log-concavity: Provable guarantees for sampling multi-modal distributions using simulated tempering Langevin Monte Carlo. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL htt...

  42. [42]

    R. Ge, H. Lee, and J. Lu. Estimating normalizing constants for log-concave distributions: Algorithms and lower bounds. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, pages 579–586, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450369794. doi:10.1145/3357713.3384289. URL https://doi.org/10....

  43. [43]

    A. Gelman and X.-L. Meng. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185, 1998. doi:10.1214/ss/1028905934. URL https://doi.org/10.1214/ss/1028905934

  44. [44]

    A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 3rd edition, 2013

  45. [45]

    W. Guo, M. Tao, and Y. Chen. Provable benefit of annealed Langevin Monte Carlo for non-log-concave sampling. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=P6IVIoGRRg

  46. [46]

    C. Hartmann and L. Richter. Nonasymptotic bounds for suboptimal importance sampling. SIAM/ASA Journal on Uncertainty Quantification, 12(2):309–346, 2024. doi:10.1137/21M1427760. URL https://doi.org/10.1137/21M1427760

  47. [47]

    C. Hartmann, L. Richter, C. Schütte, and W. Zhang. Variational characterization of free energy: Theory and algorithms. Entropy, 19(11), 2017. ISSN 1099-4300. doi:10.3390/e19110626. URL https://www.mdpi.com/1099-4300/19/11/626

  48. [48]

    C. Hartmann, C. Schütte, and W. Zhang. Jarzynski's equality, fluctuation theorems, and variance reduction: Mathematical analysis and numerical algorithms. Journal of Statistical Physics, 175:1214–1261, 2019. doi:10.1007/s10955-019-02286-4. URL https://doi.org/10.1007/s10955-019-02286-4

  49. [49]

    J. He, Y. Du, F. Vargas, Y. Wang, C. P. Gomes, J. M. Hernández-Lobato, and E. Vanden-Eijnden. FEAT: Free energy estimators with adaptive transport. arXiv preprint arXiv:2504.11516, 2025

  50. [50]

    Y. He and C. Zhang. On the query complexity of sampling from non-log-concave distributions (extended abstract). In N. Haghtalab and A. Moitra, editors, Proceedings of Thirty Eighth Conference on Learning Theory, volume 291 of Proceedings of Machine Learning Research, pages 2786–2787. PMLR, 30 Jun–04 Jul 2025. URL https://proceedings.mlr.press/v291/he25a.html

  51. [51]

    Y. He, K. Rojas, and M. Tao. Zeroth-order sampling methods for non-log-concave distributions: Alleviating metastability by denoising diffusion. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=X3Aljulsw5

  52. [52]

    X. Huang, H. Dong, Y. Hao, Y. Ma, and T. Zhang. Reverse diffusion Monte Carlo. In The Twelfth International Conference on Learning Representations, 2024a. URL https://openreview.net/forum?id=kIPEyMSdFV

  53. [53]

    X. Huang, D. Zou, H. Dong, Y.-A. Ma, and T. Zhang. Faster sampling without isoperimetry via diffusion-based Monte Carlo. In S. Agrawal and A. Roth, editors, Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 2438–2493. PMLR, 30 Jun–03 Jul 2024b. URL https://proceedings.mlr.press/...

  54. [54]

    M. Huber. Approximation algorithms for the normalizing constant of Gibbs distributions. The Annals of Applied Probability, 25(2):974–985, 2015. doi:10.1214/14-AAP1015. URL https://doi.org/10.1214/14-AAP1015

  55. [55]

    C. Jarzynski. Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78:2690–2693, Apr 1997. doi:10.1103/PhysRevLett.78.2690. URL https://link.aps.org/doi/10.1103/PhysRevLett.78.2690

  56. [56]

    A. Jasra, K. Kamatani, P. P. Osei, and Y. Zhou. Multilevel particle filters: normalizing constant estimation. Statistics and Computing, 28:47–60, 2018. doi:10.1007/s11222-016-9715-5

  57. [57]

    M. R. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986. ISSN 0304-3975. doi:10.1016/0304-3975(86)90174-X. URL https://www.sciencedirect.com/science/article/pii/030439758690174X

  58. [58]

    I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer New York, NY, 2nd edition, 1991. doi:10.1007/978-1-4612-0949-2

  59. [59]

    D. P. Kingma and M. Welling. Auto-encoding variational Bayes . arXiv preprint arXiv:1312.6114, 2013

  60. [60]

    J. G. Kirkwood. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics, 3(5):300–313, May 1935. ISSN 0021-9606. doi:10.1063/1.1749657. URL https://doi.org/10.1063/1.1749657

  61. [61]

    Y. Kook and S. S. Vempala. Sampling and integration of logconcave functions by algorithmic diffusion. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC '25, pages 924–932, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400715105. doi:10.1145/3717823.3718202. URL https://doi.org/10.1145/3717823.3718202

  62. [62]

    Y. Kook, S. Vempala, and M. S. Zhang. In-and-Out: Algorithmic diffusion for sampling convex bodies. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=aNQWRHyh15

  63. [63]

    S. Kostov and N. Whiteley. An algorithm for approximating the second moment of the normalizing constant estimate from a particle filter. Methodology and Computing in Applied Probability, 19:799–818, 2017. doi:10.1007/s11009-016-9513-8

  64. [64]

    O. Krause, A. Fischer, and C. Igel. Algorithms for estimating the partition function of restricted Boltzmann machines. Artificial Intelligence, 278:103195, 2020. ISSN 0004-3702. doi:10.1016/j.artint.2019.103195. URL https://www.sciencedirect.com/science/article/pii/S0004370219301948

  65. [65]

    C. Le Bris and P.-L. Lions. Existence and uniqueness of solutions to Fokker–Planck type equations with irregular coefficients. Communications in Partial Differential Equations, 33(7):1272–1317, 2008. doi:10.1080/03605300801970952. URL https://doi.org/10.1080/03605300801970952

  66. [66]

    H. Lee, J. Lu, and Y. Tan. Convergence of score-based generative modeling for general data distributions. In S. Agrawal and F. Orabona, editors, Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201 of Proceedings of Machine Learning Research, pages 946–985. PMLR, 20–23 Feb 2023. URL https://proceedings.mlr.pres...

  67. [67]

    T. Lelièvre, M. Rousset, and G. Stoltz. Free Energy Computations: A Mathematical Perspective. Imperial College Press, 2010. doi:10.1142/p579

  68. [68]

    C. Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems - Series A, 34(4):1533–1574, 2014. URL https://hal.science/hal-00849930

  69. [69]

    J. Ma, J. Peng, S. Wang, and J. Xu. Estimating the partition function of graphical models using Langevin importance sampling. In C. M. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, volume 31 of Proceedings of Machine Learning Research, pages 433–441, Scottsdale, Arizon...

  70. [70]

    B. Máté and F. Fleuret. Learning interpolations between Boltzmann densities. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/forum?id=TH6YrEcbth

  71. [71]

    B. Máté, F. Fleuret, and T. Bereau. Neural thermodynamic integration: Free energies from energy-based diffusion models. The Journal of Physical Chemistry Letters, 15(45):11395–11404, 2024. doi:10.1021/acs.jpclett.4c01958. URL https://doi.org/10.1021/acs.jpclett.4c01958. PMID: 39503734

  72. [72]

    O. Mazonka and C. Jarzynski. Exactly solvable model illustrating far-from-equilibrium predictions. arXiv preprint cond-mat/9912121, 1999

  73. [73]

    F. Mazzanti and E. Romero. Efficient evaluation of the partition function of RBMs with annealed importance sampling. arXiv preprint arXiv:2007.11926, 2020

  74. [74]

    X.-L. Meng and W. H. Wong. Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6(4):831–860, 1996. ISSN 10170405, 19968507. URL http://www.jstor.org/stable/24306045

  75. [75]

    A. Mousavi-Hosseini, T. K. Farghly, Y. He, K. Balasubramanian, and M. A. Erdogdu. Towards a complete analysis of Langevin Monte Carlo: Beyond Poincaré inequality. In G. Neu and L. Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 1–35. PMLR, 12–15 Jul 2023. URL h...

  76. [76]

    R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, April 2001. ISSN 1573-1375. doi:10.1023/A:1008923215028. URL https://doi.org/10.1023/A:1008923215028

  77. [77]

    E. Nelson. Dynamical Theories of Brownian Motion. Princeton University Press, 1967. ISBN 9780691079509. URL http://www.jstor.org/stable/j.ctv15r57jg

  78. [78]

    N. Nüsken and L. Richter. Solving high-dimensional Hamilton–Jacobi–Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space. Partial Differential Equations and Applications, 2(4):48, 2021. doi:10.1007/s42985-021-00102-x

  79. [79]

    A. Pohorille, C. Jarzynski, and C. Chipot. Good practices in free-energy calculations. The Journal of Physical Chemistry B, 114(32):10235–10253, 2010. ISSN 1520-6106. doi:10.1021/jp102971x. URL https://doi.org/10.1021/jp102971x

  80. [80]

    Y. Ren, H. Chen, G. M. Rotskoff, and L. Ying. How discrete and continuous diffusion meet: Comprehensive analysis of discrete diffusion models via a stochastic integral framework. In The Thirteenth International Conference on Learning Representations, 2025a. URL https://openreview.net/forum?id=6awxwQEI82
