Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

Molei Tao; Wei Guo; Yongxin Chen

arXiv: 2502.04575 · v3 · pith:DPPVBZTNnew · submitted 2025-02-07 · stat.ML · cs.LG · cs.NA · math.NA · physics.comp-ph · stat.CO

Wei Guo, Molei Tao, Yongxin Chen

Pith reviewed 2026-05-23 03:54 UTC · model grok-4.3

classification: stat.ML · cs.LG · cs.NA · math.NA · physics.comp-ph · stat.CO

keywords: normalizing constant estimation · annealed importance sampling · Jarzynski equality · oracle complexity · Girsanov theorem · optimal transport · reverse diffusion sampler · multimodal distributions

The pith

Annealed importance sampling estimates the normalizing constant Z to relative error ε with Õ(d β² A² / ε⁴) oracle complexity under finite action assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first non-asymptotic complexity bound for annealed importance sampling when estimating the normalizing constant Z of an unnormalized density π ∝ e^{-V}. It shows that Õ(d β² A² / ε⁴) oracle queries suffice to achieve ε-relative error with high probability, where β is the smoothness parameter of V and A is the action of an interpolating curve of measures. A sympathetic reader would care because these annealing methods are standard tools for high-dimensional and multimodal problems in statistics and machine learning, yet had lacked rigorous quantitative guarantees. The analysis applies Girsanov's theorem and optimal transport to control variance without isoperimetric assumptions on the target.

Core claim

We derive an oracle complexity of Õ(d β² A² / ε⁴) for estimating Z within ε relative error with high probability using annealed importance sampling. This holds when there exists a curve of interpolating measures with finite action A between the target and a tractable reference. The analysis leverages Girsanov's theorem and optimal transport and does not require isoperimetric assumptions on the target distribution. To handle the large action of standard geometric interpolation, we introduce a reverse diffusion sampler algorithm, establish its complexity framework, and show empirically that it handles multimodality efficiently.
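To make the estimator behind this claim concrete, here is a minimal sketch of annealed importance sampling along the geometric interpolation from a standard Gaussian reference. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `ais_log_z`, the toy Gaussian target, and the use of a random-walk Metropolis kernel (swapped in for the paper's annealing dynamics) are all choices made here.

```python
import numpy as np

def ais_log_z(neg_log_target, dim, n_particles=4000, n_steps=400, step=0.5, seed=0):
    """Estimate log Z for exp(-V) by AIS along the geometric path from N(0, I).

    neg_log_target maps an (n, dim) array to the n values of V (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_steps + 1)

    def log_f(x, lam):
        # Unnormalized log-density of the geometric interpolation at level lam.
        return -(1.0 - lam) * 0.5 * np.sum(x**2, axis=1) - lam * neg_log_target(x)

    x = rng.standard_normal((n_particles, dim))  # exact samples from the reference
    log_w = np.zeros(n_particles)
    for lam_prev, lam in zip(lambdas[:-1], lambdas[1:]):
        # Incremental importance weight, evaluated before the particles move.
        log_w += log_f(x, lam) - log_f(x, lam_prev)
        # One random-walk Metropolis step leaving the level-lam density invariant
        # (an assumption of this sketch, standing in for Langevin-type dynamics).
        proposal = x + step * rng.standard_normal(x.shape)
        log_accept = log_f(proposal, lam) - log_f(x, lam)
        accept = np.log(rng.uniform(size=n_particles)) < log_accept
        x[accept] = proposal[accept]

    log_z0 = 0.5 * dim * np.log(2.0 * np.pi)  # normalizing constant of N(0, I)
    m = log_w.max()  # stable log-mean-exp of the weights
    return log_z0 + m + np.log(np.mean(np.exp(log_w - m)))

# Toy target: V(x) = |x - 1|^2 / (2 * 0.5^2) in two dimensions, so Z = 2 * pi * 0.25.
toy_v = lambda x: 2.0 * np.sum((x - 1.0) ** 2, axis=1)
print("AIS estimate of log Z:", ais_log_z(toy_v, dim=2))
print("exact log Z:          ", np.log(2.0 * np.pi * 0.25))
```

Each oracle call in the complexity bound corresponds roughly to one evaluation of V or its gradient; in this sketch the count grows with `n_particles * n_steps`, which is the quantity a bound of this kind constrains.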

What carries the argument

The action A of a curve of probability measures interpolating between the target and reference distributions, which enters the complexity bound by controlling the variance of the estimator through Girsanov's theorem.

Load-bearing premise

A curve of interpolating measures with finite action A exists between the target distribution and a tractable reference, allowing Girsanov's theorem to be applied to the underlying stochastic processes.

What would settle it

A controlled experiment on an isotropic Gaussian target with explicitly computable exact action A and exact sample requirements, checking whether the observed oracle calls scale as Õ(d β² A² / ε⁴) when d, β, A, and ε are varied.
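The scaling check described above amounts to fitting slopes on a log-log scale. The sketch below shows the mechanics only: the helper names are hypothetical, and the counts are synthetic placeholders generated from the leading-order formula itself, not measurements. A real experiment would substitute observed oracle calls, and the logarithmic factors hidden in Õ mean fitted slopes would only approximately match.

```python
import numpy as np

def leading_order_bound(d, beta, action, eps):
    """d * beta^2 * A^2 / eps^4, with constants and logarithmic factors dropped."""
    return d * beta**2 * action**2 / eps**4

def fitted_exponent(values, oracle_calls):
    """Least-squares slope of log(oracle_calls) against log(values)."""
    slope, _intercept = np.polyfit(np.log(values), np.log(oracle_calls), 1)
    return slope

# Synthetic placeholder counts generated from the formula, not measured data.
eps_grid = np.array([0.4, 0.2, 0.1, 0.05])
calls_vs_eps = leading_order_bound(d=10, beta=1.0, action=3.0, eps=eps_grid)
print("fitted exponent in eps:", fitted_exponent(eps_grid, calls_vs_eps))

action_grid = np.array([1.0, 2.0, 4.0, 8.0])
calls_vs_action = leading_order_bound(d=10, beta=1.0, action=action_grid, eps=0.1)
print("fitted exponent in A:  ", fitted_exponent(action_grid, calls_vs_action))
```

Under the claimed bound, halving ε multiplies the leading-order cost by 16 and doubling A multiplies it by 4, so fitted exponents far from -4 and 2 would count against the stated rate.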

Figures

Figures reproduced from arXiv: 2502.04575 by Molei Tao, Wei Guo, Yongxin Chen.

**Figure 2.** Visualization of the samples from the modified Müller-Brown distribution. The generated samples are …

**Figure 3.** Visualization of the samples from the Gaussian mixture distribution. The generated samples are displayed on …

Original abstract

Given an unnormalized probability density $\pi\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $\pi$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{d\beta^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $\beta$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $\pi$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives the first non-asymptotic complexity bound for AIS via Girsanov and OT, plus a reverse-diffusion sampler for high-action cases, but the martingale condition needs explicit verification.

The core contribution here is a non-asymptotic oracle complexity of Õ(d β² A² / ε⁴) for relative-error estimation of the normalizing constant Z using annealed importance sampling, along with a new reverse-diffusion sampler whose complexity they also bound. This is new: prior work on Jarzynski and AIS stayed asymptotic or lacked explicit dependence on dimension and the interpolation action A. The analysis avoids isoperimetric assumptions on the target by routing through Girsanov and optimal transport, which is a clean technical choice and useful when the target is multimodal. The empirical section on the new sampler shows it handles multimodality better than geometric interpolation, which is a practical plus. The bound itself is stated cleanly in terms of A, β, d, and ε, with no obvious self-referential fitting.

The main soft spot is the Girsanov step. Finite action A controls the L² norm of the drift, but Girsanov requires the exponential local martingale to be a true martingale, which typically needs a Novikov-type exponential-moment condition. The abstract claims no extra isoperimetric assumptions are needed, yet it is not obvious from the given sketch that finite A alone guarantees the required integrability for arbitrary curves. If the full proof supplies a direct check or a uniform bound that closes this, the result stands; otherwise the complexity claim rests on an unstated step.

The paper is aimed at people working on non-asymptotic analysis of sampling algorithms in stats and ML. It is a first-step result with concrete bounds and a new method, so it deserves a serious referee even if revisions are needed on the integrability details.

Referee Report

2 major / 2 minor

Summary. The paper claims to provide the first non-asymptotic oracle complexity analysis of annealed importance sampling (AIS) for estimating the normalizing constant Z of an unnormalized density π ∝ e^{-V(x)}. It derives a bound of Õ(d β² A² / ε⁴) for achieving relative error ε with high probability, where β measures smoothness of V and A is the action of an interpolating curve of measures connecting π to a tractable reference; the analysis applies Girsanov's theorem together with optimal transport and avoids explicit isoperimetric assumptions on the target. The paper also introduces a reverse-diffusion-sampler algorithm to mitigate large action under geometric interpolation and reports empirical gains on multimodal targets.

Significance. If the central bound is valid, the result would be significant: it supplies the first quantitative non-asymptotic guarantee for a family of methods (Jarzynski equality, AIS) that are standard in Bayesian statistics, statistical mechanics, and machine learning yet previously lacked complexity statements. The combination of Girsanov and optimal transport to remove isoperimetric hypotheses is technically interesting, and the reverse-diffusion proposal directly addresses a practical bottleneck of geometric paths. The framework is reusable for other annealing schedules.

major comments (2)

[Abstract, §1] Abstract and analysis paragraph (§1): the claimed Õ(d β² A² / ε⁴) bound is obtained by using Girsanov to produce an unbiased estimator from the Radon-Nikodym derivative between forward and reverse processes along the interpolating curve. Girsanov yields a true martingale (hence unbiasedness) only when the exponential local martingale satisfies Novikov's condition E[exp(½ ∫ |u_t|² dt)] < ∞. Finite action A (presumably ∫ E[|u_t|²] dt < ∞) controls the L² norm but does not automatically imply the required exponential moment. The manuscript asserts that no isoperimetric assumptions on the target are needed, yet provides no explicit verification or supplementary integrability condition that would guarantee Novikov for arbitrary curves with only finite A. This step is load-bearing for the unbiasedness claim and therefore for the complexity bound.
[Abstract, §1] Abstract and §1: the stated oracle complexity is expressed in terms of the external quantity A (action of the chosen interpolating curve). The manuscript does not indicate whether a curve with A independent of the target accuracy ε can always be selected, or whether constructing such a curve (and therefore controlling A) may itself depend on ε in a manner that alters the overall complexity. Without this clarification the bound cannot be read as a fully non-asymptotic guarantee in the usual sense.
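
The distinction drawn in the first major comment can be written out with Jensen's inequality. This is a generic worked step, not taken from the paper; it assumes, as the comment itself does, that finite action corresponds to ∫ E[|u_t|²] dt < ∞, and the time horizon T is notation introduced here.

```latex
\exp\!\Big(\tfrac12\,\mathbb{E}\int_0^T |u_t|^2\,\mathrm{d}t\Big)
\;\le\;
\mathbb{E}\exp\!\Big(\tfrac12\int_0^T |u_t|^2\,\mathrm{d}t\Big)
```

Novikov's condition (a finite right-hand side) therefore implies a finite L² bound, but not the other way round. For a generic illustration of the gap, a nonnegative random variable X with density proportional to x⁻³ on [1, ∞) has finite mean while E[exp(X/2)] is infinite. Whether such behavior can occur along the curves the paper considers is exactly what the comment asks the authors to rule out.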

minor comments (2)

[Abstract] The abstract states that the analysis 'does not explicitly require isoperimetric assumptions,' but the precise regularity conditions placed on the interpolating curve (e.g., moment bounds on the Girsanov kernel) should be stated explicitly in the theorem statement for clarity.
Notation: β is used for the smoothness parameter of V; a brief reminder of its precise definition (e.g., Lipschitz constant of ∇V) would help readers who are not already familiar with the paper's conventions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting these two technical points on the application of Girsanov's theorem and the interpretation of the oracle complexity. Both comments are addressed point-by-point below. We believe the core claims remain valid once the requested clarifications are supplied.

Referee: Girsanov yields a true martingale (hence unbiasedness) only when the exponential local martingale satisfies Novikov's condition E[exp(½ ∫ |u_t|² dt)] < ∞. Finite action A controls the L² norm but does not automatically imply the required exponential moment. No explicit verification or supplementary integrability condition is provided for arbitrary curves with only finite A.

Authors: We agree that Novikov's condition is required for the exponential martingale to be a true martingale. Our analysis implicitly assumes the SDEs admit a well-defined Girsanov change of measure; finite A guarantees that the quadratic variation process is integrable in L², but does not by itself guarantee the exponential integrability. Under the standing Lipschitz and linear-growth assumptions already placed on the drift and diffusion coefficients (standard for the Langevin and reverse-diffusion processes considered), standard results in stochastic analysis (e.g., Theorem 5.1 in Karatzas & Shreve or Proposition 3.1 in Øksendal) ensure that Novikov holds locally and can be extended globally on compact time intervals. We will add a short paragraph after the statement of Girsanov's theorem (new Section 2.3) that explicitly invokes these conditions and notes that they are satisfied by the geometric and reverse-diffusion schedules analyzed later. This is a clarification rather than a change to the complexity bound itself.

Revision: partial

Referee: The stated oracle complexity is expressed in terms of the external quantity A. The manuscript does not indicate whether a curve with A independent of the target accuracy ε can always be selected, or whether constructing such a curve may itself depend on ε in a manner that alters the overall complexity.

Authors: A is the action of a fixed interpolating curve of measures chosen independently of the accuracy parameter ε; it is a property of the annealing schedule (geometric, arithmetic, or reverse-diffusion) and of the pair (π, reference). For any fixed schedule the value of A is therefore independent of ε, and the Õ(d β² A² / ε⁴) bound is fully non-asymptotic once the schedule is selected. When a practitioner chooses a schedule whose A grows with dimension or with the separation of modes, that growth appears explicitly in the complexity; the reverse-diffusion construction is introduced precisely to produce schedules whose A remains moderate. We will insert one clarifying sentence at the end of the first paragraph of Section 1 and a footnote in the complexity theorem stating that A is schedule-dependent but ε-independent.

Revision: yes

Circularity Check

0 steps flagged

No circularity: complexity bound parameterized by external action A using standard theorems

The claimed oracle complexity Õ(d β² A² / ε⁴) is expressed directly in terms of the external quantity A (action of an interpolating curve of measures), with the derivation invoking Girsanov's theorem and optimal transport as independent mathematical tools. No step reduces a prediction to a fitted input, renames a known result, or relies on a load-bearing self-citation chain. The bound is not self-definitional; A is an input to the analysis rather than derived from the target result. The absence of isoperimetric assumptions is presented as a feature of the Girsanov+OT approach, with no evidence that the central claim collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The complexity result rests on the existence of an interpolating curve with finite action A and on the applicability of Girsanov's theorem; no free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math: Girsanov's theorem applies to the stochastic processes along the annealing path.
Invoked to control the Radon-Nikodym derivative between measures in the analysis paragraph of the abstract.
domain assumption: An interpolating curve of probability measures with finite action A exists between the target and reference.
Central to the complexity expression; stated without further justification in the abstract.

pith-pipeline@v0.9.0 · 5812 in / 1423 out tokens · 65771 ms · 2026-05-23T03:54:32.282382+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

derive an oracle complexity of Õ(d β² A² / ε⁴) … leveraging Girsanov’s theorem and optimal transport … action A of a curve of probability measures
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Jarzynski equality … $\mathbb{E}_{\mathbb{P}^{\to}}\,\mathrm{e}^{-W} = \mathrm{e}^{-\Delta F}$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample-efficient evidence estimation of score based priors for model selection
cs.LG 2026-02 unverdicted novelty 7.0

DiME estimates model evidence for diffusion priors by integrating time-marginals from posterior sampling, enabling efficient prior selection and misfit diagnosis in ill-posed inverse problems.

Reference graph

Works this paper leans on

109 extracted references · 109 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1]

M. S. Albergo and E. Vanden-Eijnden. NETS : A non-equilibrium transport sampler. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=QqGw9StPbQ

work page 2025
[2]

urich. Birkh\

L. Ambrosio, N. Gigli, and G. Savar\'e. Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics. ETH Z\"urich. Birkh\"auser Basel, 2 edition, 2008. doi:10.1007/978-3-7643-8722-8

work page doi:10.1007/978-3-7643-8722-8 2008
[3]

Ambrosio, E

L. Ambrosio, E. Bru\'e, and D. Semola. Lectures on optimal transport, volume 130 of UNITEXT. Springer Cham, 2021. doi:10.1007/978-3-030-72162-6. URL https://link.springer.com/book/10.1007/978-3-030-72162-6

work page doi:10.1007/978-3-030-72162-6 2021
[4]

B. D. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12 0 (3): 0 313--326, 1982. ISSN 0304-4149. doi:10.1016/0304-4149(82)90051-5. URL https://www.sciencedirect.com/science/article/pii/0304414982900515

work page doi:10.1016/0304-4149(82)90051-5 1982
[5]

Sampling normalizing constants in high dimensions using inhomogeneous diffusions

C. Andrieu, J. Ridgway, and N. Whiteley. Sampling normalizing constants in high dimensions using inhomogeneous diffusions. arXiv preprint arXiv:1612.07583, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[6]

Arrar, F

M. Arrar, F. M. Boubeta, M. E. Szretter, M. Sued, L. Boechi, and D. Rodriguez. On the accurate estimation of free energies using the Jarzynski equality. Journal of Computational Chemistry, 40 0 (4): 0 688--696, 2019. doi:10.1002/jcc.25754. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.25754

work page doi:10.1002/jcc.25754 2019
[7]

Aurell, C

E. Aurell, C. Mej\' a-Monasterio, and P. Muratore-Ginanneschi. Optimal protocols and optimal transport in stochastic thermodynamics. Phys. Rev. Lett., 106: 0 250601, Jun 2011. doi:10.1103/PhysRevLett.106.250601. URL https://link.aps.org/doi/10.1103/PhysRevLett.106.250601

work page doi:10.1103/physrevlett.106.250601 2011
[8]

Aurell, K

E. Aurell, K. Gaw e dzki, C. Mej\' a-Monasterio, R. Mohayaee, and P. Muratore-Ginanneschi. Refined second law of thermodynamics for fast random processes. Journal of statistical physics, 147: 0 487--505, 2012. doi:10.1007/s10955-012-0478-x

work page doi:10.1007/s10955-012-0478-x 2012
[9]

Bakry, I

D. Bakry, I. Gentil, and M. Ledoux. Analysis and geometry of Markov diffusion operators , volume 103 of Grundlehren der mathematischen Wissenschaften. Springer Cham, 1 edition, 2014. doi:10.1007/978-3-319-00227-9

work page doi:10.1007/978-3-319-00227-9 2014
[10]

Balasubramanian, S

K. Balasubramanian, S. Chewi, M. A. Erdogdu, A. Salim, and S. Zhang. Towards a theory of non-log-concave sampling: First-order stationarity guarantees for Langevin Monte Carlo . In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 2896--2923. PMLR, 0...

work page 2022
[11]

Blessing, J

D. Blessing, J. Berner, L. Richter, and G. Neumann. Underdamped diffusion bridges with applications to sampling. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Q1QTxFm0Is

work page 2025
[12]

Brosse, A

N. Brosse, A. Durmus, and E. Moulines. Normalizing constants of log-concave densities. Electronic Journal of Statistics, 12 0 (1): 0 851 -- 889, 2018. doi:10.1214/18-EJS1411. URL https://doi.org/10.1214/18-EJS1411

work page doi:10.1214/18-ejs1411 2018
[13]

Carbone, M

D. Carbone, M. Hua, S. Coste, and E. Vanden-Eijnden. Efficient training of energy-based models using Jarzynski equality. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 52583--52614. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_f...

work page 2023
[14]

Chatterjee and P

S. Chatterjee and P. Diaconis. The sample size required in importance sampling. The Annals of Applied Probability, 28 0 (2): 0 1099 -- 1135, 2018. doi:10.1214/17-AAP1326. URL https://doi.org/10.1214/17-AAP1326

work page doi:10.1214/17-aap1326 2018
[15]

Chehab, A

O. Chehab, A. Hyv\"arinen, and A. Risteski. Provable benefits of annealing for estimating normalizing constants: Importance sampling, noise-contrastive estimation, and beyond. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=iWGC0Nsq9i

work page 2023
[16]

Chehab, A

O. Chehab, A. Korba, A. J. Stromme, and A. Vacher. Provable convergence and limitations of geometric tempering for Langevin dynamics. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=DZcmz9wU0i

work page 2025
[17]

Chemseddine, C

J. Chemseddine, C. Wald, R. Duong, and G. Steidl. Neural sampling from Boltzmann densities: Fisher - Rao curves in the Wasserstein geometry. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=TUvg5uwdeG

work page 2025
[18]

Chen and L

H. Chen and L. Ying. Ensemble-based annealed importance sampling. arXiv preprint arXiv:2401.15645, 2024

work page arXiv 2024
[19]

J. Chen, L. Richter, J. Berner, D. Blessing, G. Neumann, and A. Anandkumar. Sequential controlled Langevin Diffusions . In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=dImD2sgy86

work page 2025
[20]

Chen and Q.-M

M.-H. Chen and Q.-M. Shao. On Monte Carlo methods for estimating ratios of normalizing constants. The Annals of Statistics, 25 0 (4): 0 1563 -- 1594, 1997. doi:10.1214/aos/1031594732. URL https://doi.org/10.1214/aos/1031594732

work page doi:10.1214/aos/1031594732 1997
[21]

S. Chen, S. Chewi, J. Li, Y. Li, A. Salim, and A. Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=zyLVMgsZ0U_

work page 2023
[22]

Y. Chen, T. T. Georgiou, and M. Pavon. On the relation between optimal transport and Schr\"odinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169: 0 671--691, 2016. doi:10.1007/s10957-015-0803-z

work page doi:10.1007/s10957-015-0803-z 2016
[23]

Y. Chen, T. T. Georgiou, and A. Tannenbaum. Stochastic control and nonequilibrium thermodynamics: Fundamental limits. IEEE Transactions on Automatic Control, 65 0 (7): 0 2979--2991, 2020. doi:10.1109/TAC.2019.2939625

work page doi:10.1109/tac.2019.2939625 2020
[24]

Y. Chen, T. T. Georgiou, and M. Pavon. Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schr\"odinger bridge. SIAM Review, 63 0 (2): 0 249--313, 2021. doi:10.1137/20M1339982. URL https://doi.org/10.1137/20M1339982

work page doi:10.1137/20m1339982 2021
[25]

Cheng, N

X. Cheng, N. S. Chatterji, P. L. Bartlett, and M. I. Jordan. Underdamped Langevin MCMC : A non-asymptotic analysis. In S. Bubeck, V. Perchet, and P. Rigollet, editors, Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 300--323. PMLR, 06--09 Jul 2018. URL https://proceedings.mlr.press/v75/ch...

work page 2018
[26]

Cheng, B

X. Cheng, B. Wang, J. Zhang, and Y. Zhu. Fast conditional mixing of MCMC algorithms for non-log-concave distributions. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 13374--13394. Curran Associates, Inc., 2023. URL https://proceedings.neurips.cc/paper_fil...

work page 2023
[27]

S. Chewi. Log-Concave Sampling. Book draft, in preparation, 2022. URL https://chewisinho.github.io

work page 2022
[28]

Chewi, M

S. Chewi, M. A. Erdogdu, M. Li, R. Shen, and S. Zhang. Analysis of Langevin Monte Carlo from Poincar\'e to log- Sobolev . In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1--2. PMLR, 02--05 Jul 2022. URL https://proceedings.mlr.press/v178/chewi22a.html

work page 2022
[29]

Chipot and A

C. Chipot and A. Pohorille, editors. Free Energy Calculations: Theory and Applications in Chemistry and Biology. Springer Series in Chemical Physics. Springer Berlin, Heidelberg, 2007. doi:10.1007/978-3-540-38448-9

work page doi:10.1007/978-3-540-38448-9 2007
[30]

Conforti and L

G. Conforti and L. Tamanini. A formula for the time derivative of the entropic cost and applications. Journal of Functional Analysis, 280 0 (11): 0 108964, 2021. ISSN 0022-1236. doi:10.1016/j.jfa.2021.108964. URL https://www.sciencedirect.com/science/article/pii/S002212362100046X

work page doi:10.1016/j.jfa.2021.108964 2021
[31]

Cousins and S

B. Cousins and S. Vempala. Gaussian cooling and O^*(n^3) algorithms for volume and Gaussian volume. SIAM Journal on Computing, 47 0 (3): 0 1237--1273, 2018. doi:10.1137/15M1054250. URL https://doi.org/10.1137/15M1054250

work page doi:10.1137/15m1054250 2018
[32]

G. E. Crooks. Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. Journal of Statistical Physics, 90: 0 1481--1487, 1998. doi:10.1023/A:1023208217925

work page doi:10.1023/a:1023208217925 1998
[33]

G. E. Crooks. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E, 60: 0 2721--2726, Sep 1999. doi:10.1103/PhysRevE.60.2721. URL https://link.aps.org/doi/10.1103/PhysRevE.60.2721

work page doi:10.1103/physreve.60.2721 1999
[34]

Del Moral, A

P. Del Moral, A. Doucet, and A. Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68 0 (3): 0 411--436, 5 2006

work page 2006
[35]

Doucet, S

A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for bayesian filtering. Statistics and computing, 10: 0 197--208, 2000. doi:10.1023/A:1008935410038

work page doi:10.1023/a:1008935410038 2000
[36]

Doucet, W

A. Doucet, W. Grathwohl, A. G. Matthews, and H. Strathmann. Score-based diffusion meets annealed importance sampling. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 21482--21494. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/...

work page 2022
[37]

M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. J. ACM, 38 0 (1): 0 1–17, Jan. 1991. ISSN 0004-5411. doi:10.1145/102782.102783. URL https://doi.org/10.1145/102782.102783

work page doi:10.1145/102782.102783 1991
[38]

Echeverria and L

I. Echeverria and L. M. Amzel. Estimation of free-energy differences from computed work distributions: An application of Jarzynski 's equality. The Journal of Physical Chemistry B, 116 0 (36): 0 10977--11396, 2012. doi:10.1021/jp300527q

work page doi:10.1021/jp300527q 2012
[39]

Flamary, N

R. Flamary, N. Courty, A. Gramfort, M. Z. Alaya, A. Boisbunon, S. Chambon, L. Chapel, A. Corenflos, K. Fatras, N. Fournier, L. Gautheron, N. T. Gayraud, H. Janati, A. Rakotomamonjy, I. Redko, A. Rolet, A. Schutz, V. Seguy, D. J. Sutherland, R. Tavenard, A. Tong, and T. Vayer. POT : Python optimal transport. Journal of Machine Learning Research, 22 0 (78):...

work page 2021
[40]

Ge and D.-Q

H. Ge and D.-Q. Jiang. Generalized Jarzynski 's equality of inhomogeneous multidimensional diffusion processes. Journal of Statistical Physics, 131: 0 675--689, 3 2008. ISSN 1572-9613. doi:10.1007/s10955-008-9520-4

work page doi:10.1007/s10955-008-9520-4 2008
[41]

R. Ge, H. Lee, and A. Risteski. Beyond log-concavity: Provable guarantees for sampling multi-modal distributions using simulated tempering Langevin Monte Carlo . In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL htt...

work page 2018
[42]

R. Ge, H. Lee, and J. Lu. Estimating normalizing constants for log-concave distributions: Algorithms and lower bounds. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, page 579–586, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450369794. doi:10.1145/3357713.3384289. URL https://doi.org/10....

work page doi:10.1145/3357713.3384289 2020
[43]

Gelman and X.-L

A. Gelman and X.-L. Meng. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13 0 (2): 0 163 -- 185, 1998. doi:10.1214/ss/1028905934. URL https://doi.org/10.1214/ss/1028905934

work page doi:10.1214/ss/1028905934 1998
[44]

Gelman, J

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis . Chapman and Hall/CRC, 3 edition, 2013

work page 2013
[45]

W. Guo, M. Tao, and Y. Chen. Provable benefit of annealed Langevin Monte Carlo for non-log-concave sampling. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=P6IVIoGRRg

work page 2025
[46]

Hartmann and L

C. Hartmann and L. Richter. Nonasymptotic bounds for suboptimal importance sampling. SIAM/ASA Journal on Uncertainty Quantification, 12 0 (2): 0 309--346, 2024. doi:10.1137/21M1427760. URL https://doi.org/10.1137/21M1427760

work page doi:10.1137/21m1427760 2024
[47]

Hartmann, L

C. Hartmann, L. Richter, C. Sch\"utte, and W. Zhang. Variational characterization of free energy: Theory and algorithms. Entropy, 19 0 (11), 2017. ISSN 1099-4300. doi:10.3390/e19110626. URL https://www.mdpi.com/1099-4300/19/11/626

work page doi:10.3390/e19110626 2017
[48]

Hartmann, C

C. Hartmann, C. Sch\"utte, and W. Zhang. Jarzynski 's equality, fluctuation theorems, and variance reduction: Mathematical analysis and numerical algorithms. Journal of Statistical Physics, 175: 0 1214--1261, 2019. doi:10.1007/s10955-019-02286-4. URL https://doi.org/10.1007/s10955-019-02286-4

work page doi:10.1007/s10955-019-02286-4 2019
[49]

J. He, Y. Du, F. Vargas, Y. Wang, C. P. Gomes, J. M. Hern\'andez-Lobato, and E. Vanden-Eijnden. FEAT : Free energy estimators with adaptive transport. arXiv preprint arXiv:2504.11516, 2025

work page arXiv 2025
[50]

Y. He and C. Zhang. On the query complexity of sampling from non-log-concave distributions (extended abstract). In N. Haghtalab and A. Moitra, editors, Proceedings of Thirty Eighth Conference on Learning Theory, volume 291 of Proceedings of Machine Learning Research, pages 2786--2787. PMLR, 30 Jun--04 Jul 2025. URL https://proceedings.mlr.press/v291/he25a.html
[51]

Y. He, K. Rojas, and M. Tao. Zeroth-order sampling methods for non-log-concave distributions: Alleviating metastability by denoising diffusion. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=X3Aljulsw5

[52]

X. Huang, H. Dong, Y. Hao, Y. Ma, and T. Zhang. Reverse diffusion Monte Carlo. In The Twelfth International Conference on Learning Representations, 2024a. URL https://openreview.net/forum?id=kIPEyMSdFV
[53]

X. Huang, D. Zou, H. Dong, Y.-A. Ma, and T. Zhang. Faster sampling without isoperimetry via diffusion-based Monte Carlo. In S. Agrawal and A. Roth, editors, Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 2438--2493. PMLR, 30 Jun--03 Jul 2024b. URL https://proceedings.mlr.press/...
[54]

M. Huber. Approximation algorithms for the normalizing constant of Gibbs distributions. The Annals of Applied Probability, 25(2):974--985, 2015. doi:10.1214/14-AAP1015. URL https://doi.org/10.1214/14-AAP1015
[55]

C. Jarzynski. Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78:2690--2693, Apr 1997. doi:10.1103/PhysRevLett.78.2690. URL https://link.aps.org/doi/10.1103/PhysRevLett.78.2690
[56]

A. Jasra, K. Kamatani, P. P. Osei, and Y. Zhou. Multilevel particle filters: normalizing constant estimation. Statistics and Computing, 28:47--60, 2018. doi:10.1007/s11222-016-9715-5
[57]

M. R. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169--188, 1986. ISSN 0304-3975. doi:10.1016/0304-3975(86)90174-X. URL https://www.sciencedirect.com/science/article/pii/030439758690174X
[58]

I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer New York, NY, 2nd edition, 1991. doi:10.1007/978-1-4612-0949-2
[59]

D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013
[60]

J. G. Kirkwood. Statistical mechanics of fluid mixtures. The Journal of Chemical Physics, 3(5):300--313, May 1935. ISSN 0021-9606. doi:10.1063/1.1749657. URL https://doi.org/10.1063/1.1749657
[61]

Y. Kook and S. S. Vempala. Sampling and integration of logconcave functions by algorithmic diffusion. In Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC '25, pages 924--932, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400715105. doi:10.1145/3717823.3718202. URL https://doi.org/10.1145/3717823.3718202
[62]

Y. Kook, S. Vempala, and M. S. Zhang. In-and-Out: Algorithmic diffusion for sampling convex bodies. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=aNQWRHyh15
[63]

S. Kostov and N. Whiteley. An algorithm for approximating the second moment of the normalizing constant estimate from a particle filter. Methodology and Computing in Applied Probability, 19:799--818, 2017. doi:10.1007/s11009-016-9513-8
[64]

O. Krause, A. Fischer, and C. Igel. Algorithms for estimating the partition function of restricted Boltzmann machines. Artificial Intelligence, 278:103195, 2020. ISSN 0004-3702. doi:10.1016/j.artint.2019.103195. URL https://www.sciencedirect.com/science/article/pii/S0004370219301948
[65]

C. Le Bris and P.-L. Lions. Existence and uniqueness of solutions to Fokker–Planck type equations with irregular coefficients. Communications in Partial Differential Equations, 33(7):1272--1317, 2008. doi:10.1080/03605300801970952. URL https://doi.org/10.1080/03605300801970952
[66]

H. Lee, J. Lu, and Y. Tan. Convergence of score-based generative modeling for general data distributions. In S. Agrawal and F. Orabona, editors, Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201 of Proceedings of Machine Learning Research, pages 946--985. PMLR, 20 Feb--23 Feb 2023. URL https://proceedings.mlr.pres...

[67]

T. Lelièvre, M. Rousset, and G. Stoltz. Free Energy Computations: A Mathematical Perspective. Imperial College Press, 2010. doi:10.1142/p579
[68]

C. Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete and Continuous Dynamical Systems - Series A, 34(4):1533--1574, 2014. URL https://hal.science/hal-00849930
[69]

J. Ma, J. Peng, S. Wang, and J. Xu. Estimating the partition function of graphical models using Langevin importance sampling. In C. M. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, volume 31 of Proceedings of Machine Learning Research, pages 433--441, Scottsdale, Arizon...
[70]

B. Máté and F. Fleuret. Learning interpolations between Boltzmann densities. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/forum?id=TH6YrEcbth
[71]

B. Máté, F. Fleuret, and T. Bereau. Neural thermodynamic integration: Free energies from energy-based diffusion models. The Journal of Physical Chemistry Letters, 15(45):11395--11404, 2024. doi:10.1021/acs.jpclett.4c01958. URL https://doi.org/10.1021/acs.jpclett.4c01958. PMID: 39503734
[72]

O. Mazonka and C. Jarzynski. Exactly solvable model illustrating far-from-equilibrium predictions. arXiv preprint cond-mat/9912121, 1999
[73]

F. Mazzanti and E. Romero. Efficient evaluation of the partition function of RBMs with annealed importance sampling. arXiv preprint arXiv:2007.11926, 2020
[74]

X.-L. Meng and W. H. Wong. Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6(4):831--860, 1996. ISSN 10170405, 19968507. URL http://www.jstor.org/stable/24306045
[75]

A. Mousavi-Hosseini, T. K. Farghly, Y. He, K. Balasubramanian, and M. A. Erdogdu. Towards a complete analysis of Langevin Monte Carlo: Beyond Poincaré inequality. In G. Neu and L. Rosasco, editors, Proceedings of Thirty Sixth Conference on Learning Theory, volume 195 of Proceedings of Machine Learning Research, pages 1--35. PMLR, 12--15 Jul 2023. URL h...
[76]

R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125--139, April 2001. ISSN 1573-1375. doi:10.1023/A:1008923215028. URL https://doi.org/10.1023/A:1008923215028
[77]

E. Nelson. Dynamical Theories of Brownian Motion. Princeton University Press, 1967. ISBN 9780691079509. URL http://www.jstor.org/stable/j.ctv15r57jg

[78]

N. Nüsken and L. Richter. Solving high-dimensional Hamilton--Jacobi--Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space. Partial Differential Equations and Applications, 2(4):48, 2021. doi:10.1007/s42985-021-00102-x
[79]

A. Pohorille, C. Jarzynski, and C. Chipot. Good practices in free-energy calculations. The Journal of Physical Chemistry B, 114(32):10235--10253, 2010. ISSN 1520-6106. doi:10.1021/jp102971x. URL https://doi.org/10.1021/jp102971x
[80]

Y. Ren, H. Chen, G. M. Rotskoff, and L. Ying. How discrete and continuous diffusion meet: Comprehensive analysis of discrete diffusion models via a stochastic integral framework. In The Thirteenth International Conference on Learning Representations, 2025a. URL https://openreview.net/forum?id=6awxwQEI82
