On quantitative Laplace-type convergence results for some exponential probability measures, with two applications

Agnès Desolneux; Valentin De Bortoli

arXiv: 2110.12922 · v2 · submitted 2021-10-25 · math.PR · cs.LG · stat.ML


Pith reviewed 2026-05-24 13:03 UTC · model grok-4.3

classification: math.PR · cs.LG · stat.ML

keywords: Laplace method, exponential measures, Wasserstein distance, norm-like potentials, generalized Jacobian, coarea formula, maximum entropy models, stochastic gradient Langevin dynamics


The pith

For norm-like potentials, exponential measures converge quantitatively to their zero-temperature limits in Wasserstein-1 distance when a generalized Jacobian is invertible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes quantitative bounds on how fast measures with densities proportional to exp(-U(x)/ε) approach their limiting distributions as ε tends to zero. These bounds hold in the 1-Wasserstein metric for potentials U that are norm-like, provided a generalized Jacobian remains invertible at the relevant points. Where classical Laplace results require the Hessian of U to be invertible, the argument instead relies on geometric measure theory, specifically the coarea formula. The results are applied to maximum-entropy models and to the low-temperature behavior of stochastic gradient Langevin dynamics on non-convex landscapes.
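A minimal numerical sketch of this kind of convergence, under assumptions of our own choosing rather than the paper's: in one dimension, take the norm-like potential U(x) = |F(x)|^k with the hypothetical map F(x) = x² − 1, whose minimizers are ±1. By symmetry the limit is π_0 = (δ_{−1} + δ_{+1})/2, and on the real line W1 equals the L1 distance between cumulative distribution functions, so the distance can be evaluated by quadrature on a grid. The function name `w1_two_well` and all grid parameters are illustrative.

```python
import numpy as np


def w1_two_well(eps, k=2.0, n=400_001, half_width=3.0):
    """Grid approximation of W1(pi_eps, pi_0) for the toy potential U(x) = |x**2 - 1|**k.

    Hypothetical example: pi_eps has density proportional to exp(-U(x) / eps)
    and pi_0 = (delta_{-1} + delta_{+1}) / 2.
    """
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    u = np.abs(x**2 - 1.0) ** k
    w = np.exp(-(u - u.min()) / eps)              # unnormalized density of pi_eps
    cdf_eps = np.cumsum(w) / w.sum()              # discrete CDF of pi_eps
    cdf_0 = 0.5 * (x >= -1.0) + 0.5 * (x >= 1.0)  # CDF of pi_0
    # On the real line, W1 is the L1 distance between the two CDFs.
    return np.sum(np.abs(cdf_eps - cdf_0)) * dx


for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:.0e}  W1 ~ {w1_two_well(eps):.5f}")
```

For k = 2 the Hessian at ±1 is invertible, each well is locally Gaussian with variance ε/8, and the printed values should track √(ε/(4π)), i.e. shrink like √ε.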

Core claim

For norm-like potentials U, if a generalized Jacobian is invertible at the points of interest, then the measures π_ε converge to the limiting measure π_0 supported on the minimizers of U, and the convergence is quantitative in the Wasserstein distance of order 1.

What carries the argument

Invertibility condition on the generalized Jacobian of U, which permits application of the coarea formula to obtain explicit Wasserstein-1 bounds without classical Hessian assumptions.
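For reference, here is the textbook form of the coarea formula, written in notation of our own (the paper's exact hypotheses and notation may differ). For a Lipschitz map F : ℝ^d → ℝ^p with d ≥ p, Jacobian J_F, and nonnegative measurable g, and then specialized to g = φ·exp(−‖F‖²/ε):

```latex
\int_{\mathbb{R}^d} g(x)\, J_F(x)\, \mathrm{d}x
  = \int_{\mathbb{R}^p} \left( \int_{F^{-1}(y)} g(x)\, \mathrm{d}\mathcal{H}^{d-p}(x) \right) \mathrm{d}y,
\qquad J_F(x) = \sqrt{\det\big(\mathrm{D}F(x)\,\mathrm{D}F(x)^\top\big)},

\int_{\mathbb{R}^d} \varphi(x)\, J_F(x)\, e^{-\|F(x)\|^2/\varepsilon}\, \mathrm{d}x
  = \int_{\mathbb{R}^p} e^{-\|y\|^2/\varepsilon} \left( \int_{F^{-1}(y)} \varphi\, \mathrm{d}\mathcal{H}^{d-p} \right) \mathrm{d}y .
```

Heuristically, if y ↦ ∫_{F^{-1}(y)} φ dH^{d−p} is continuous at 0, dividing the second identity by its value at φ = 1 shows that the J_F-weighted measure concentrates, as ε → 0, on the uniform (Hausdorff) distribution over F^{-1}(0); turning this into an explicit W1 rate is where the invertibility condition on the Jacobian enters.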

If this is right

The same quantitative bounds relate microcanonical and macrocanonical distributions in maximum-entropy models.
The iterates of stochastic gradient Langevin dynamics converge in Wasserstein-1 to the set of global minimizers at low temperature for non-convex objectives.
The convergence rates hold without requiring the Hessian of the potential to be invertible at the minimizers.
The coarea-formula approach yields explicit constants that depend on the geometry of the level sets of U.
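The SGLD claim can be illustrated with a minimal sketch that rests entirely on assumptions of ours, not on the paper's setup: the non-convex objective is the hypothetical double well U(x) = (x² − 1)², whose global minimizers are ±1; the stochastic gradient is modelled as the exact gradient plus independent Gaussian noise of standard deviation `grad_noise`; and the update is x ← x − γ(∇U(x) + ξ) + √(2γε) Z at temperature ε. The reported mean distance of the final iterates to {−1, 1} is a lower bound on their W1 distance to any measure supported on the minimizers. With a fixed step, the scheme also carries discretization and gradient-noise bias, so this is a qualitative picture only.

```python
import numpy as np


def sgld_distance_to_minimizers(eps, step=1e-3, n_steps=10_000, n_chains=1_000,
                                grad_noise=1.0, seed=0):
    """Run SGLD on the hypothetical double well U(x) = (x**2 - 1)**2.

    Returns the mean distance of the final iterates to {-1, +1}, the set of
    global minimizers of U.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_chains)        # independent chains
    for _ in range(n_steps):
        grad = 4.0 * x * (x**2 - 1.0)                 # exact gradient of U
        # Stand-in for a minibatch gradient: exact gradient plus zero-mean noise.
        noisy_grad = grad + grad_noise * rng.standard_normal(n_chains)
        # Langevin step at temperature eps.
        x = x - step * noisy_grad + np.sqrt(2.0 * step * eps) * rng.standard_normal(n_chains)
    return np.mean(np.minimum(np.abs(x - 1.0), np.abs(x + 1.0)))


for eps in (1e-1, 1e-2, 1e-3):
    print(f"eps={eps:.0e}  mean distance to {{-1, 1}}: {sgld_distance_to_minimizers(eps):.4f}")
```

Lowering ε shrinks the spread of the iterates around the minimizers; which well each chain ends in is decided here by its starting point, since barrier crossings become exponentially rare at low temperature.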

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same technique might be tested on other sampling schemes whose stationary measures are of Laplace type.
It remains open whether analogous bounds can be obtained in Wasserstein distance of order p greater than 1 under the same Jacobian condition.
The results suggest that invertibility of a first-order object can substitute for second-order non-degeneracy in several concentration problems.

Load-bearing premise

The potential U must be norm-like and its generalized Jacobian must be invertible at the relevant points.

What would settle it

A concrete norm-like potential U for which the generalized Jacobian is invertible at all required points yet the Wasserstein-1 distance between π_ε and π_0 fails to approach zero at the rate claimed as ε tends to zero.
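One way to probe such a counterexample numerically is to estimate the empirical exponent of ε ↦ W1(π_ε, π_0) with a log-log fit and compare it with the claimed rate. A minimal sketch on a toy potential of our choosing, U(x) = |x|^k on ℝ: its only minimizer is 0, so π_0 = δ_0 and W1(π_ε, δ_0) = E_{π_ε}|X|, whose exact value ε^{1/k} Γ(2/k)/Γ(1/k) gives a reference slope of 1/k. This does not test the paper's theorem itself; the function names are hypothetical.

```python
import math

import numpy as np


def w1_to_dirac(eps, k, n=500_001, x_max=5.0):
    """Trapezoidal estimate of W1(pi_eps, delta_0) = E|X| for pi_eps ∝ exp(-|x|**k / eps)."""
    x = np.linspace(0.0, x_max, n)   # the density is symmetric, so integrate over x >= 0
    w = np.exp(-x**k / eps)
    w[0] *= 0.5                      # trapezoid end-point weight (the far end is negligible)
    return np.sum(x * w) / np.sum(w)


def exact_w1(eps, k):
    """Closed form eps**(1/k) * Gamma(2/k) / Gamma(1/k) for this toy potential."""
    return eps ** (1.0 / k) * math.gamma(2.0 / k) / math.gamma(1.0 / k)


def fitted_exponent(k, eps_grid=(1e-2, 1e-3, 1e-4, 1e-5)):
    """Slope of log W1 against log eps, i.e. the empirical convergence rate."""
    eps = np.array(eps_grid)
    w1 = np.array([w1_to_dirac(e, k) for e in eps])
    slope, _intercept = np.polyfit(np.log(eps), np.log(w1), 1)
    return slope


for k in (2, 4, 6):
    print(f"k={k}: fitted exponent {fitted_exponent(k):.3f}, reference {1 / k:.3f}")
```

A fitted exponent clearly below a claimed rate, for a potential that satisfies all stated hypotheses, would be the kind of evidence described above.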

Figures

Figures reproduced from arXiv: 2110.12922 by Agnès Desolneux, Valentin De Bortoli.

**Figure 1.** Left: graph of the polynomial $x \mapsto P(x)$, which has 4 zeros. Right: verifying the scaling relation of Proposition 6 by plotting $\varepsilon \mapsto 2\pi_\varepsilon[P^2]$, which is equivalent to $\varepsilon$ as $\varepsilon$ goes to 0.

Two-dimensional ellipse

In this second example, we consider the function $F : \mathbb{R}^2 \to \mathbb{R}$ given for any $x = (x_1, x_2) \in \mathbb{R}^2$ by $F(x) = a_1 x_1^2 + a_2 x_2^2 - 1$ with $a_1, a_2 > 0$. For $\varepsilon > 0$, we define $\pi_\varepsilon$ and $\pi^\Psi_\varepsilon$ whose densities w.r.t. the…
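The scaling relation in the Figure 1 caption can be checked with a short computation under two assumptions of ours: π_ε has density proportional to exp(−P(x)²/ε), and P is the hypothetical quartic P(x) = (x² − 1)(x² − 4), which has 4 simple zeros (the paper's polynomial is not specified here). Near a simple zero x_0, P(x)² ≈ P′(x_0)²(x − x_0)², so each well is approximately Gaussian with variance ε/(2P′(x_0)²) and contributes E[P²] ≈ ε/2 whatever its weight; hence 2π_ε[P²] ≈ ε.

```python
import numpy as np


def scaled_second_moment(eps, n=600_001, half_width=3.0):
    """Return 2 * pi_eps[P**2] for pi_eps ∝ exp(-P(x)**2 / eps), P(x) = (x**2 - 1)(x**2 - 4).

    The polynomial P is a hypothetical stand-in with 4 simple zeros.
    """
    x = np.linspace(-half_width, half_width, n)
    p2 = ((x**2 - 1.0) * (x**2 - 4.0)) ** 2
    w = np.exp(-p2 / eps)   # unnormalized density; the grid spacing cancels in the ratio
    return 2.0 * np.sum(p2 * w) / np.sum(w)


for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:.0e}  2*pi_eps[P^2]/eps = {scaled_second_moment(eps) / eps:.4f}")
```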

**Figure 2.** Distributions $\pi_\varepsilon$ and $\pi^\Psi_\varepsilon$ (first line), and histograms of samples drawn from them (second line). This experiment shows that the limit distribution of $\pi^\Psi_\varepsilon$ as $\varepsilon$ goes to 0 is the uniform microcanonical model, given by the uniform distribution on the zeros of $P$.

$(\mathrm{d}\pi^\Psi_\varepsilon/\mathrm{d}\lambda)(x) = J_F(x) \exp[-\|F(x)\|^2/\varepsilon] / \int_{\mathbb{R}^2} J_F(\tilde{x}) \exp[-\|F(\tilde{x})\|^2/\varepsilon]\,\mathrm{d}\tilde{x}$. To sample from $\pi_\varepsilon$ and $\pi^\Psi_\varepsilon$, we use two Markov chains given by the Unad…
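For the ellipse example, the two measures can be compared without sampling. The sketch rests on assumptions of ours: π_ε is taken to have density proportional to exp(−F(x)²/ε) (by analogy with the formula for π^Ψ_ε above), the coefficients are the hypothetical values a1 = 1, a2 = 4, and the angle is the ellipse parameter t = atan2(√a2 x2, √a1 x1), which need not be the angle used in the paper's histograms. Substituting x = r(cos t/√a1, sin t/√a2) gives F = r² − 1 and dx = r/√(a1 a2) dr dt, so under π_ε the angle t is uniform, whereas under π^Ψ_ε its density is proportional to √(a1 cos²t + a2 sin²t), i.e. to the arc-length element of the ellipse. The code checks both statements by quadrature on a grid.

```python
import numpy as np


def angular_masses(eps, a1=1.0, a2=4.0, n=800, half_width=1.3, bins=8):
    """Bin masses of the angle t under pi_eps and pi_eps^Psi, computed on a 2D grid."""
    g = np.linspace(-half_width, half_width, n)   # even n: no grid point lies on an axis
    x1, x2 = np.meshgrid(g, g, indexing="ij")
    f = a1 * x1**2 + a2 * x2**2 - 1.0
    jac = 2.0 * np.sqrt((a1 * x1) ** 2 + (a2 * x2) ** 2)   # J_F = |grad F| since F is scalar
    w = np.exp(-f**2 / eps)                                  # pi_eps (assumed form)
    w_psi = jac * w                                          # pi_eps^Psi
    t = np.arctan2(np.sqrt(a2) * x2, np.sqrt(a1) * x1) % (2.0 * np.pi)
    edges = np.linspace(0.0, 2.0 * np.pi, bins + 1)
    m = np.histogram(t, bins=edges, weights=w)[0]
    m_psi = np.histogram(t, bins=edges, weights=w_psi)[0]
    return m / m.sum(), m_psi / m_psi.sum()


def arc_length_masses(a1=1.0, a2=4.0, bins=8, n=200_000):
    """Bin masses of t under the uniform (arc-length) distribution on {F = 0}."""
    t = (np.arange(n) + 0.5) * (2.0 * np.pi / n)                # midpoint rule
    speed = np.sqrt(a1 * np.cos(t) ** 2 + a2 * np.sin(t) ** 2)  # proportional to ds/dt
    edges = np.linspace(0.0, 2.0 * np.pi, bins + 1)
    m = np.histogram(t, bins=edges, weights=speed)[0]
    return m / m.sum()


m, m_psi = angular_masses(1e-2)
print(np.round(m, 3))                    # expected near 1/8 in every bin
print(np.round(m_psi, 3))
print(np.round(arc_length_masses(), 3))  # expected to match the previous line
```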

**Figure 3.** Left: histogram of angles of the samples

**Figure 4.** Left: the function $x \mapsto u(x, 0)$ with $u$ given in (13). Right: the function $x \mapsto u(x, 0.5)$. The global minimizers are given by the dotted red lines.

3.2.3 The importance of the thermodynamic barrier

To conclude this section, we investigate the role of the thermodynamic barrier in order to establish quantitative parametric Laplace-type results. This quantity should not be confused with the concept of kinetic …

**Figure 5.** Difference between the thermodynamic barrier (blue) and the kinetic barrier (red).

4 Proofs

In this section, we gather the proofs of the previous sections. In Section 4.1 we prove Theorem 3. Then, in Section 4.2 we provide the proofs of the results of Section 3.1. Finally, the proofs of the results of Section 3.2 are given in Section 4.3.

4.1 Proof of Theorem 3

In this section, we prove Theorem 3. We recal…

Original abstract

Laplace-type results characterize the limit of a sequence of measures $(\pi_\varepsilon)_{\varepsilon >0}$ with density w.r.t. the Lebesgue measure $(\mathrm{d} \pi_\varepsilon / \mathrm{d} \mathrm{Leb})(x) \propto \exp[-U(x)/\varepsilon]$ when the temperature $\varepsilon>0$ converges to $0$. If a limiting distribution $\pi_0$ exists, it concentrates on the minimizers of the potential $U$. Classical results require the invertibility of the Hessian of $U$ in order to establish such asymptotics. In this work, we study the particular case of norm-like potentials $U$ and establish quantitative bounds between $\pi_\varepsilon$ and $\pi_0$ w.r.t. the Wasserstein distance of order $1$ under an invertibility condition of a generalized Jacobian. One key element of our proof is the use of geometric measure theory tools such as the coarea formula. We apply our results to the study of maximum entropy models (microcanonical/macrocanonical distributions) and to the convergence of the iterates of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm at low temperatures for non-convex minimization.
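To make the role of the Hessian concrete, here is an elementary one-dimensional computation of our own (not taken from the paper). For the toy potential $U(x) = |x|^k$ with $k > 0$, the only minimizer is $0$, so $\pi_0 = \delta_0$ and the Wasserstein distance of order $1$ is the first absolute moment. The substitution $x = \varepsilon^{1/k} y$ and the identity $\int_0^\infty y^a e^{-y^k}\,\mathrm{d}y = \Gamma((a+1)/k)/k$ give:

```latex
W_1(\pi_\varepsilon, \delta_0)
  = \frac{\int_{\mathbb{R}} |x|\, e^{-|x|^k/\varepsilon}\, \mathrm{d}x}{\int_{\mathbb{R}} e^{-|x|^k/\varepsilon}\, \mathrm{d}x}
  = \varepsilon^{1/k}\, \frac{\int_0^\infty y\, e^{-y^k}\, \mathrm{d}y}{\int_0^\infty e^{-y^k}\, \mathrm{d}y}
  = \varepsilon^{1/k}\, \frac{\Gamma(2/k)}{\Gamma(1/k)} .
```

Only for $k = 2$ is the Hessian at the minimizer defined and invertible, which gives the familiar rate $\sqrt{\varepsilon/\pi}$; for $k > 2$ the Hessian vanishes at $0$ and the rate degrades to $\varepsilon^{1/k}$.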

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Paper gives explicit W1 rates for Laplace asymptotics on norm-like potentials via coarea formula, with checks for max-entropy and low-temp SGLD.


This paper derives quantitative W1 bounds between the exponential measures π_ε and their Laplace limit π_0 when the potential is norm-like and a generalized Jacobian is invertible at the right points. The argument uses the coarea formula from geometric measure theory to get the rates explicitly rather than just qualitative concentration. That is the main new piece relative to classical Laplace results, which need the Hessian to be invertible at the minimizers. The applications section verifies the conditions for microcanonical versus macrocanonical maximum-entropy distributions and for the low-temperature iterates of SGLD on non-convex problems, which makes the abstract claims more usable. The derivation looks standard once the geometric tools are granted, with no obvious circularity or hidden fitting. The soft spots are the narrow scope: norm-like potentials plus the Jacobian condition rule out many common cases, and W1 is a relatively weak distance even if it is the natural one here. The rates themselves are not claimed to be optimal. This is for readers already working on precise asymptotics for Gibbs measures or on the theory of Langevin-type algorithms. It is not a broad reorganization of the field, but the quantitative statements are concrete and the applications are checked. I would send it to referees; the central claim is modest, grounded in external tools, and worth a careful look from someone in the area.

Referee Report

0 major / 3 minor

Summary. The paper establishes quantitative bounds in the 1-Wasserstein distance between the family of measures π_ε with Lebesgue density proportional to exp(−U(x)/ε) and the limiting measure π_0 supported on the minimizers of U, for the special case of norm-like potentials U. The main result requires an invertibility condition on a generalized Jacobian and is proved using the coarea formula from geometric measure theory. Two applications are developed: one to the equivalence of microcanonical and macrocanonical maximum-entropy distributions, and one to the low-temperature convergence of the iterates of stochastic gradient Langevin dynamics for non-convex objectives.

Significance. If the stated W1 bounds hold under the given hypotheses, the work supplies explicit quantitative rates for Laplace-type concentration in a non-smooth regime where the classical Hessian-invertibility assumption fails. The reliance on the coarea formula to handle the geometry of the level sets is a technically sound extension of existing GMT-based arguments. The two applications illustrate that the abstract condition can be verified in concrete statistical and algorithmic settings, which increases the result’s utility for both theoretical probability and optimization practice.

minor comments (3)

The precise definition of the generalized Jacobian and the points at which its invertibility is required should be stated in a single, self-contained paragraph early in the main-result section so that the hypotheses can be checked directly in the applications.
In the SGLD application, the passage from the continuous-time Langevin diffusion to the discrete iterates at low temperature would benefit from an explicit reference to the step-size and discretization-error controls that are used to transfer the W1 bound.
A short remark comparing the obtained W1 rate with the classical smooth-case rate (when the Hessian is invertible) would help readers gauge the price paid for allowing non-smooth norm-like potentials.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript, the assessment of its significance, and the recommendation for minor revision. We appreciate the recognition of the technical approach using the coarea formula and the utility of the applications.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives quantitative W1 bounds for Laplace-type measures with norm-like potentials U under an explicit invertibility assumption on a generalized Jacobian. The proof invokes the coarea formula and other tools from geometric measure theory as independent external results. Applications proceed by direct verification of the stated hypotheses in each case. No equations reduce by construction to fitted parameters, self-definitions, or unverified self-citations; the central claim retains independent mathematical content grounded in standard analysis without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, invented entities, or ad-hoc axioms are described beyond standard assumptions of geometric measure theory and probability.

axioms (2)

standard math: Coarea formula applies to the level sets of the potential U.
Invoked as a key element of the proof for quantitative bounds.
domain assumption: Generalized Jacobian invertibility condition holds at minimizers.
Stated as the condition under which the main result holds.

pith-pipeline@v0.9.0 · 5754 in / 1231 out tokens · 22587 ms · 2026-05-24T13:03:33.918273+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "under an invertibility condition of a generalized Jacobian... use of geometric measure theory tools such as the coarea formula"

IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "quantitative bounds between π_ε and π_0 w.r.t. the Wasserstein distance of order 1"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 2 internal anchors

[1]

Functions of bounded variation and free discontinuity problems

Luigi Ambrosio, Nicola Fusco, and Diego Pallara. Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford Uni- versity Press, New York, 2000

work page 2000
[2]

Convergence of simulated annealing using foster-lyapunov criteria.Journal of Applied Probability, 38(4):975–994, 2001

Christophe Andrieu, Laird A Breyer, and Arnaud Doucet. Convergence of simulated annealing using foster-lyapunov criteria.Journal of Applied Probability, 38(4):975–994, 2001

work page 2001
[3]

Singularities of Diﬀerentiable Maps: Volume II Monodromy and Asymptotic Integrals, volume 83

Vladimir Igorevich Arnold, Aleksandr Nikolaevich Varchenko, and Sabir Medzhidovich Gusein- Zade. Singularities of Diﬀerentiable Maps: Volume II Monodromy and Asymptotic Integrals, volume 83. Springer Science & Business Media, 2012

work page 2012
[4]

Approximation of integrals over asymptotic sets with applications to probability and statistics

Philippe Barbe. Approximation of integrals over asymptotic sets with applications to proba- bility and statistics.arXiv preprint math/0312132, 2003

work page internal anchor Pith review Pith/arXiv arXiv 2003
[5]

Advanced mathematical methods for scientists and engineers I: Asymptotic methods and perturbation theory

Carl M Bender and Steven A Orszag. Advanced mathematical methods for scientists and engineers I: Asymptotic methods and perturbation theory. Springer Science & Business Media, 2013. 33

work page 2013
[6]

Asymptotic expansions of integrals

Norman Bleistein and Richard A Handelsman. Asymptotic expansions of integrals. Ardent Media, 1975

work page 1975
[7]

Nonnegative functions as squares or sums of squares.Journal of Functional Analysis, 232(1):137–147, 2006

Jean-Michel Bony, Fabrizio Broglia, Ferruccio Colombini, and Ludovico Pernazza. Nonnegative functions as squares or sums of squares.Journal of Functional Analysis, 232(1):137–147, 2006

work page 2006
[8]

Stability and generalization

Olivier Bousquet and André Elisseeﬀ. Stability and generalization. J. Mach. Learn. Res., 2(3):499–526, 2002

work page 2002
[9]

Les algorithmes stochastiques contournent-ils les pièges? Ann

Odile Brandière and Marie Duﬂo. Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. H. Poincaré Probab. Statist., 32(3):395–427, 1996

work page 1996
[10]

Convergence of Langevin-simulated annealing algorithms with multiplicative noise.arXiv preprint arXiv:2109.11669, 2021

Pierre Bras and Gilles Pagès. Convergence of Langevin-simulated annealing algorithms with multiplicative noise.arXiv preprint arXiv:2109.11669, 2021

work page arXiv 2021
[11]

Springer, 2006

Karl W Breitung.Asymptotic approximations for probability integrals. Springer, 2006

work page 2006
[12]

Multiscale sparse microcanonical models

Joan Bruna and Stéphane Mallat. Multiscale sparse microcanonical models. Mathematical Statistics and Learning, 1, 01 2018

work page 2018
[13]

The energy transformation method for the Metropolis algorithm compared with simulated annealing.Probability theory and related ﬁelds, 110(1):69–89, 1998

Olivier Catoni. The energy transformation method for the Metropolis algorithm compared with simulated annealing.Probability theory and related ﬁelds, 110(1):69–89, 1998

work page 1998
[14]

Stochastic Gradient Hamiltonian Monte Carlo for Non- Convex Learning.arXiv preprint arXiv:1903.10328, 2019

Huy N Chau and Miklos Rasonyi. Stochastic Gradient Hamiltonian Monte Carlo for Non- Convex Learning.arXiv preprint arXiv:1903.10328, 2019

work page arXiv 1903
[15]

Diﬀusion for global optimization in Rn

Tzuu-Shuh Chiang, Chii-Ruey Hwang, and Shuenn Jyi Sheu. Diﬀusion for global optimization in Rn. SIAM Journal on Control and Optimization, 25(3):737–753, 1987

work page 1987
[16]

An improved variant of simulated annealing that converges under fast cooling

Michael CH Choi. An improved variant of simulated annealing that converges under fast cooling. arXiv preprint arXiv:1901.10269, 2019

work page arXiv 1901
[17]

On the convergence of an improved discrete simulated annealing via land- scape modiﬁcation

Michael CH Choi. On the convergence of an improved discrete simulated annealing via land- scape modiﬁcation. arXiv preprint arXiv:2011.09680, 2020

work page arXiv 2011
[18]

Springer, 2006

Edmond Combet.Intégrales exponentielles: développements asymptotiques, propriétés lagrang- iennes, volume 937. Springer, 2006

work page 2006
[19]

I-divergence geometry of probability distributions and minimization problems

Imre Csiszár. I-divergence geometry of probability distributions and minimization problems. The annals of probability, pages 146–158, 1975

work page 1975
[20]

Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log- concave densities.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3):651–676, 2017

work page 2017
[21]

Maximum entropy methods for texture synthesis: theory and practice

Valentin De Bortoli, Agnès Desolneux, Alain Durmus, Bruno Galerne, and Arthur Leclaire. Maximum entropy methods for texture synthesis: theory and practice. SIAM Journal on Mathematics of Data Science, 3(1):52–82, 2021

work page 2021
[22]

Courier Corporation, 1981

Nicolaas Govert De Bruijn.Asymptotic methods in analysis, volume 4. Courier Corporation, 1981. 34

work page 1981
[23]

Stochastic image reconstruction from local histograms of gradient orientation

Agnès Desolneux and Arthur Leclaire. Stochastic image reconstruction from local histograms of gradient orientation. In Franccois Lauze, Yiqiu Dong, and Anders Bjorholm Dahl, edi- tors, Scale Space and Variational Methods in Computer Vision - 6th International Conference, SSVM 2017, Kolding, Denmark, June 4-8, 2017, Proceedings, volume 10302 ofLecture Note...

work page 2017
[24]

Nonasymptotic convergence analysis for the unadjusted Langevin algorithm

Alain Durmus and Eric Moulines. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. The Annals of Applied Probability, 27(3):1551–1587, 2017

work page 2017
[25]

Ellis and Jay S

Richard S. Ellis and Jay S. Rosen. Laplace’s Method for Gaussian Integrals with an Application to Statistical Mechanics.The Annals of Probability, 10(1):47 – 66, 1982

work page 1982
[26]

Number 3

Arthur Erdélyi.Asymptotic expansions. Number 3. Courier Corporation, 1956

work page 1956
[27]

Global non-convex optimization with discretized diﬀusions

Murat A Erdogdu, Lester Mackey, and Ohad Shamir. Global non-convex optimization with discretized diﬀusions. arXiv preprint arXiv:1810.12361, 2018

work page arXiv 2018
[28]

Courier Dover Pub- lications, 2020

Marat Andreevich Evgrafov.Asymptotic estimates and entire functions. Courier Dover Pub- lications, 2020

work page 2020
[29]

Geometric measure theory

Herbert Federer. Geometric measure theory. Die Grundlehren der mathematischen Wis- senschaften, Band 153. Springer-Verlag New York Inc., New York, 1969

work page 1969
[30]

Asymptotic methods in analysis

MV Fedoryuk. Asymptotic methods in analysis. InAnalysis I, pages 83–191. Springer, 1989

work page 1989
[31]

On positivity of pseudo-diﬀerential operators

C Feﬀerman and Duong Hong Phong. On positivity of pseudo-diﬀerential operators. Pro- ceedings of the National Academy of Sciences of the United States of America, 75(10):4673, 1978

work page 1978
[32]

Global convergence of stochastic gradient Hamiltonian Monte Carlo for non-convex stochastic optimization: Non-asymptotic performance bounds and momentum-based acceleration

Xuefeng Gao, Mert Gürbüzbalaban, and Lingjiong Zhu. Global convergence of stochastic gradient Hamiltonian Monte Carlo for non-convex stochastic optimization: Non-asymptotic performance bounds and momentum-based acceleration. arXiv preprint arXiv:1809.04618, 2018

work page arXiv 2018
[33]

Recursive stochastic algorithms for global optimization in Rˆd.SIAM Journal on Control and Optimization, 29(5):999–1018, 1991

Saul B Gelfand and Sanjoy K Mitter. Recursive stochastic algorithms for global optimization in Rˆd.SIAM Journal on Control and Optimization, 29(5):999–1018, 1991

work page 1991
[34]

Gelfand and Sanjoy K

Saul B. Gelfand and Sanjoy K. Mitter. Metropolis-type annealing algorithms for global opti- mization in Rd. SIAM J. Control Optim., 31(1):111–131, 1993

work page 1993
[35]

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.IEEE Trans

Stuart Geman and Donald Geman. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721–741, 1984

work page 1984
[36]

Diﬀusions for global optimization.SIAM Journal on Control and Optimization, 24(5):1031–1043, 1986

Stuart Geman and Chii-Ruey Hwang. Diﬀusions for global optimization.SIAM Journal on Control and Optimization, 24(5):1031–1043, 1986

work page 1986
[37]

Nonstationary Markov chains and convergence of the annealing algorithm.J

Basilis Gidas. Nonstationary Markov chains and convergence of the annealing algorithm.J. Statist. Phys., 39(1-2):73–131, 1985

work page 1985
[38]

Information and entropy econometrics: a review and synthesis, volume 3

Amos Golan. Information and entropy econometrics: a review and synthesis, volume 3. now publishers inc, 2008. 35

work page 2008
[39]

A tutorial survey of theory and applications of simulated annealing

Bruce Hajek. A tutorial survey of theory and applications of simulated annealing. In1985 24th IEEE Conference on Decision and Control, pages 755–760. IEEE, 1985

work page 1985
[40]

Cooling schedules for optimal annealing

Bruce Hajek. Cooling schedules for optimal annealing. Mathematics of operations research, 13(2):311–329, 1988

work page 1988
[41]

On the choice of a model to ﬁt data from an exponential family

Dominique MA Haughton. On the choice of a model to ﬁt data from an exponential family. The annals of statistics, pages 342–355, 1988

work page 1988
[42]

Resolutionofsingularitiesofanalgebraicvarietyoveraﬁeldofcharacteristic zero: Ii

HeisukeHironaka. Resolutionofsingularitiesofanalgebraicvarietyoveraﬁeldofcharacteristic zero: Ii. Annals of Mathematics, pages 205–326, 1964

work page 1964
[43]

Holley, Shigeo Kusuoka, and Daniel W

Richard A. Holley, Shigeo Kusuoka, and Daniel W. Stroock. Asymptotics of the spectral gap with applications to the theory of simulated annealing.J. Funct. Anal., 83(2):333–347, 1989

work page 1989
[44]

Laplace’s method revisited: Weak convergence of probability measures

Chii-Ruey Hwang. Laplace’s method revisited: Weak convergence of probability measures. Annals of Probability, 8(6):1177–1182, 1980

work page 1980
[45]

Stochastic diﬀerential equations and diﬀusion processes

NobuyukiIkedaandShinzoWatanabe. Stochastic diﬀerential equations and diﬀusion processes. Elsevier, 2014

work page 2014
[46]

E. T. Jaynes. Information theory and statistical mechanics.Phys. Rev., 1957

work page 1957
[47]

Optimization by simulated annealing: quantitative studies.J

Scott Kirkpatrick. Optimization by simulated annealing: quantitative studies.J. Statist. Phys., 34(5-6):975–986, 1984

work page 1984
[48]

On the asymptotic Laplacemethodanditsapplicationtorandomchaos

Dmitry Alekseevich Korshunov, Vladimir Il’ich Piterbarg, and E Hashorva. On the asymptotic Laplacemethodanditsapplicationtorandomchaos. Mathematical Notes, 97(5):878–891, 2015

work page 2015
[49]

Universitext

Serge Lang.Introduction to diﬀerentiable manifolds. Universitext. Springer-Verlag, New York, second edition, 2002

work page 2002
[50]

Notes on rectiﬁability.https://people

Urs Lang. Notes on rectiﬁability.https://people. math. ethz. ch/˜ lang/rect_notes. pdf, 2007

work page 2007
[51]

A numerical approach to some basic theorems in singularity theory

Ta Le Loi and Phan Phien. A numerical approach to some basic theorems in singularity theory. Mathematische Nachrichten, 287(7):764–781, 2014

work page 2014
[52]

Learning FRAME models using CNN ﬁlters

Yang Lu, Song-Chun Zhu, and Ying Nian Wu. Learning FRAME models using CNN ﬁlters. In Proceedings of the Thirtieth AAAI Conference on Artiﬁcial Intelligence, pages 1902–1910, 2016

work page 1902
[53]

J. Milnor. Morse theory. Based on lecture notes by M. Spivak and R. Wells. Annals of Mathematics Studies, No. 51. Princeton University Press, Princeton, N.J., 1963

work page 1963
[54]

Elsevier/Academic Press, Amsterdam, ﬁfth edition,

Frank Morgan.Geometric measure theory. Elsevier/Academic Press, Amsterdam, ﬁfth edition,

work page
[55]

A beginner’s guide, Illustrated by James F. Bredt

work page
[56]

Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization

Than Huy Nguyen, Umut Simsekli, and Gaël Richard. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization. InInternational Conference on Machine Learning, pages 4810–4819. PMLR, 2019. 36

work page 2019
[57]

Topics in nonlinear functional analysis, volume 6 ofCourant Lecture Notes in Mathematics

Louis Nirenberg. Topics in nonlinear functional analysis, volume 6 ofCourant Lecture Notes in Mathematics. New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2001. Chapter 6 by E. Zehnder, Notes by R. A. Artino, Revised reprint of the 1974 original

work page 2001
[58]

CRC Press, 1997

Frank Olver.Asymptotics and special functions. CRC Press, 1997

work page 1997
[59]

Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing.Ann

Mariane Pelletier. Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing.Ann. Appl. Probab., 8(1):10–44, 1998

work page 1998
[60]

Simoncelli

Javier Portilla and Eero P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet coeﬃcients.Int. J. Comput. Vis., 40(1):49–70, 2000

work page 2000
[61]

Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis

Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pages 1674–1703. PMLR, 2017

work page 2017
[62]

Exponential convergence of Langevin distributions and their discrete approximations.Bernoulli, pages 341–363, 1996

Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations.Bernoulli, pages 341–363, 1996

work page 1996
[63]

Simulated annealing for constrained global optimiza- tion

H Edwin Romeijn and Robert L Smith. Simulated annealing for constrained global optimiza- tion. Journal of Global Optimization, 5(2):101–126, 1994

work page 1994
[64]

Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations

Håvard Rue, Sara Martino, and Nicolas Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the royal statistical society: Series b (statistical methodology), 71(2):319–392, 2009

work page 2009
[65]

Multidimensional Watson lemma and its applications.Mathematical Notes, 99(3):406–412, 2016

Anastasiia Igorevna Rytova and Elena Borisovna Yarovaya. Multidimensional Watson lemma and its applications.Mathematical Notes, 99(3):406–412, 2016

work page 2016
[66]

Accurate approximations for posterior moments and marginal densities

Luke Tierney and Joseph B Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the american statistical association, 81(393):82–86, 1986

work page 1986
[67]

Optimal transport, volume 338 of Grundlehren der Mathematischen Wis- senschaften [Fundamental Principles of Mathematical Sciences]

Cédric Villani. Optimal transport, volume 338 of Grundlehren der Mathematischen Wis- senschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009. Old and new

work page 2009
[68]

Tuning kinetics and thermodynamics of hydrogen storage in light metal element based systems–a review of recent progress.Journal of Alloys and Compounds, 658:280–300, 2016

H Wang, HJ Lin, WT Cai, LZ Ouyang, and M Zhu. Tuning kinetics and thermodynamics of hydrogen storage in light metal element based systems–a review of recent progress.Journal of Alloys and Compounds, 658:280–300, 2016

work page 2016
[69]

Bayesian learning via stochastic gradient Langevin dynamics

Max Welling and Yee W Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681–688. Citeseer, 2011

work page 2011
[70]

Analytic extensions of diﬀerentiable functions deﬁned in closed sets.Trans- actions of the American Mathematical Society, 36(1):63–89, 1934

Hassler Whitney. Analytic extensions of diﬀerentiable functions deﬁned in closed sets.Trans- actions of the American Mathematical Society, 36(1):63–89, 1934

work page 1934
[71]

R. Wong. Asymptotic approximations of integrals, volume 34 of Classics in Applied Math- ematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Corrected reprint of the 1989 original. 37

work page 2001
[72]

Global convergence of Langevin dy- namics based algorithms for nonconvex optimization.arXiv preprint arXiv:1707.06618, 2017

Pan Xu, Jinghui Chen, Difan Zou, and Quanquan Gu. Global convergence of Langevin dy- namics based algorithms for nonconvex optimization.arXiv preprint arXiv:1707.06618, 2017

work page arXiv 2017
[73]

R. L. Yang. Convergence of the simulated annealing algorithm for continuous global optimiza- tion. J. Optim. Theory Appl., 104(3):691–716, 2000

work page 2000
[74]

Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks

Nanyang Ye, Zhanxing Zhu, and Rafal K Mantiuk. Langevin dynamics with continuous tem- pering for training deep neural networks.arXiv preprint arXiv:1703.04379, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[75]

Ying Zhang, Ömer Deniz Akyildiz, Theodoros Damoulas, and Sotirios Sabanis. Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization. arXiv preprint arXiv:1910.02008, 2019

[76]

Yuchen Zhang, Percy Liang, and Moses Charikar. A hitting time analysis of stochastic gradient Langevin dynamics. In Conference on Learning Theory, pages 1980–2022. PMLR, 2017

[77]

Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. Maximum entropy inverse reinforcement learning. In Dieter Fox and Carla P. Gomes, editors, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, AAAI 2008, Chicago, Illinois, USA, July 13-17, 2008, pages 1433–1438. AAAI Press, 2008.

Organization of the appendix

In t...

[78]

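, $A_0^1 = (1/2)\, C_1^{-1} M^{-d} \int_{\bar{B}(0, M\eta/\bar{\varepsilon}^{1/k})} \exp[-\|x\|^k]\, \mathrm{d}x$, $A_0^2 = M^{-1} \mathcal{H}^{d-\hat{d}}(F^{-1}(0))$, which concludes the proof.

Lemma A.3. Assume H1 and H2. Let $\phi : \mathbb{R}^d \to \mathbb{R}$ and $C_\phi \geq 0$ such that for any $x \in \mathbb{R}^d$,

$$|\phi(x)| \leq C_\phi \exp[C_\phi \|x\|^{\alpha k}] . \tag{36}$$

Then, for any $\bar{\varepsilon} \in (0, m_k/(1 + C_{\phi,\Psi}))$ and $V \subset \mathbb{R}^d$ open and bounded such that $F^{-1}(0) \subset V$, there exist $\beta_1 > 0$ and $A_1 \in C(\mathbb{R}_+, \mathbb{R}_+)$ su...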

[79]

and $A_1 = A_1^1 + A_1^2$.

Lemma A.4. Assume H1 and that $d \leq p$. Then there exist $N \in \mathbb{N}$, $\{x_0^k\}_{k=1}^N \in (\mathbb{R}^d)^N$ and $W_k \subset \mathbb{R}^d$ open such that for any $k \in \{1, \dots, N\}$, $x_0^k \in W_k$, $F : \bar{W}_k \to F(\bar{W}_k)$ is a bi-Lipschitz homeomorphism, for any $x \in W_k$, $\mathrm{d}F(x)$ is injective and for any $j \in \{1, \dots, N\}$, $\bar{W}_k \cap \bar{W}_j = \emptyset$. In addition, $F^{-1}(0) = \bigcup_{k=1}^N \{x_0^k\}$. Proof. Since, for a...

[80]

> 0, there exists $m > 0$ such that for any $k \in \{1, \dots, N\}$ and $v \in \mathbb{R}^d$, $v^\top H(x_0^k, x_0^k) v \geq m \|v\|^2$ with $H(x, y) = DF(x)^\top DF(y)$ for any $x, y \in \mathbb{R}^d$. For any $k \in \{1, \dots, N\}$, there exists $W_k \subset V_k$ such that for any $x, y \in W_k$ we have $\|H(x, y) - H(x_0^k, x_0^k)\|_2 \leq m/2$. Therefore we have for any $x, y \in W_k$, $\|F(x) - F(y)\|^2 = \int_0^1 \int_0^1 \langle DF(x + t(y - x))(y - x), DF(x + s(y$ ...
