
arXiv: 2110.12922 · v2 · submitted 2021-10-25 · 🧮 math.PR · cs.LG · stat.ML

On quantitative Laplace-type convergence results for some exponential probability measures, with two applications

Pith reviewed 2026-05-24 13:03 UTC · model grok-4.3

classification 🧮 math.PR · cs.LG · stat.ML
keywords Laplace method · exponential measures · Wasserstein distance · norm-like potentials · generalized Jacobian · coarea formula · maximum entropy models · stochastic gradient Langevin dynamics

The pith

For norm-like potentials, exponential measures converge quantitatively to their zero-temperature limits in Wasserstein-1 distance when a generalized Jacobian is invertible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes quantitative bounds on how fast measures with densities proportional to exp(-U(x)/ε) approach their limiting distributions as ε tends to zero. These bounds hold in the 1-Wasserstein metric for potentials U that are norm-like, provided a generalized Jacobian remains invertible at the relevant points. The argument replaces the classical twice-differentiability requirement with geometric measure theory, specifically the coarea formula. The results are applied to maximum-entropy models and to the low-temperature behavior of stochastic gradient Langevin dynamics on non-convex landscapes.
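As a minimal numerical sketch of this convergence (not taken from the paper), one can take the hypothetical one-dimensional potential U(x) = |x² − 1|, which is not differentiable at its minimizers ±1, and measure the 1-Wasserstein distance between π_ε and the symmetric limit ½δ₋₁ + ½δ₊₁ on a grid. The choice of U, the grid, and the function name `w1_to_limit` are assumptions made for illustration, as is the reading of "norm-like" as a potential of the form ‖F(x)‖^k. The sketch uses the fact that in one dimension W1 equals the L1 distance between cumulative distribution functions.

```python
import numpy as np

def w1_to_limit(eps, half_width=4.0, n=400_001):
    """W1 distance between pi_eps ∝ exp(-|x^2 - 1| / eps) and 0.5*delta_{-1} + 0.5*delta_{+1}."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    U = np.abs(x**2 - 1.0)              # hypothetical potential, minimizers at -1 and +1
    w = np.exp(-U / eps)                # unnormalized density of pi_eps
    cdf_eps = np.cumsum(w) / np.sum(w)  # grid approximation of the CDF of pi_eps
    # CDF of the limit: jumps of size 1/2 at x = -1 and x = +1 (by symmetry of U).
    cdf_0 = 0.5 * (x >= -1.0) + 0.5 * (x >= 1.0)
    # In one dimension, W1 is the L1 distance between the two CDFs.
    return float(np.sum(np.abs(cdf_eps - cdf_0)) * dx)

for eps in (1.0, 0.1, 0.01):
    print(f"eps = {eps:5.2f}   W1 ≈ {w1_to_limit(eps):.4f}")
```

On this toy example the printed distances shrink as ε decreases, which is the qualitative behavior the summary describes. The example says nothing about the paper's constants or rates.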

Core claim

For norm-like potentials U, if a generalized Jacobian is invertible at the points of interest, then the measures π_ε converge to the limiting measure π_0 supported on the minimizers of U, and the convergence is quantitative in the Wasserstein distance of order 1.

What carries the argument

Invertibility condition on the generalized Jacobian of U, which permits application of the coarea formula to obtain explicit Wasserstein-1 bounds without classical Hessian assumptions.
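For orientation, the coarea formula in its standard textbook form can be written as follows. This is a sketch under stated assumptions, not the paper's exact statement: F is taken Lipschitz from R^d to R^p with d ≥ p, g and φ are nonnegative and measurable, JF denotes the Jacobian factor sqrt(det(DF DFᵀ)), and the potential is assumed to have the form U = ‖F‖^k. The second identity shows why weighting by JF turns an integral against exp[−‖F(x)‖^k/ε] into an integral over level sets of F.

```latex
% Coarea formula (standard form), with H^{d-p} the (d-p)-dimensional Hausdorff measure.
\int_{\mathbb{R}^d} g(x)\, JF(x)\, \mathrm{d}x
  = \int_{\mathbb{R}^p} \left( \int_{F^{-1}(y)} g(x)\, \mathrm{d}\mathcal{H}^{d-p}(x) \right) \mathrm{d}y .

% Applied to g(x) = \varphi(x) \exp[-\|F(x)\|^k/\varepsilon], using \|F(x)\| = \|y\| on F^{-1}(y):
\int_{\mathbb{R}^d} \varphi(x)\, JF(x)\, \exp[-\|F(x)\|^k/\varepsilon]\, \mathrm{d}x
  = \int_{\mathbb{R}^p} \exp[-\|y\|^k/\varepsilon]
    \left( \int_{F^{-1}(y)} \varphi(x)\, \mathrm{d}\mathcal{H}^{d-p}(x) \right) \mathrm{d}y .
```

Heuristically, as ε tends to 0 the outer weight exp[−‖y‖^k/ε] concentrates at y = 0, so only the level set F⁻¹(0) survives in the limit, and no second derivative of U enters the computation.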

If this is right

  • The same quantitative bounds relate microcanonical and macrocanonical distributions in maximum-entropy models.
  • The iterates of stochastic gradient Langevin dynamics converge in Wasserstein-1 to the set of global minimizers at low temperature for non-convex objectives.
  • The convergence rates hold without requiring the potential to be twice differentiable everywhere.
  • The coarea-formula approach yields explicit constants that depend on the geometry of the level sets of U.
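The SGLD consequence listed above can be illustrated with a toy simulation. The sketch below is an assumption-laden example rather than the paper's setting: the objective U(x) = (x² − 1)², the step size, the temperature, the Gaussian gradient noise, and the function name `sgld` are all hypothetical choices. It runs independent chains of the usual SGLD recursion x ← x − γ·ĝ(x) + sqrt(2γε)·Z and reports how far the final iterates sit from the global minimizers ±1.

```python
import numpy as np

def sgld(n_chains=1000, n_steps=3000, step=0.01, eps=0.01, grad_noise=1.0, seed=0):
    """Run independent SGLD chains on the toy objective U(x) = (x^2 - 1)^2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n_chains)        # random initialization
    for _ in range(n_steps):
        grad = 4.0 * x * (x**2 - 1.0)                # exact gradient of U
        # Stochastic gradient: exact gradient plus zero-mean Gaussian noise (an assumption).
        grad_hat = grad + grad_noise * rng.standard_normal(n_chains)
        # Langevin step at temperature eps: drift plus Gaussian noise of variance 2 * step * eps.
        x = x - step * grad_hat + np.sqrt(2.0 * step * eps) * rng.standard_normal(n_chains)
    return x

x_final = sgld()
dist_to_minimizers = np.minimum(np.abs(x_final - 1.0), np.abs(x_final + 1.0))
print("mean distance to {-1, +1}:", dist_to_minimizers.mean())
```

At this low temperature the chains settle close to the two minimizers. The gradient noise and the discretization both add spread of their own, which is why a quantitative statement needs explicit control of the step size.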

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same technique might be tested on other sampling schemes whose stationary measures are of Laplace type.
  • It remains open whether analogous bounds can be obtained in Wasserstein distance of order p greater than 1 under the same Jacobian condition.
  • The results suggest that invertibility of a first-order object can substitute for second-order non-degeneracy in several concentration problems.

Load-bearing premise

The potential U must be norm-like and its generalized Jacobian must be invertible at the relevant points.

What would settle it

A concrete norm-like potential U for which the generalized Jacobian is invertible at all required points yet the Wasserstein-1 distance between π_ε and π_0 fails to approach zero at the rate claimed as ε tends to zero.

Figures

Figures reproduced from arXiv: 2110.12922 by Agnès Desolneux, Valentin De Bortoli.

Figure 1. Left: graph of the polynomial x ↦ P(x), which has 4 zeros. Right: verifying the scaling relation of Proposition 6 by plotting ε ↦ 2π_ε[P²], which is equivalent to ε as ε goes to 0.
Figure 2. Distributions π_ε and π^Ψ_ε (first line), and histograms of samples from them (second line). This experiment shows that the limit distribution of π^Ψ_ε as ε goes to 0 is the uniform microcanonical model given by the uniform distribution on the zeros of P. The surrounding text defines π^Ψ_ε on R² through the density (dπ^Ψ_ε/dλ)(x) = JF(x) exp[−‖F(x)‖²/ε] / ∫_{R²} JF(x̃) exp[−‖F(x̃)‖²/ε] dx̃.
Figure 3. Left: histogram of angles of the samples
Figure 4. Left: the function x ↦ u(x, 0) with u given in (13). Right: the function x ↦ u(x, 0.5). The global minimizers are given by the dotted red lines.
Figure 5. Difference between the thermodynamic barrier (blue) and the kinetic barrier (red).
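The claims in the captions of Figures 1 and 2 can be checked numerically on a grid. The sketch below is illustrative only: the paper's polynomial is not given on this page, so P(x) = (x² − 1)(x² − 4) is a hypothetical stand-in with four zeros. It further assumes that π_ε has density proportional to exp[−P(x)²/ε], and that in one dimension the Jacobian factor JF in the density of π^Ψ_ε quoted with Figure 2 reduces to |P′(x)|.

```python
import numpy as np

eps = 1e-3
x = np.linspace(-3.0, 3.0, 600_001)

P = (x**2 - 1.0) * (x**2 - 4.0)        # hypothetical quartic with zeros at -2, -1, 1, 2
dP = 4.0 * x**3 - 10.0 * x             # P'(x); |P'| plays the role of JF in one dimension

w = np.exp(-P**2 / eps)                # unnormalized density of pi_eps (assumed form)
w_psi = np.abs(dP) * w                 # unnormalized density of pi^Psi_eps (Jacobian-weighted)

# Scaling relation from the Figure 1 caption: 2 * pi_eps[P^2] should behave like eps.
ratio = 2.0 * np.sum(P**2 * w) / np.sum(w) / eps
print("2 * pi_eps[P^2] / eps ≈", ratio)

def mass_near_zeros(weights, zeros=(-2.0, -1.0, 1.0, 2.0), radius=0.4):
    """Probability assigned to a small window around each zero of P."""
    total = np.sum(weights)
    return np.array([np.sum(weights[np.abs(x - z) < radius]) / total for z in zeros])

mass_plain = mass_near_zeros(w)        # expected to favor the zeros where |P'| is smaller
mass_psi = mass_near_zeros(w_psi)      # expected to be close to uniform over the four zeros
print("pi_eps     :", mass_plain)
print("pi^Psi_eps :", mass_psi)
```

For this stand-in polynomial, |P′| equals 6 at ±1 and 12 at ±2, so a classical Laplace computation suggests that the unweighted measure splits its mass roughly as 1/6, 1/3, 1/3, 1/6 across the zeros −2, −1, 1, 2, whereas the Jacobian-weighted measure spreads it evenly. This matches the uniform microcanonical limit described in the Figure 2 caption.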

Laplace-type results characterize the limit of a sequence of measures $(\pi_\varepsilon)_{\varepsilon >0}$ with density w.r.t. the Lebesgue measure $(\mathrm{d} \pi_\varepsilon / \mathrm{d} \mathrm{Leb})(x) \propto \exp[-U(x)/\varepsilon]$ when the temperature $\varepsilon>0$ converges to $0$. If a limiting distribution $\pi_0$ exists, it concentrates on the minimizers of the potential $U$. Classical results require the invertibility of the Hessian of $U$ in order to establish such asymptotics. In this work, we study the particular case of norm-like potentials $U$ and establish quantitative bounds between $\pi_\varepsilon$ and $\pi_0$ w.r.t. the Wasserstein distance of order $1$ under an invertibility condition of a generalized Jacobian. One key element of our proof is the use of geometric measure theory tools such as the coarea formula. We apply our results to the study of maximum entropy models (microcanonical/macrocanonical distributions) and to the convergence of the iterates of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm at low temperatures for non-convex minimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper establishes quantitative bounds in the 1-Wasserstein distance between the family of measures π_ε with Lebesgue density proportional to exp(−U(x)/ε) and the limiting measure π_0 supported on the minimizers of U, for the special case of norm-like potentials U. The main result requires an invertibility condition on a generalized Jacobian and is proved using the coarea formula from geometric measure theory. Two applications are developed: one to the equivalence of microcanonical and macrocanonical maximum-entropy distributions, and one to the low-temperature convergence of the iterates of stochastic gradient Langevin dynamics for non-convex objectives.

Significance. If the stated W1 bounds hold under the given hypotheses, the work supplies explicit quantitative rates for Laplace-type concentration in a non-smooth regime where the classical Hessian-invertibility assumption fails. The reliance on the coarea formula to handle the geometry of the level sets is a technically sound extension of existing GMT-based arguments. The two applications illustrate that the abstract condition can be verified in concrete statistical and algorithmic settings, which increases the result’s utility for both theoretical probability and optimization practice.

minor comments (3)
  1. The precise definition of the generalized Jacobian and the points at which its invertibility is required should be stated in a single, self-contained paragraph early in the main-result section so that the hypotheses can be checked directly in the applications.
  2. In the SGLD application, the passage from the continuous-time Langevin diffusion to the discrete iterates at low temperature would benefit from an explicit reference to the step-size and discretization-error controls that are used to transfer the W1 bound.
  3. A short remark comparing the obtained W1 rate with the classical smooth-case rate (when the Hessian is invertible) would help readers gauge the price paid for allowing non-smooth norm-like potentials.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript, the assessment of its significance, and the recommendation for minor revision. We appreciate the recognition of the technical approach using the coarea formula and the utility of the applications.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper derives quantitative W1 bounds for Laplace-type measures with norm-like potentials U under an explicit invertibility assumption on a generalized Jacobian. The proof invokes the coarea formula and other tools from geometric measure theory as independent external results. Applications proceed by direct verification of the stated hypotheses in each case. No equations reduce by construction to fitted parameters, self-definitions, or unverified self-citations; the central claim retains independent mathematical content grounded in standard analysis without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, invented entities, or ad-hoc axioms are described beyond standard assumptions of geometric measure theory and probability.

axioms (2)
  • standard math: the coarea formula applies to the level sets of the potential U
    Invoked as the key element of the proof of the quantitative bounds.
  • domain assumption: the generalized Jacobian invertibility condition holds at the minimizers
    Stated as the condition under which the main result holds.

pith-pipeline@v0.9.0 · 5754 in / 1231 out tokens · 22587 ms · 2026-05-24T13:03:33.918273+00:00 · methodology


