Improved Guarantees for Langevin Monte Carlo with Average Smoothness

Arnak S. Dalalyan; Avetik Karagulyan

arxiv: 2605.31413 · v1 · pith:IMKSZBQ6new · submitted 2026-05-29 · 🧮 math.ST · cs.LG · stat.TH

Pith reviewed 2026-06-28 20:02 UTC · model grok-4.3

keywords: Langevin Monte Carlo · Wasserstein distance · average smoothness · strongly log-concave · discretization error · synchronous coupling · stochastic gradient Langevin

The pith

Langevin Monte Carlo discretization error is governed by average coordinate-wise smoothness rather than the global constant.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves tighter nonasymptotic bounds on the Wasserstein error of Langevin Monte Carlo when the target is strongly log-concave. The central improvement replaces the usual global smoothness constant with an average taken over coordinate-wise smoothness constants. The argument is probabilistic and rests on a refined synchronous coupling between the continuous diffusion and its Euler discretization. The same technique yields sharper results for variable step sizes, for potentials whose Laplacian is Lipschitz, and for stochastic-gradient Langevin dynamics on finite-sum objectives.
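For readers who want the algorithm in front of them, here is a minimal sketch of Langevin Monte Carlo as the Euler discretization of the Langevin diffusion, x_{k+1} = x_k - h ∇f(x_k) + sqrt(2h) ξ_k with standard Gaussian ξ_k. The function name `lmc_chains`, the standard Gaussian target, and the step size and chain counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lmc_chains(grad_f, x0, step, n_steps, rng):
    """Run Langevin Monte Carlo (Euler discretization of the Langevin diffusion).

    x0 has shape (n_chains, d); every row is an independent chain.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step on the potential plus injected Gaussian noise.
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
    return x

# Illustrative target (assumption): standard Gaussian, f(x) = |x|^2 / 2, grad f(x) = x.
rng = np.random.default_rng(0)
samples = lmc_chains(lambda x: x, np.zeros((5000, 2)), step=0.1, n_steps=200, rng=rng)
sample_var = samples.var(axis=0)
```

With a fixed step the chain does not target the exact distribution: for this Gaussian example the stationary variance is 1 / (1 - h/2) rather than 1. That gap is the discretization error the bounds are about.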

Core claim

In the strongly log-concave setting, the discretization error of Langevin Monte Carlo, measured in Wasserstein distance, is controlled by an average coordinate-wise smoothness constant rather than the conventional global smoothness constant. The proof uses synchronous coupling. The same ideas produce improved guarantees for variable step sizes, replace the usual Hessian-Lipschitz term by a weaker trace-type third-order quantity when the Laplacian is Lipschitz, and improve the dependence on root-mean-square smoothness for stochastic-gradient Langevin dynamics with control variates. Applications to generalized linear models with Gaussian design show dimension-dependent gains, particularly when covariates are correlated.

What carries the argument

Refined synchronous coupling that propagates the average of the coordinate-wise smoothness constants into the evolution of the Wasserstein distance between the continuous and discrete processes.

If this is right

Variable-step-size schemes inherit the same average-smoothness improvement.
When the Laplacian is Lipschitz the third-order contribution becomes a trace-type quantity instead of the usual Hessian-Lipschitz term.
Stochastic-gradient Langevin dynamics on finite sums improves its dependence on the root-mean-square smoothness of the component functions.
Generalized linear models with Gaussian design obtain dimension-dependent gains over prior bounds when covariates are correlated.
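The stochastic-gradient item above refers to SGLD with fixed-point control variates. The sketch below shows the usual control-variate gradient estimator from the stochastic-gradient MCMC literature: the full gradient at a fixed reference point plus a rescaled minibatch correction. Whether the paper uses exactly this form is an assumption, and the helper `cv_gradient` and the quadratic components are hypothetical.

```python
import numpy as np

def cv_gradient(grad_fi, x, x_ref, full_grad_ref, batch, n):
    """Control-variate estimate of the full gradient sum_i grad f_i(x).

    grad_fi(x, i) returns the gradient of the i-th component at x.
    x_ref is the fixed reference point; full_grad_ref = sum_i grad f_i(x_ref).
    """
    correction = sum(grad_fi(x, i) - grad_fi(x_ref, i) for i in batch)
    return full_grad_ref + (n / len(batch)) * correction

# Hypothetical finite sum: f_i(x) = 0.5 * w_i * |x - c_i|^2.
rng = np.random.default_rng(1)
n, d = 8, 3
w = rng.uniform(0.5, 2.0, size=n)
c = rng.standard_normal((n, d))
grad_fi = lambda x, i: w[i] * (x - c[i])
full_grad = lambda x: sum(grad_fi(x, i) for i in range(n))

x_ref = (w[:, None] * c).sum(axis=0) / w.sum()  # minimizer of the sum, used as reference
g_ref = full_grad(x_ref)
x = x_ref + 0.3

# Averaging over all single-index batches recovers the full gradient (unbiasedness).
avg_estimate = np.mean(
    [cv_gradient(grad_fi, x, x_ref, g_ref, [i], n) for i in range(n)], axis=0
)
```

The estimator's noise depends only on the differences ∇f_i(x) - ∇f_i(x_ref), which is why the smoothness of the individual components enters the SGLD bound.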

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

In high-dimensional problems where smoothness varies sharply across coordinates the effective rate may approach the best single-coordinate rate.
Adaptive step-size rules could be designed by estimating the per-coordinate smoothness vector on the fly.
Similar average-smoothness arguments might apply to other first-order discretizations such as underdamped Langevin or splitting methods.

Load-bearing premise

The target potential must be strongly log-concave so that the synchronous coupling contracts at a rate that does not degrade with the discretization step size.
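The contraction this premise relies on can be illustrated with two LMC chains driven by the same Gaussian noise (a synchronous coupling). This is a simplified sketch: it couples two discrete chains on a diagonal quadratic potential, whereas the paper couples the continuous diffusion with its discretization. The curvatures, step size, and function name are assumptions chosen for illustration.

```python
import numpy as np

def coupled_lmc_distance(a, x0, y0, step, n_steps, rng):
    """Distances |x_k - y_k| for two LMC chains that share the same Gaussian noise.

    The potential is f(x) = 0.5 * sum_i a_i x_i^2, so grad f(x) = a * x.
    """
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    dists = [np.linalg.norm(x - y)]
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * step) * rng.standard_normal(x.shape)  # shared by both chains
        x = x - step * a * x + noise
        y = y - step * a * y + noise
        dists.append(np.linalg.norm(x - y))
    return np.array(dists)

a = np.array([1.0, 4.0, 25.0])   # strong convexity m = 1, global smoothness M = 25
m, M = a.min(), a.max()
step = 1.0 / M                   # satisfies step <= 2 / (m + M)
rng = np.random.default_rng(2)
dists = coupled_lmc_distance(a, np.ones(3), -np.ones(3), step, 50, rng)
bound = dists[0] * (1.0 - m * step) ** np.arange(51)
```

Because the noise cancels in the difference, the distance between the chains shrinks by at least the factor 1 - m·h per step, so `dists` stays below `bound` at every iteration.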

What would settle it

Run the algorithm on a quadratic potential whose coordinate-wise smoothness constants differ by a large factor; if the observed Wasserstein error scales with the average rather than the maximum, the claim holds.
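For a diagonal quadratic potential this check can be done in closed form, without simulation. With f(x) = 0.5 Σ a_i x_i², the LMC chain is a diagonal AR(1) process with stationary variance 1 / (a_i (1 - h a_i / 2)) in coordinate i, and the W2 distance between two centered Gaussians with diagonal covariances is the Euclidean distance between their standard-deviation vectors. Expanding in small h gives W2 ≈ (h/4) sqrt(Σ a_i) = (h/4) sqrt(d · mean(a)). The sketch below covers only the stationary bias in this special case and is not the paper's bound. The curvature vector and step size are assumptions.

```python
import numpy as np

def lmc_stationary_w2(a, step):
    """Exact W2 distance between N(0, diag(1/a)) and the LMC stationary law.

    Requires step * max(a) < 2 so that the chain is stable.
    """
    a = np.asarray(a, dtype=float)
    sd_target = 1.0 / np.sqrt(a)
    sd_lmc = 1.0 / np.sqrt(a * (1.0 - 0.5 * step * a))
    return np.sqrt(np.sum((sd_lmc - sd_target) ** 2))

a = np.array([1.0] * 99 + [100.0])   # one stiff coordinate among 100 (assumption)
step = 1e-3
d = a.size
w2_exact = lmc_stationary_w2(a, step)
w2_avg = 0.25 * step * np.sqrt(d * a.mean())   # small-step prediction using the average
w2_max = 0.25 * step * np.sqrt(d * a.max())    # same formula with the maximum instead
```

In this example the exact stationary error tracks the average-based prediction closely and sits well below the value obtained by substituting the largest curvature for every coordinate.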

Original abstract

We establish improved nonasymptotic bounds for Langevin Monte Carlo in the strongly log-concave setting, when the error is measured by the Wasserstein distance. The main result shows that the discretization error is governed by an average coordinate-wise smoothness constant, rather than by the usual global smoothness constant. The proof is short and probabilistic, and relies on a refined use of the synchronous coupling. We further show that the same ideas lead to improved bounds for variable step sizes, for potentials whose Laplacian is Lipschitz-continuous, and for finite-sum problems sampled by stochastic-gradient Langevin dynamics with fixed point control variates. In the Laplacian-smooth case, the usual Hessian-Lipschitz contribution is replaced by a weaker trace-type third-order smoothness quantity. In the finite-sum setting, the resulting SGLD bound improves the dependence on the root mean square smoothness of the component functions. Applications to generalized linear models with Gaussian design show that these refinements can yield substantial, dimension-dependent improvements over previously known bounds, especially for correlated covariates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper refines LMC Wasserstein bounds by replacing global smoothness with an average coordinate-wise quantity via tighter synchronous coupling, with clean extensions to SGLD and Laplacian-smooth cases.

The core contribution is a non-asymptotic Wasserstein bound for Langevin Monte Carlo under strong log-concavity that depends on an average coordinate-wise smoothness constant instead of the usual global one. The argument uses a refined synchronous coupling and stays short and probabilistic. The same approach yields improved bounds for variable steps, for potentials with Lipschitz Laplacian (swapping the usual Hessian-Lipschitz term for a weaker trace-type third-order quantity), and for SGLD with fixed-point control variates (improving the RMS smoothness dependence). The GLM applications with Gaussian design indicate dimension-dependent gains when covariates are correlated.

The structure looks internally consistent. The stress-test found no hidden global-smoothness dependence or circularity, and the strong-convexity assumption is stated up front as the regime where contraction works. No load-bearing fitting or contradictory equations appear.

The main limitation is that the abstract and stress-test give the claim but not the explicit error terms or constant tracking, so one still needs the full proof to confirm the average smoothness controls the discretization error without extra factors. That is a standard check rather than a red flag.

This is for researchers who use or analyze Langevin-type samplers in high dimensions and care about non-asymptotic rates. It is a modest but useful technical sharpening of existing coupling arguments. It deserves a serious referee because the idea is clean, the extensions are natural, and the potential practical impact on correlated-design settings is concrete.

Referee Report

0 major / 3 minor

Summary. The paper establishes improved non-asymptotic Wasserstein bounds for Langevin Monte Carlo (LMC) discretization error in the strongly log-concave setting. The central claim is that the error is governed by an average coordinate-wise smoothness constant rather than the usual global smoothness constant, derived via a refined synchronous coupling argument. Extensions are given for variable step sizes, potentials with Lipschitz-continuous Laplacian (replacing the Hessian-Lipschitz term by a trace-type third-order smoothness quantity), and finite-sum SGLD with fixed-point control variates (improving the RMS smoothness dependence). Applications to generalized linear models with Gaussian design illustrate dimension-dependent gains, especially under correlated covariates.

Significance. If the central bounds hold, the work supplies sharper, structure-exploiting guarantees for LMC and SGLD that can yield substantial improvements over global-smoothness analyses in high-dimensional regimes. The short probabilistic proof style, the weakening to average or trace-type quantities, and the explicit GLM applications are concrete strengths that could influence both theoretical sampling literature and practical algorithm design.

minor comments (3)

The introduction would benefit from a one-paragraph high-level outline of the refined synchronous coupling argument (currently only alluded to in the abstract) to help readers locate the key technical departure from standard analyses.
[Applications] In the GLM application, the claimed dimension-dependent improvement is stated qualitatively; adding a brief explicit comparison (e.g., the ratio of the new bound to the global-smoothness bound under a simple correlation model) would strengthen the illustration.
Notation for the average coordinate-wise smoothness constant should be introduced with a short display equation early in the main result section to avoid any ambiguity when it is later used in the variable-step and SGLD extensions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the significance of the average coordinate-wise smoothness bounds, and recommendation for minor revision. We are pleased that the short probabilistic proof, the extensions to variable steps, Laplacian smoothness, and SGLD with control variates, as well as the GLM applications, are viewed as strengths.

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central claim—an improved Wasserstein bound for LMC discretization error governed by average coordinate-wise smoothness—is derived via a refined synchronous coupling argument applied to the strongly log-concave regime. This is a direct probabilistic derivation from the coupling construction and the assumed strong convexity, with no equations or quantities defined in terms of themselves, no fitted inputs renamed as predictions, and no load-bearing self-citations that reduce the result to prior work by the same authors. The extensions to variable steps, Laplacian-smooth potentials, and SGLD follow analogous coupling weakenings without circular reduction. The derivation is self-contained against external benchmarks such as standard global-smoothness analyses.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard domain assumptions from convex analysis and stochastic processes; no free parameters or new entities are introduced in the abstract.

axioms (2)

domain assumption: The potential function is strongly log-concave.
Stated explicitly as the setting in which the improved bounds hold.
domain assumption: Synchronous coupling can be refined to track coordinate-wise smoothness.
The proof technique invoked in the abstract.

pith-pipeline@v0.9.1-grok · 5710 in / 1413 out tokens · 26833 ms · 2026-06-28T20:02:13.661778+00:00 · methodology


