When Langevin Monte Carlo Meets Randomization: New Sampling Algorithms with Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness

Bin Yang; Xiaojie Wang

arXiv: 2509.25630 · v3 · submitted 2025-09-30 · stat.ML · cs.LG · cs.NA · math.NA



Pith reviewed 2026-05-18 13:00 UTC · model grok-4.3

classification: stat.ML · cs.LG · cs.NA · math.NA

keywords: Langevin Monte Carlo · randomized sampling · log-Sobolev inequality · Wasserstein distance · non-asymptotic bounds · high-dimensional sampling · non-log-concave distributions


The pith

Randomized splitting Langevin Monte Carlo achieves uniform-in-time W2 error bounds of order O(sqrt(d) h) under the log-Sobolev inequality and gradient Lipschitzness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the randomized splitting Langevin Monte Carlo algorithm as a computationally cheaper alternative to randomized Langevin Monte Carlo for sampling from high-dimensional distributions. It proves that both algorithms enjoy a uniform-in-time error bound of order O(sqrt(d) h) in Wasserstein-2 distance when the target satisfies the log-Sobolev inequality and has Lipschitz gradients. This rate matches the best known bounds previously available only under the stronger log-concavity assumption. The work also develops modified versions of the algorithms to handle cases where the gradient grows superlinearly, establishing corresponding non-asymptotic error bounds.
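To make the RLMC half of the claim concrete, here is a minimal NumPy sketch. It assumes that RLMC takes the randomized-midpoint form studied in [42]: draw τ ~ U(0, 1), take an Euler predictor to the random time t_n + τh, then take a full step using the gradient there, for two gradient evaluations per step. The paper's exact RLMC update and the splitting step of RSLMC are not reproduced here. The test potential U(x) = |x|²/2 + 2 Σ_i cos(x_i) and all function names are hypothetical illustration choices. U is non-convex near the origin and has a 3-Lipschitz gradient. Because it is a bounded perturbation of N(0, I_d), it satisfies an LSI, so it fits the paper's assumptions without being log-concave.

```python
import numpy as np


def grad_U(x):
    # Hypothetical test potential U(x) = |x|^2/2 + 2*sum_i cos(x_i).
    # Non-convex near 0 (U''(0) = -1 per coordinate), gradient is 3-Lipschitz,
    # and it is a bounded perturbation of N(0, I), so it satisfies an LSI.
    return x - 2.0 * np.sin(x)


def rlmc_step(y, h, rng):
    """One randomized-midpoint LMC step (assumed form of RLMC).

    y has shape (n_chains, d); the step uses two gradient evaluations.
    """
    n_chains = y.shape[0]
    tau = rng.uniform(size=(n_chains, 1))  # random midpoint, one per chain
    # Brownian increments over [t_n, t_n + tau*h] and [t_n + tau*h, t_n + h]
    dw_tau = np.sqrt(tau * h) * rng.standard_normal(y.shape)
    dw_rest = np.sqrt((1.0 - tau) * h) * rng.standard_normal(y.shape)
    # Euler predictor at the random time t_n + tau*h
    y_mid = y - tau * h * grad_U(y) + np.sqrt(2.0) * dw_tau
    # Full step using the gradient at the random midpoint; the increment over
    # [t_n, t_n + h] reuses dw_tau so both stages share one Brownian path.
    return y - h * grad_U(y_mid) + np.sqrt(2.0) * (dw_tau + dw_rest)


def run_rlmc(n_chains=2000, d=2, h=0.05, n_steps=500, seed=0):
    # Hypothetical driver: independent chains started at the origin.
    rng = np.random.default_rng(seed)
    y = np.zeros((n_chains, d))
    for _ in range(n_steps):
        y = rlmc_step(y, h, rng)
    return y


def target_second_moment():
    # E[x^2] under the 1-D marginal density proportional to
    # exp(-x^2/2 - 2 cos x), by a Riemann sum on a fine grid
    # (the target is a product measure, so one coordinate suffices).
    x = np.linspace(-12.0, 12.0, 200001)
    w = np.exp(-0.5 * x**2 - 2.0 * np.cos(x))
    return float(np.sum(w * x**2) / np.sum(w))


samples = run_rlmc()
print(samples.mean(), np.mean(samples**2), target_second_moment())
```

Because U is even and the chains start at the origin, the sample mean should be close to zero. The empirical second moment can be compared with the quadrature value for the target.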

Core claim

Under the gradient Lipschitz condition and the log-Sobolev inequality, both RLMC and the newly proposed RSLMC algorithm admit a uniform-in-time error bound of order O(sqrt(d) h) in the 2-Wasserstein distance. This matches the best rate previously known under log-concavity. For potentials whose gradients are not globally Lipschitz but grow superlinearly, the paper introduces modified randomized splitting and non-splitting variants and derives non-asymptotic error bounds for them.
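The superlinear-growth case is where plain explicit schemes can fail outright, as shown for Euler's method in [14]. The sketch below contrasts plain LMC with a generic tamed step that replaces ∇U by ∇U/(1 + h|∇U|), a standard taming of the kind used in tamed-ULA methods. It is not the paper's modified R(S)LMC, whose form is not given here. The quartic potential U(x) = x⁴/4, the starting point, the step size and the function names are hypothetical choices.

```python
import numpy as np


def grad_U(x):
    # Hypothetical superlinear example: U(x) = x^4 / 4, grad U(x) = x^3,
    # which is not globally Lipschitz.
    return x**3


def run_lmc(x0, h, n_steps, tamed, n_chains=4, seed=0):
    """Run plain or tamed LMC for n_steps and return the final states."""
    rng = np.random.default_rng(seed)
    x = np.full(n_chains, x0, dtype=float)
    # Plain LMC may overflow to inf and then nan; silence those warnings.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_steps):
            g = grad_U(x)
            if tamed:
                # Taming: the drift increment h*g/(1 + h|g|) has size < 1.
                g = g / (1.0 + h * np.abs(g))
            x = x - h * g + np.sqrt(2.0 * h) * rng.standard_normal(n_chains)
    return x


plain = run_lmc(x0=10.0, h=0.1, n_steps=50, tamed=False)
tamed = run_lmc(x0=10.0, h=0.1, n_steps=50, tamed=True)
print("plain LMC:", plain)  # blows up (ends as nan) for this start and step size
print("tamed LMC:", tamed)  # stays bounded
```

From x0 = 10 with h = 0.1, the first plain step already lands near -90, and the cubic drift then overshoots with growing amplitude until it overflows. The tamed drift moves the state by less than one unit per step, so the chain drifts back toward the origin.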

What carries the argument

The randomized splitting Langevin Monte Carlo (RSLMC) scheme, which interleaves randomized updates to reduce the number of gradient evaluations while preserving the convergence properties under the log-Sobolev inequality.

Load-bearing premise

The target distribution satisfies the log-Sobolev inequality.
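For reference, one common normalization of this assumption and two standard consequences are written out below. Conventions for the constant differ between papers, and we have not checked which one this paper uses. The second consequence is what allows a KL bound to be turned into a W2 bound.

```latex
% Log-Sobolev inequality with constant C_{LS} for \pi \propto e^{-U}:
\operatorname{Ent}_\pi(f^2)
  := \mathbb{E}_\pi\!\left[f^2 \log f^2\right] - \mathbb{E}_\pi[f^2] \log \mathbb{E}_\pi[f^2]
  \le 2\, C_{LS}\, \mathbb{E}_\pi\!\left[|\nabla f|^2\right]
  \quad \text{for all smooth } f.

% (i) Exponential decay of KL along dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t, X_t \sim \rho_t:
\mathrm{KL}(\rho_t \,\|\, \pi) \le e^{-2t/C_{LS}}\, \mathrm{KL}(\rho_0 \,\|\, \pi).

% (ii) Talagrand's inequality (Otto--Villani): for every probability measure \mu,
\mathcal{W}_2^2(\mu, \pi) \le 2\, C_{LS}\, \mathrm{KL}(\mu \,\|\, \pi).
```

With this normalization, N(0, I_d) has C_LS = 1. LSI is stable under bounded perturbations: by Holley–Stroock, multiplying the density by e^{-V} with bounded V changes C_LS by at most a factor of e^{sup V − inf V}. LSI also tensorizes over product measures. Together these facts explain how targets that are not log-concave can still satisfy it, for instance a Gaussian density multiplied by e^{-2 cos x} in each coordinate.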

What would settle it

A simulation or calculation on a concrete distribution that obeys the log-Sobolev inequality with Lipschitz gradients but yields W2 error larger than O(sqrt(d) h) for small step sizes h would disprove the stated bound.
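For the standard Gaussian target, the W2 error at stationarity has a closed form, which makes the O(√d h) shape of the claim visible. This sketch deliberately uses plain, non-randomized LMC rather than R(S)LMC, because the stationary law of plain LMC is Gaussian here. The potential is U(x) = |x|²/2, i.e. π = N(0, I_d). The update Y_{n+1} = (1 − h)Y_n + √(2h) ξ_n has stationary law N(0, σ² I_d) with σ² = 2/(2 − h), so the stationary W2 error is √d(σ − 1) ≈ √d h/4. The function name is hypothetical. The example only illustrates what the rate means and is not a test of the non-log-concave claim.

```python
import math


def ula_gaussian_stationary_w2(d, h):
    """Exact W2 between the stationary law of plain LMC and N(0, I_d).

    LMC on U(x) = |x|^2 / 2 reads Y_{n+1} = (1 - h) Y_n + sqrt(2h) xi_n,
    whose stationary law is N(0, sigma^2 I_d) with sigma^2 = 2 / (2 - h)
    for 0 < h < 2. For centred isotropic Gaussians, W2 = sqrt(d) |sigma - 1|.
    """
    if not 0.0 < h < 2.0:
        raise ValueError("step size must satisfy 0 < h < 2")
    sigma = math.sqrt(2.0 / (2.0 - h))
    return math.sqrt(d) * abs(sigma - 1.0)


for d in (1, 100, 10_000):
    for h in (0.1, 0.01):
        w2 = ula_gaussian_stationary_w2(d, h)
        ratio = w2 / (math.sqrt(d) * h)
        print(f"d={d:>6} h={h:<5} W2={w2:.5f} W2/(sqrt(d) h)={ratio:.4f}")
```

Quadrupling d doubles the error, and the ratio W2/(√d h) approaches 1/4 as h → 0.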

Original abstract

Efficient sampling from complex and high-dimensional target distributions is a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we propose a new kind of randomized splitting Langevin Monte Carlo (RSLMC) algorithm for sampling from high-dimensional distributions without log-concavity. Compared with the existing randomized Langevin Monte Carlo (RLMC), the newly proposed RSLMC algorithm requires fewer gradient evaluations and is thus computationally cheaper. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $\mathcal{W}_2$-distance of order $O(\sqrt{d}h)$ for both the RLMC and RSLMC sampling algorithms, which matches the best bound in the literature under the log-concavity condition. Moreover, when the gradient of the potential $U$ is non-globally Lipschitz with superlinear growth, new modified R(S)LMC algorithms are introduced and analyzed, with non-asymptotic error bounds established. Finally, numerical examples are reported to corroborate the theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

RSLMC cuts gradient evaluations while keeping solid non-asymptotic bounds under LSI that match log-concave rates.


Here's the quick take on this one. The main advance is the RSLMC algorithm, which uses a randomized splitting scheme to sample from distributions that satisfy the log-Sobolev inequality but are not necessarily log-concave. It cuts the number of gradient evaluations compared with standard RLMC while delivering the same order of uniform-in-time Wasserstein-2 error bound, O(sqrt(d) h), under gradient Lipschitzness. The authors do a decent job extending the analysis to this weaker setting, and they also provide modified versions for when the gradient grows superlinearly. The numerical examples show it works in practice, which is helpful.

On the positive side, the rates match what is known for the log-concave case, so the theory does not regress. The splitting idea seems like a genuine way to save computation without losing the guarantee.

The soft spot is the uniform-in-time claim. LSI gives exponential convergence of the continuous process to the target, but turning that into a discrete bound without log-concavity can be delicate. The question worth checking is whether the proof really closes with only the stated assumptions, or whether it needs extra control on moments or local behavior. If the full paper has a clean argument, great; otherwise the bound might only hold under slightly stronger conditions.

This paper is aimed at people developing and analyzing MCMC methods for high-dimensional, non-convex problems. Someone looking for new algorithmic tricks with supporting theory would find it useful. I think it deserves a serious referee report. The contribution is focused and the claims are checkable.

Referee Report

1 major / 2 minor

Summary. The paper proposes randomized Langevin Monte Carlo (RLMC) and a new randomized splitting Langevin Monte Carlo (RSLMC) algorithm for sampling high-dimensional distributions without log-concavity. Under gradient Lipschitzness of the potential U and the log-Sobolev inequality (LSI) on the target, it claims uniform-in-time W2 error bounds of order O(√d h) for both algorithms, matching the best known rates under log-concavity. Modified variants are introduced for non-globally Lipschitz gradients with superlinear growth, with corresponding non-asymptotic bounds derived. Numerical examples are provided to support the theory.

Significance. If the non-asymptotic bounds hold under the stated assumptions, the work meaningfully extends discretization analysis of Langevin dynamics beyond log-concavity by leveraging LSI, which permits certain non-convex targets. The uniform-in-time W2 guarantee and the computational savings from RSLMC (fewer gradient evaluations) are practically relevant. The extension to superlinear growth cases further widens applicability. Strengths include explicit rates and numerical corroboration; the result would be stronger with fully machine-checked or fully expanded discretization arguments.

major comments (1)

[Proof of uniform-in-time W2 bound (abstract and §3)] The proof of the uniform-in-time W2 bound (claimed in the abstract and likely in §3 or Theorem 3.1) under only global gradient Lipschitzness plus LSI requires explicit verification that no additional one-sided Lipschitz or moment-control assumptions are implicitly used. The continuous process contracts under LSI, but the synchronous coupling or Girsanov analysis of the randomized splitting discretization must bound accumulated local errors without strong-convexity drift; if flat regions or heavy tails consistent with LSI but not log-concavity cause the local truncation to grow, the O(√d h) uniform bound fails. Please add a dedicated lemma or remark clarifying the moment bounds employed.
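A generic error recursion shows why accumulated local errors are the crux. This is a simplified template under stated assumptions, not the paper's proof. Let e_n be the W2 error after n steps, suppose one step of the coupled scheme contracts errors by a factor (1 − ch) with c > 0, and let δ be the error added per step.

```latex
e_{n+1} \le (1 - c h)\, e_n + \delta, \qquad 0 < c h < 1
\;\Longrightarrow\;
e_n \le (1 - c h)^n e_0 + \delta \sum_{k=0}^{n-1} (1 - c h)^k
    \le e^{-c n h}\, e_0 + \frac{\delta}{c h}.
```

A bound uniform in n therefore needs c > 0; with c = 0 the recursion only gives e_n ≤ e_0 + nδ. In this template, an O(√d h) bound needs δ = O(√d h²). Under strong log-concavity, a synchronous coupling supplies such a contraction for small h. Under LSI alone it is not automatic, which is why the referee asks how the accumulated local errors are controlled.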

minor comments (2)

[Algorithm 2] Clarify the precise definition of the randomized splitting step in RSLMC versus standard RLMC to highlight the reduction in gradient evaluations.
[Numerical experiments] In the numerical section, report the specific values of dimension d, step-size h, and the exact potentials used so that the observed W2 errors can be directly compared to the O(√d h) prediction.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comment below and will revise the manuscript to incorporate the requested clarification.


Referee: [Proof of uniform-in-time W2 bound (abstract and §3)] Please add a dedicated lemma or remark clarifying the moment bounds used to obtain the uniform-in-time W2 bound under only gradient Lipschitzness plus LSI (see the major comment in the referee report).

Authors: We thank the referee for highlighting this point. Our proof first uses LSI, which holds without log-concavity, to obtain exponential convergence of the continuous Langevin process to the target (in KL divergence, and hence in W2 via Talagrand's inequality). It then controls the discretization error via synchronous coupling of the randomized splitting scheme together with a Girsanov change-of-measure argument. Global gradient Lipschitzness directly bounds the local truncation error per step by O(h). LSI supplies the necessary uniform-in-time moment controls (via the associated Poincaré inequality and exponential integrability) to prevent error accumulation, even in flat regions. Because LSI forces sub-Gaussian tails, heavy tails do not arise. No one-sided Lipschitz or extra moment assumptions are invoked beyond those stated. Nevertheless, we agree that an explicit statement would strengthen the presentation. We will therefore insert a new dedicated remark (Remark 3.2) and a short supporting lemma (Lemma 3.3) that derives the required uniform second-moment bound directly from LSI plus gradient Lipschitzness. This revision will be made in the next version. revision: yes

Circularity Check

0 steps flagged

No circularity: bounds derived from standard LSI + Lipschitz assumptions via independent analysis


The paper derives explicit non-asymptotic uniform-in-time W2 error bounds of order O(sqrt(d) h) for RLMC and RSLMC directly from the gradient Lipschitz condition on U together with the log-Sobolev inequality on the target. These functional inequalities are external to the algorithms and are invoked as standard assumptions in the sampling literature; the discretization error analysis (via coupling or Girsanov-type arguments) produces the stated rate without reducing any claimed prediction to a fitted quantity, self-definition, or load-bearing self-citation chain. The extension beyond log-concavity is achieved by replacing strong convexity with LSI, which is a mathematically independent step rather than a renaming or smuggling of prior ansatzes. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the log-Sobolev inequality and global gradient Lipschitzness as domain assumptions standard in sampling theory; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: The target distribution satisfies the log-Sobolev inequality.
Invoked to obtain uniform-in-time W2 bounds under gradient Lipschitzness for both RLMC and RSLMC.
domain assumption: The gradient of the potential U is globally Lipschitz.
Required for the O(sqrt(d) h) error bound; relaxed in the modified algorithms for superlinear growth.

pith-pipeline@v0.9.0 · 5735 in / 1324 out tokens · 50603 ms · 2026-05-18T13:00:59.014333+00:00


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Accelerating Langevin Monte Carlo via Efficient Stochastic Runge--Kutta Methods beyond Log-Concavity
math.ST 2026-05 unverdicted novelty 6.0

A Hessian-free stochastic Runge-Kutta LMC algorithm achieves strong order 1.5 with two gradient evaluations per step and uniform-in-time convergence O(d^{3/2} h^{3/2}) in non-log-concave settings.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper

[1] Jason M. Altschuler and Sinho Chewi. Shifted Composition III: Local Error Framework for KL Divergence. ArXiv, abs/2412.17997, 2024.
[2] Xiang Cheng and Peter Bartlett. Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory, pages 186–211. PMLR, 2018.
[3] Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, and Michael I. Jordan. Sharp convergence rates for Langevin dynamics in the nonconvex setting, 2018.
[4] Sinho Chewi, Murat A Erdogdu, Mufan Li, Ruoqi Shen, and Matthew S Zhang. Analysis of Langevin Monte Carlo from Poincaré to log-Sobolev. Foundations of Computational Mathematics, pages 1–51, 2024.
[5] Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, and Philippe Rigollet. Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm. In Conference on Learning Theory, pages 1260–1300. PMLR, 2021.
[6] Arnak Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Conference on Learning Theory, pages 678–689. PMLR, 2017.
[7] Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):651–676, 2017.
[8] Arnak S Dalalyan and Avetik Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311, 2019.
[9] Thomas Daun. On the randomized solution of initial value problems. Journal of Complexity, 27(3):300–311.
[10] Alain Durmus, Szymon Majewski, and Błażej Miasojedow. Analysis of Langevin Monte Carlo via convex optimization. Journal of Machine Learning Research, 20(73):1–46, 2019.
[11] Alain Durmus and Éric Moulines. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab., 27(3):1551–1587, 2017.
[12] Alain Durmus and Éric Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli, 25(4A):2854–2882, 2019.
[13] Stefan Heinrich and Bernhard Milla. The randomized complexity of initial value problems. Journal of Complexity, 24(2):77–88, 2008.
[14] M. Hutzenthaler, A. Jentzen, and P. E. Kloeden. Strong and weak divergence in finite time of Euler's method for stochastic differential equations with non-globally Lipschitz continuous coefficients. Proceedings of the Royal Society A, 467:1563–1576, 2011.
[15] Arnulf Jentzen and Andreas Neuenkirch. A random Euler scheme for Carathéodory differential equations. Journal of Computational and Applied Mathematics, 224(1):346–359, 2009.
[16] Raphael Kruse and Yue Wu. Error analysis of randomized Runge–Kutta methods for differential equations with time-irregular coefficients. Computational Methods in Applied Mathematics, 17(3):479–498, 2017.
[17] Raphael Kruse and Yue Wu. A randomized Milstein method for stochastic differential equations with non-differentiable drift coefficients. Discrete and Continuous Dynamical Systems-B, 24(8):3475–3502, 2019.
[18] Ruilin Li, Hongyuan Zha, and Molei Tao. Sqrt(d) Dimension Dependence of Langevin Monte Carlo. In The International Conference on Learning Representations, 2022.
[19] Xiang Li, Fengyu Wang, and Lihu Xu. Unadjusted Langevin algorithms for SDEs with Hölder drift. Science China Mathematics, pages 1–26, 2025.
[20] Xuechen Li, Yi Wu, Lester Mackey, and Murat A Erdogdu. Stochastic Runge-Kutta accelerates Langevin Monte Carlo and beyond. Advances in Neural Information Processing Systems, 32, 2019.
[21] Jun S Liu. Monte Carlo strategies in scientific computing, volume 10. Springer, 2001.
[22] Wanjie Lv, Xiaojie Wang, and Bin Yang. Non-asymptotic Error Bounds for Randomized Kinetic Langevin Monte Carlo without Log-Concavity. Preprint, 2025.
[23] Iosif Lytras and Panayotis Mertikopoulos. Tamed Langevin sampling under weaker conditions. arXiv preprint arXiv:2405.17693, 2024.
[24] Iosif Lytras and Sotirios Sabanis. Taming under isoperimetry. Stochastic Processes and their Applications, page 104684, 2025.
[25] Mateusz B Majka, Aleksandar Mijatović, and Lukasz Szpruch. Non-asymptotic bounds for sampling algorithms without log-concavity. Annals of Applied Probability, 30(4):1534–1581, 2020.
[26] Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, and Peter L. Bartlett. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity. Bernoulli, 28(3):1577–1601, 2022.
[27] Ariel Neufeld, Ying Zhang, et al. Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting. Journal of Mathematical Analysis and Applications, 543(1):128892, 2025.
[28] Gilles Pagès and Fabien Panloup. Unadjusted Langevin algorithm with multiplicative noise: Total variation and Wasserstein bounds. The Annals of Applied Probability, 33(1):726–779, 2023.
[29] Chenxu Pang, Xiaojie Wang, and Yue Wu. Projected Langevin Monte Carlo algorithms in non-convex and super-linear setting. Journal of Computational Physics, 526:113754, 2025.
[30] Grigorios A Pavliotis. Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, volume 60. Springer, 2014.
[31] Paweł Przybyłowicz and Paweł Morkisz. Strong approximation of solutions of stochastic differential equations with time-irregular coefficients via randomized Euler algorithm. Applied Numerical Mathematics, 78:80–94, 2014.
[32] Christian P Robert and George Casella. Monte Carlo statistical methods, volume 2. Springer, 1999.
[33] Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
[34] Sotirios Sabanis and Ying Zhang. Higher order Langevin Monte Carlo algorithm. Electronic Journal of Statistics, 13:3805–3850, 2019.
[35] Ruoqi Shen and Yin Tat Lee. The randomized midpoint method for log-concave sampling. Advances in Neural Information Processing Systems, 32, 2019.
[36] Gilbert Stengle. Numerical methods for systems with measurable coefficients. Appl. Math. Lett., 3(4):25–29, 1990.
[37] Gilbert Stengle. Error analysis of a randomized numerical method. Numer. Math., 70(1):119–128, 1995.
[38] Santosh Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. Advances in Neural Information Processing Systems, 32, 2019.
[39] Feng-Yu Wang. Exponential contraction in Wasserstein distances for diffusion semigroups with negative curvature. Potential Anal., 53(3):1123–1144, 2020.
[40] Xiaojie Wang and Bin Yang. Accelerating Langevin Monte Carlo via Stochastic Runge-Kutta beyond Log-Concavity. Preprint, 2025.
[41] Bin Yang and Xiaojie Wang. Non-asymptotic Error Bounds in W2-Distance with Sqrt(d) Dimension Dependence and First Order Convergence for Langevin Monte Carlo beyond Log-Concavity. In Forty-second International Conference on Machine Learning, 2025.
[42] Lu Yu, Avetik Karagulyan, and Arnak Dalalyan. Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited. In ICLR International Conference on Learning Representations, 2024.
A Proofs of results in Subsection 3.1. A.1 Proof of Lemma 3.2. We first recast the RLMC (7) as, for any $n \in \mathbb{N}_0$, $Y_{n+1} = Y_n - \nabla U(Y_n)\,h + \sqrt{\cdots}$ ...
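[43] (66) Thus, we derive the desired assertion. B Proof of Theorem 3.4. By employing the triangle inequality, we obtain that for any $n \ge n_1$, $\mathcal{W}_2(\nu q^{n}, \pi) \le \mathcal{W}_2(\nu q^{n-n_1} q^{n_1}, \nu q^{n-n_1} p_{n_1 h}) + \mathcal{W}_2(\nu q^{n-n_1} p_{n_1 h}, \pi)$. (67) Now, we estimate $\mathcal{W}_2(\nu q^{n-n_1} q^{n_1}, \nu q^{n-n_1} p_{n_1 h})$ and $\mathcal{W}_2(\nu q^{n-n_1} p_{n_1 h}, \pi)$ separately. Note that $\mathcal{W}_2(\nu q^{n-n_1} q^{n_1}, \nu q^{n-n_1} p_{n_1 h}) = \mathcal{W}_2(\mathcal{L}(Y(t_{n-n_1}\ldots$ ...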
[44] ... $+ 2\mu'$, and for short we denote $\Xi_{n+1} := 2\sqrt{2}\,\langle T^h(\bar{Y}_n), \Delta W_{n+1}\rangle + 6|\Delta W_{n+1}|^2 + C_F |\Delta W^{\tau}_{n+1}|^2 + C_M d h$. (97) For $p \in \mathbb{N}$, taking the $p$-th power and then expectations, the binomial theorem implies $\mathbb{E}[|\bar{Y}_{n+1}|^{2p}] \le (1 - \tfrac{3\mu h}{2})^{p}\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p}] + \sum_{k=1}^{p} C^p_k (1 - \tfrac{3\mu h}{2})^{p-k}\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p-2k} (\Xi_{n+1})^k]$, (98) where $C^p_k := \frac{p!}{k!\,(p-k)!}$. Now, we estimate the second term for two cases: $k = 1$ and $k \ge 2$.
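[45] Keeping this in mind, one can derive from (100) that $\mathbb{E}[(\Xi_{n+1})^k \mid \mathcal{F}_{t_n}] \le C\big(|T^h(\bar{Y}_n)|^k\, \mathbb{E}[|\Delta W_{n+1}|^k \mid \mathcal{F}_{t_n}] + \mathbb{E}[|\Delta W_{n+1}|^{2k} \mid \mathcal{F}_{t_n}] + \mathbb{E}[|\Delta W^{\tau}_{n+1}|^{2k} \mid \mathcal{F}_{t_n}] + d^k h^k\big) \le C\big((k-1)!!\, d^{k/2} h^{k/2} |T^h(\bar{Y}_n)|^k + (2k-1)!!\, d^k h^k + (2k-1)!!\, d^k h^k + d^k h^k\big)$. (104) So we get, for $k \ge 2$, $C^p_k (1 - \tfrac{3\mu h}{2})^{p-k}\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p-2k} (\Xi_{n+1})^k] \le C^p_k\, C\, (1 - \tfrac{3\mu h}{2})^{p-k} d^{k/2} h^{k/2}\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p-k}] + C \ldots$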
[46] Putting this into (98), one can use $(1 - \tfrac{3\mu h}{2})^p \le 1 - \tfrac{3\mu h}{2}$, $p \ge 1$, to obtain $\mathbb{E}[|\bar{Y}_{n+1}|^{2p}] \le (1 - \tfrac{3\mu h}{2})^p\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p}] + \mu h\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p}] + M_3 d^p h \le (1 - \tfrac{\mu h}{2})\, \mathbb{E}[|T^h(\bar{Y}_n)|^{2p}] + M_3 d^p h \le (1 - \tfrac{\mu h}{2})\, \mathbb{E}[|\bar{Y}_n|^{2p}] + M_3 d^p h$, (111) where we used (88) in the last step. By iteration, we employ $1 - u \le e^{-u}$, $u > 0$, to acquire $\mathbb{E}[|\bar{Y}_{n+1}|^{2p}] \le (1 - \tfrac{\mu h}{2})^{n+1}\, \mathbb{E}[|x_0|^{2p}] + M_3 d^p h \sum_{i=0}^{n} (1 - \tfrac{\mu h}{2})^{i} \le e^{-\mu t_{n+1}/2}\, \mathbb{E}[|x_0|^{2p}] + \frac{2 M_3 d^p}{\mu}$ ...
[47] Thus, we finish this proof. Proof of Lemma 3.8. In light of Theorem 3.3 of [41], one can combine Assumptions 2.1 and 3.6 and Lemmas 3.7 and C.3 to obtain the desired assertion.

[1] [1]

Altschuler and Sinho Chewi

Jason M. Altschuler and Sinho Chewi. Shifted Composition III: Local Error Framework for KL Divergence. ArXiv, abs/2412.17997, 2024

work page arXiv 2024

[2] [2]

Convergence of Langevin MCMC in KL-divergence

Xiang Cheng and Peter Bartlett. Convergence of Langevin MCMC in KL-divergence. InAlgorithmic Learning Theory, pages 186–211. PMLR, 2018

work page 2018

[3] [3]

Chatterji, Yasin Abbasi-Yadkori, Peter L

Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, and Michael I. Jordan. Sharp convergence rates for Langevin dynamics in the nonconvex setting, 2018

work page 2018

[4] [4]

Analysis of Langevin monte carlo from poincare to log-sobolev.Foundations of Computational Mathematics, pages 1–51, 2024

Sinho Chewi, Murat A Erdogdu, Mufan Li, Ruoqi Shen, and Matthew S Zhang. Analysis of Langevin monte carlo from poincare to log-sobolev.Foundations of Computational Mathematics, pages 1–51, 2024

work page 2024

[5] [5]

Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm

Sinho Chewi, Chen Lu, Kwangjun Ahn, Xiang Cheng, Thibaut Le Gouic, and Philippe Rigollet. Optimal dimension dependence of the Metropolis-adjusted Langevin algorithm. InConference on Learning Theory, pages 1260–1300. PMLR, 2021

work page 2021

[6] [6]

Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent

Arnak Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. InConference on Learning Theory, pages 678–689. PMLR, 2017

work page 2017

[7] [7]

Arnak S Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities.Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):651–676, 2017

work page 2017

[8] [8]

User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019

Arnak S Dalalyan and Avetik Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019

work page 2019

[9] [9]

On the randomized solution of initial value problems.Journal of Complexity, 27(3):300–311,

Thomas Daun. On the randomized solution of initial value problems.Journal of Complexity, 27(3):300–311,

work page

[10] [10]

Analysis of Langevin Monte Carlo via convex optimization.Journal of Machine Learning Research, 20(73):1–46, 2019

Alain Durmus, Szymon Majewski, and Bła˙zej Miasojedow. Analysis of Langevin Monte Carlo via convex optimization.Journal of Machine Learning Research, 20(73):1–46, 2019

work page 2019

[11] [11]

Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.Ann

Alain Durmus and Éric Moulines. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm.Ann. Appl. Probab., 27(3):1551–1587, 2017

work page 2017

[12] [12]

High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019

Alain Durmus and Éric Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019

work page 2019

[13] [13]

The randomized complexity of initial value problems.Journal of Complexity, 24(2):77–88, 2008

Stefan Heinrich and Bernhard Milla. The randomized complexity of initial value problems.Journal of Complexity, 24(2):77–88, 2008

work page 2008

[14] [14]

Hutzenthaler, A

M. Hutzenthaler, A. Jentzen, and P. E. Kloeden. Strong and weak divergence in finite time of Euler’s method for stochastic differential equations with non-globally Lipschitz continuous coefficients.The Royal Society, 467:1563–1576, 2011

work page 2011

[15] [15]

A random Euler scheme for Carathéodory differential equations

Arnulf Jentzen and Andreas Neuenkirch. A random Euler scheme for Carathéodory differential equations. Journal of computational and applied mathematics, 224(1):346–359, 2009

work page 2009

[16] [16]

Error analysis of randomized Runge–Kutta methods for differential equations with time-irregular coefficients.Computational Methods in Applied Mathematics, 17(3):479–498, 2017

Raphael Kruse and Yue Wu. Error analysis of randomized Runge–Kutta methods for differential equations with time-irregular coefficients.Computational Methods in Applied Mathematics, 17(3):479–498, 2017. 9

work page 2017

[17] [17]

A randomized Milstein method for stochastic differential equations with non-differentiable drift coefficients.Discrete and Continuous Dynamical Systems-B, 24(8):3475–3502, 2019

Raphael Kruse and Yue Wu. A randomized Milstein method for stochastic differential equations with non-differentiable drift coefficients.Discrete and Continuous Dynamical Systems-B, 24(8):3475–3502, 2019

work page 2019

[18] [18]

Sqrt (d) Dimension Dependence of Langevin Monte Carlo

Ruilin Li, Hongyuan Zha, and Molei Tao. Sqrt (d) Dimension Dependence of Langevin Monte Carlo. In The International Conference on Learning Representations, 2022

work page 2022

[19] Xiang Li, Fengyu Wang, and Lihu Xu. Unadjusted Langevin algorithms for SDEs with Hölder drift. Science China Mathematics, pages 1–26, 2025.

[20] Xuechen Li, Yi Wu, Lester Mackey, and Murat A Erdogdu. Stochastic Runge-Kutta accelerates Langevin Monte Carlo and beyond. Advances in Neural Information Processing Systems, 32, 2019.

[21] Jun S Liu. Monte Carlo Strategies in Scientific Computing, volume 10. Springer, 2001.

[22] Wanjie Lv, Xiaojie Wang, and Bin Yang. Non-asymptotic Error Bounds for Randomized Kinetic Langevin Monte Carlo without Log-Concavity. Preprint, 2025.

[23] Iosif Lytras and Panayotis Mertikopoulos. Tamed Langevin sampling under weaker conditions. arXiv preprint arXiv:2405.17693, 2024.

[24] Iosif Lytras and Sotirios Sabanis. Taming under isoperimetry. Stochastic Processes and their Applications, page 104684, 2025.

[25] Mateusz B Majka, Aleksandar Mijatović, and Lukasz Szpruch. Non-asymptotic bounds for sampling algorithms without log-concavity. Annals of Applied Probability, 30(4):1534–1581, 2020.

[26] Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, and Peter L. Bartlett. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity. Bernoulli, 28(3):1577–1601, 2022.

[27] Ariel Neufeld, Ying Zhang, et al. Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting. Journal of Mathematical Analysis and Applications, 543(1):128892, 2025.

[28] Gilles Pagès and Fabien Panloup. Unadjusted Langevin algorithm with multiplicative noise: Total variation and Wasserstein bounds. The Annals of Applied Probability, 33(1):726–779, 2023.

[29] Chenxu Pang, Xiaojie Wang, and Yue Wu. Projected Langevin Monte Carlo algorithms in non-convex and super-linear setting. Journal of Computational Physics, 526:113754, 2025.

[30] Grigorios A Pavliotis. Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, volume 60. Springer, 2014.

[31] Paweł Przybyłowicz and Paweł Morkisz. Strong approximation of solutions of stochastic differential equations with time-irregular coefficients via randomized Euler algorithm. Applied Numerical Mathematics, 78:80–94, 2014.

[32] Christian P Robert and George Casella. Monte Carlo Statistical Methods, volume 2. Springer, 1999.

[33] Gareth O Roberts and Richard L Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.

[34] Sotirios Sabanis and Ying Zhang. Higher order Langevin Monte Carlo algorithm. Electronic Journal of Statistics, 13:3805–3850, 2019.

[35] Ruoqi Shen and Yin Tat Lee. The randomized midpoint method for log-concave sampling. Advances in Neural Information Processing Systems, 32, 2019.

[36] Gilbert Stengle. Numerical methods for systems with measurable coefficients. Appl. Math. Lett., 3(4):25–29, 1990.

[37] Gilbert Stengle. Error analysis of a randomized numerical method. Numer. Math., 70(1):119–128, 1995.

[38] Santosh Vempala and Andre Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. Advances in Neural Information Processing Systems, 32, 2019.

[39] Feng-Yu Wang. Exponential contraction in Wasserstein distances for diffusion semigroups with negative curvature. Potential Anal., 53(3):1123–1144, 2020.

[40] Xiaojie Wang and Bin Yang. Accelerating Langevin Monte Carlo via Stochastic Runge-Kutta beyond Log-Concavity. Preprint, 2025.

[41] Bin Yang and Xiaojie Wang. Non-asymptotic Error Bounds in W2-Distance with Sqrt(d) Dimension Dependence and First Order Convergence for Langevin Monte Carlo beyond Log-Concavity. In Forty-second International Conference on Machine Learning, 2025.

[42] Lu Yu, Avetik Karagulyan, and Arnak Dalalyan. Langevin Monte Carlo for strongly log-concave distributions: Randomized midpoint revisited. In International Conference on Learning Representations (ICLR), 2024.

A Proofs of results in Subsection 3.1

A.1 Proof of Lemma 3.2

Proof of Lemma 3.2. We first recast the RLMC (7) as, for any $n \in \mathbb{N}_0$, $Y_{n+1} = Y_n - \nabla U(Y_n)\,h + \sqrt{\ldots}$
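The recast form of RLMC (7) is cut off in this copy. For orientation only, here is a minimal sketch that assumes RLMC uses the randomized midpoint construction of [35] applied to the overdamped Langevin dynamics $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t$. The helper names `rlmc_step` and `grad_U`, the standard Gaussian target, and the step size are hypothetical choices for illustration, not the paper's definitions.

```python
import numpy as np


def rlmc_step(y, grad_U, h, rng):
    """One randomized-midpoint Langevin step (hypothetical helper).

    y has shape (n_chains, d). Assumed update:
        Y_tau   = Y_n - tau * h * grad_U(Y_n) + sqrt(2) * W_tau
        Y_{n+1} = Y_n - h * grad_U(Y_tau) + sqrt(2) * dW
    with tau ~ Uniform(0, 1), W_tau = W(t_n + tau*h) - W(t_n) and
    dW = W(t_{n+1}) - W(t_n) taken from the same Brownian path.
    """
    n_chains, d = y.shape
    # One uniform midpoint per chain, so the chains stay independent
    tau = rng.uniform(size=(n_chains, 1))
    # Brownian increment over [t_n, t_n + tau*h]
    w_tau = np.sqrt(tau * h) * rng.standard_normal((n_chains, d))
    # Independent increment over [t_n + tau*h, t_{n+1}]; adding it to w_tau
    # gives the full-step increment on the same path
    dW = w_tau + np.sqrt((1.0 - tau) * h) * rng.standard_normal((n_chains, d))
    y_mid = y - tau * h * grad_U(y) + np.sqrt(2.0) * w_tau
    return y - h * grad_U(y_mid) + np.sqrt(2.0) * dW


def grad_U(x):
    # Illustrative target: U(x) = |x|^2 / 2, i.e. the standard Gaussian
    return x


rng = np.random.default_rng(0)
samples = 3.0 * rng.standard_normal((20_000, 2))  # overdispersed start
for _ in range(200):
    samples = rlmc_step(samples, grad_U, h=0.1, rng=rng)
```

For this Gaussian target with `h = 0.1`, the empirical mean and per-coordinate variance of `samples` should come out close to 0 and 1. Reusing `w_tau` inside `dW` is the key design point: the midpoint and the full step must see the same Brownian path.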

(66)

Thus, we derive the desired assertion.

B Proof of Theorem 3.4

Proof of Theorem 3.4. By employing the triangle inequality, we obtain that for any $n \ge n_1$,
$$W_2(\nu q_n, \pi) \le W_2(\nu q_{n-n_1} q_{n_1}, \nu q_{n-n_1} p_{n_1 h}) + W_2(\nu q_{n-n_1} p_{n_1 h}, \pi). \tag{67}$$
Now, we estimate $W_2(\nu q_{n-n_1} q_{n_1}, \nu q_{n-n_1} p_{n_1 h})$ and $W_2(\nu q_{n-n_1} p_{n_1 h}, \pi)$ separately. Note that $W_2(\nu q_{n-n_1} q_{n_1}, \nu q_{n-n_1} p_{n_1 h}) = W_2(\mathcal{L}(Y(t_{n-n_1}$ ...

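$\cdots + 2\mu'$, and for short we denote
$$\Xi_{n+1} := 2\sqrt{2}\,\big\langle T^h(\bar Y_n), \Delta W_{n+1}\big\rangle + 6\,|\Delta W_{n+1}|^2 + C_F\,|\Delta W^\tau_{n+1}|^2 + C_M\, d h. \tag{97}$$
For $p \in \mathbb{N}$, taking the $p$-th power and then expectations, the binomial theorem implies
$$\mathbb{E}\big[|\bar Y_{n+1}|^{2p}\big] \le \Big(1 - \frac{3\mu h}{2}\Big)^p \mathbb{E}\big[|T^h(\bar Y_n)|^{2p}\big] + \sum_{k=1}^{p} C_p^k \Big(1 - \frac{3\mu h}{2}\Big)^{p-k} \mathbb{E}\big[|T^h(\bar Y_n)|^{2p-2k} (\Xi_{n+1})^k\big], \tag{98}$$
where $C_p^k := \frac{p!}{k!\,(p-k)!}$. Now, we estimate the second term in two cases: $k = 1$ and $k \ge 2$.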

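With this in mind, one can derive from (100) that
$$\begin{aligned}
\mathbb{E}\big[(\Xi_{n+1})^k \,\big|\, \mathcal{F}_{t_n}\big]
&\le C\Big(|T^h(\bar Y_n)|^k\, \mathbb{E}\big[|\Delta W_{n+1}|^k \,\big|\, \mathcal{F}_{t_n}\big] + \mathbb{E}\big[|\Delta W_{n+1}|^{2k} \,\big|\, \mathcal{F}_{t_n}\big] + \mathbb{E}\big[|\Delta W^\tau_{n+1}|^{2k} \,\big|\, \mathcal{F}_{t_n}\big] + d^k h^k\Big) \\
&\le C\Big((k-1)!!\, d^{k/2} h^{k/2}\, |T^h(\bar Y_n)|^k + (2k-1)!!\, d^k h^k + (2k-1)!!\, d^k h^k + d^k h^k\Big).
\end{aligned} \tag{104}$$
So, we get, for $k \ge 2$,
$$C_p^k \Big(1 - \frac{3\mu h}{2}\Big)^{p-k} \mathbb{E}\big[|T^h(\bar Y_n)|^{2p-2k} (\Xi_{n+1})^k\big] \le C_p^k\, C \Big(1 - \frac{3\mu h}{2}\Big)^{p-k} d^{k/2} h^{k/2}\, \mathbb{E}\big[|T^h(\bar Y_n)|^{2p-k}\big] + C \cdots$$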
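The step to (104) relies on Gaussian moment estimates for the Brownian increments. If $\Delta W_{n+1} \sim N(0, h I_d)$, then $|\Delta W_{n+1}|^2/h$ is chi-square with $d$ degrees of freedom, so $\mathbb{E}|\Delta W_{n+1}|^{2k} = h^k\, d(d+2)\cdots(d+2k-2) \le (2k-1)!!\, d^k h^k$, because $d + 2j \le (2j+1)d$ whenever $d \ge 1$. Conditionally on $\tau$, $\Delta W^\tau_{n+1}$ has variance $\tau h \le h$ per coordinate, so the same bound applies to it. The sketch below checks the inequality numerically; the function names and parameter values are illustrative assumptions.

```python
import math

import numpy as np


def double_factorial(m):
    # m!! = m * (m - 2) * (m - 4) * ...; by convention (-1)!! = 0!! = 1
    return math.prod(range(m, 0, -2))


def gaussian_norm_moment(d, h, k):
    # Exact E|dW|^{2k} for dW ~ N(0, h I_d): |dW|^2 / h is chi-square with
    # d degrees of freedom, whose k-th moment is d (d+2) ... (d+2k-2)
    return h**k * math.prod(d + 2 * j for j in range(k))


def moment_bound(d, h, k):
    # Upper bound (2k - 1)!! d^k h^k appearing in (104)
    return double_factorial(2 * k - 1) * d**k * h**k


# Monte Carlo sanity check of the exact formula for d = 10, h = 0.1, k = 2
rng = np.random.default_rng(1)
d, h, k = 10, 0.1, 2
dW = math.sqrt(h) * rng.standard_normal((100_000, d))
mc_estimate = np.mean(np.sum(dW**2, axis=1) ** k)
```

For $d = 1$ the two sides coincide, so the constant $(2k-1)!!$ cannot be lowered in general; for larger $d$ the bound is loose by the factor $\prod_{j<k}(2j+1)d/(d+2j)$.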

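Putting this into (98), one can use $\big(1 - \frac{3\mu h}{2}\big)^p \le 1 - \frac{3\mu h}{2}$ for $p \ge 1$ to obtain
$$\begin{aligned}
\mathbb{E}\big[|\bar Y_{n+1}|^{2p}\big]
&\le \Big(1 - \frac{3\mu h}{2}\Big)^p \mathbb{E}\big[|T^h(\bar Y_n)|^{2p}\big] + \mu h\, \mathbb{E}\big[|T^h(\bar Y_n)|^{2p}\big] + M_3 d^p h \\
&\le \Big(1 - \frac{\mu h}{2}\Big) \mathbb{E}\big[|T^h(\bar Y_n)|^{2p}\big] + M_3 d^p h \\
&\le \Big(1 - \frac{\mu h}{2}\Big) \mathbb{E}\big[|\bar Y_n|^{2p}\big] + M_3 d^p h,
\end{aligned} \tag{111}$$
where we used (88) in the last step. By iteration, together with $1 - u \le e^{-u}$ for $u > 0$ and the geometric-series bound $\sum_{i=0}^{n} \big(1 - \frac{\mu h}{2}\big)^i \le \frac{2}{\mu h}$, we acquire
$$\mathbb{E}\big[|\bar Y_{n+1}|^{2p}\big] \le \Big(1 - \frac{\mu h}{2}\Big)^{n+1} \mathbb{E}\big[|x_0|^{2p}\big] + M_3 d^p h \sum_{i=0}^{n} \Big(1 - \frac{\mu h}{2}\Big)^i \le e^{-\frac{\mu t_{n+1}}{2}}\, \mathbb{E}\big[|x_0|^{2p}\big] + \frac{2 M_3 d^p}{\mu}.$$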
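The last step unrolls the linear recursion in (111). Taking it with equality as the worst case, $a_{n+1} = (1 - \mu h/2)\,a_n + M_3 d^p h$, the sketch below iterates the recursion and compares it with the closed-form bound $e^{-\mu t_n/2} a_0 + 2M_3 d^p/\mu$, where $t_n = nh$. The values of $\mu$, $h$, $M_3$, $d$, $p$ and $a_0$ are arbitrary illustrative choices, not constants from the paper.

```python
import math


def iterate_moment_recursion(a0, mu, h, c, n_steps):
    # Worst case of (111): a_{n+1} = (1 - mu*h/2) a_n + c, with c = M_3 d^p h
    q = 1.0 - mu * h / 2.0
    a = [a0]
    for _ in range(n_steps):
        a.append(q * a[-1] + c)
    return a


def closed_form_bound(a0, mu, h, c, n):
    # e^{-mu t_n / 2} a0 + 2c / (mu h), with t_n = n h; since c = M_3 d^p h,
    # the last term equals 2 M_3 d^p / mu
    return math.exp(-mu * n * h / 2.0) * a0 + 2.0 * c / (mu * h)


# Illustrative constants (assumptions, not values from the paper)
mu, h, M3, d, p, a0 = 1.0, 0.05, 2.0, 10, 2, 50.0
c = M3 * d**p * h
a = iterate_moment_recursion(a0, mu, h, c, n_steps=2000)
```

Because $\sum_{i \ge 0}(1 - \mu h/2)^i = 2/(\mu h)$, the iterates settle at $2M_3 d^p/\mu$ (here 400), which does not depend on $h$; this is what makes the moment bound uniform in time.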

Thus, we finish this proof.

Proof of Lemma 3.8. In light of Theorem 3.3 of [41], one can combine Assumptions 2.1 and 3.6 with Lemmas 3.7 and C.3 to obtain the desired assertion.
