Ergodicity of Langevin Dynamics and its Discretizations for Non-smooth Potentials

Andreas Habring; Lorenz Fruehwirth

arxiv: 2411.12051 · v2 · pith:O4WECHOZnew · submitted 2024-11-18 · 🧮 math.NA · cs.NA · math.OC


Pith reviewed 2026-05-25 08:34 UTC · model grok-4.3

classification 🧮 math.NA · cs.NA · math.OC

keywords: ergodicity, Langevin dynamics, non-smooth potentials, Markov chain Monte Carlo, geometric ergodicity, law of large numbers, subgradient methods, sampling

The pith

Subgradient Langevin dynamics converge exponentially to the target Gibbs distribution for strongly convex but non-differentiable potentials, with geometrically ergodic discretizations satisfying the law of large numbers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Markov chain Monte Carlo sampling via Langevin dynamics remains valid when the potential is strongly convex yet lacks differentiability. In continuous time the subgradient version of the dynamics reaches the target density exponentially fast. Explicit and semi-implicit time discretizations reach geometric ergodicity, approach the target as the step size vanishes, and obey a law of large numbers so that averages along a single trajectory estimate expectations. This matters for applications such as imaging, where many natural potentials are convex but not smooth and where reusing consecutive samples cuts computational cost.

Core claim

For potentials U that are strongly convex but possibly non-differentiable, the subgradient Langevin dynamics are exponentially ergodic to the target density π(x) ∝ e^{-U(x)} in continuous time. Certain explicit and semi-implicit discretizations are geometrically ergodic, converge to π as the discretization step size tends to zero, and satisfy the law of large numbers, allowing consecutive iterates of the Markov chain to compute statistics of the stationary distribution.
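
To make the discrete part of the claim concrete, here is a minimal sketch of an explicit subgradient Langevin iteration together with trajectory averaging. It assumes a one-dimensional toy potential U(x) = x²/2 + |x| (strongly convex with modulus 1, non-differentiable at 0). The update rule, the subgradient chosen at 0, and the names `subgradient` and `explicit_chain_averages` are illustrative assumptions; the scheme analyzed in the paper may differ in its details.

```python
import math
import random


def subgradient(x: float) -> float:
    """One element of the subdifferential of U(x) = x**2 / 2 + |x|.

    At x = 0 the subdifferential is the interval [-1, 1]; we pick 0 (an assumption).
    """
    sign = (x > 0) - (x < 0)
    return x + sign


def explicit_chain_averages(n_steps: int, h: float, x0: float = 0.0, seed: int = 0):
    """Iterate x_{k+1} = x_k - h * xi_k + sqrt(2h) * z_k with xi_k in dU(x_k).

    Returns the averages of x and |x| taken along the single trajectory.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * h)
    x = x0
    sum_x = 0.0
    sum_abs = 0.0
    for _ in range(n_steps):
        # Explicit (Euler-Maruyama style) step with a subgradient in the drift.
        x = x - h * subgradient(x) + noise_scale * rng.gauss(0.0, 1.0)
        sum_x += x
        sum_abs += abs(x)
    return sum_x / n_steps, sum_abs / n_steps


if __name__ == "__main__":
    mean_x, mean_abs = explicit_chain_averages(n_steps=200_000, h=0.01)
    print(f"trajectory average of x:   {mean_x:+.3f}")
    print(f"trajectory average of |x|: {mean_abs:.3f}")
```

Because the toy target is symmetric around 0, the trajectory average of x should settle near 0 for small step sizes. This is the kind of single-chain estimate that the law of large numbers is meant to justify.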

What carries the argument

Subgradient Langevin dynamics, obtained by replacing the gradient of U in the drift of the overdamped Langevin equation with an element of its subdifferential.
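
Written out, the replacement described above can be sketched as follows. The normalization with √2 is the standard one for a target π ∝ e^{-U}; the symbol ξ_t for the selected subgradient is notation introduced here, and the paper's own formulation (for example as a stochastic differential inclusion) may be stated differently.

```latex
\begin{align*}
  \text{overdamped Langevin (smooth } U\text{):}\quad
    & dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t, \\
  \text{subgradient version:}\quad
    & dX_t = -\xi_t\,dt + \sqrt{2}\,dW_t, \qquad \xi_t \in \partial U(X_t), \\
  \text{where}\quad
    & \partial U(x) = \{\, \xi \in \mathbb{R}^d : U(y) \ge U(x) + \langle \xi, y - x \rangle
      \ \text{for all } y \in \mathbb{R}^d \,\}.
\end{align*}
```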

If this is right

The continuous subgradient dynamics contract exponentially toward π.
The discrete schemes become arbitrarily accurate approximations to π for sufficiently small step sizes.
A single long trajectory from any of the discrete schemes can be used to estimate expectations via simple averaging.
The results apply directly to sampling problems in imaging where the potential is convex but non-smooth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same contraction arguments might adapt to other first-order stochastic processes whose drift is set-valued.
In practice the law of large numbers reduces the need to run many independent chains, lowering total wall-clock time for Monte Carlo estimates.
The geometric rates could be used to derive explicit error bounds that incorporate both discretization and mixing time.

Load-bearing premise

The potential U must be strongly convex.
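
Why this premise carries so much weight can be seen from a standard synchronous-coupling computation, sketched here under the assumption that U is μ-strongly convex and that two solutions X_t and Y_t are driven by the same Brownian motion. This is the textbook argument, not necessarily the proof given in the paper; ξ and η denote the selected subgradients.

```latex
\begin{align*}
  &\text{Strong convexity: for } \xi \in \partial U(x):\quad
    U(y) \ge U(x) + \langle \xi, y - x \rangle + \tfrac{\mu}{2}\,|y - x|^2 . \\
  &\text{Adding the inequality with } x, y \text{ swapped } (\eta \in \partial U(y)):\quad
    \langle \xi - \eta,\, x - y \rangle \ge \mu\,|x - y|^2 . \\
  &\text{Same noise, so it cancels in the difference:}\quad
    \frac{d}{dt}\,|X_t - Y_t|^2 = -2\,\langle \xi_t - \eta_t,\, X_t - Y_t \rangle
      \le -2\mu\,|X_t - Y_t|^2 . \\
  &\text{Gr\"onwall:}\quad |X_t - Y_t|^2 \le e^{-2\mu t}\,|X_0 - Y_0|^2
    \;\Longrightarrow\;
    W_2\big(\mathrm{law}(X_t), \mathrm{law}(Y_t)\big)
      \le e^{-\mu t}\, W_2\big(\mathrm{law}(X_0), \mathrm{law}(Y_0)\big).
\end{align*}
```

Without a positive μ the second line only gives monotonicity, and the exponential rate in the last line is lost.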

What would settle it

A numerical experiment on a strongly convex non-differentiable U where the empirical distribution of the discretized chain fails to approach π as the step size is driven to zero, or where the chain exhibits no geometric convergence rate, would falsify the claims.
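
A small version of such an experiment can be sketched as follows, assuming the toy potential U(x) = x²/2 + |x|, for which E|x| under π has a closed form. The sketch compares an explicit subgradient step with a proximal (soft-thresholding) treatment of |x| at several step sizes. Both update rules, the test statistic, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def exact_mean_abs() -> float:
    """E|x| under pi(x) proportional to exp(-x**2/2 - |x|), by completing the square."""
    tail = 0.5 * math.erfc(1.0 / math.sqrt(2.0))  # P(Z > 1) for standard normal Z
    mass = math.sqrt(2.0 * math.pi) * tail
    return (math.exp(-0.5) - mass) / mass


def soft_threshold(y: float, t: float) -> float:
    """Proximal map of t * |.| (shrinks y toward 0 by t)."""
    return math.copysign(max(abs(y) - t, 0.0), y)


def chain_mean_abs(h: float, n_steps: int, scheme: str, seed: int = 0) -> float:
    """Average of |x| along one trajectory of the chosen scheme."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * h)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        if scheme == "explicit":
            # Subgradient of x**2/2 + |x|, choosing 0 from [-1, 1] at x = 0.
            sign = (x > 0) - (x < 0)
            x = x - h * (x + sign) + scale * z
        elif scheme == "semi_implicit":
            # Gradient step on x**2/2, proximal step on |x|, then noise.
            x = soft_threshold(x - h * x, h) + scale * z
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        total += abs(x)
    return total / n_steps


if __name__ == "__main__":
    target = exact_mean_abs()
    for h in (0.5, 0.1, 0.02):
        for scheme in ("explicit", "semi_implicit"):
            err = abs(chain_mean_abs(h, 200_000, scheme) - target)
            print(f"h={h:<5} {scheme:<14} |error in E|x|| = {err:.4f}")
```

If the errors did not shrink toward the Monte Carlo noise level as h decreases, that would be the kind of evidence described above. A single scalar statistic is of course much weaker than a Wasserstein or total variation comparison.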

Figures

Figures reproduced from arXiv: 2411.12051 by Andreas Habring, Lorenz Fruehwirth.

**Figure 1.** Potential U(x) = F(x) + G(x). On the left we show the Wasserstein-2 distance and on the right the total variation distance between samples and target π for different step sizes and methods.

**Figure 2.** Potential U(x) = F(x) + G(Kx) with linear operator in G. On the left we show the Wasserstein-2 distance and on the right the total variation distance between samples and target π for different step sizes and methods.

**Figure 3.** Denoising: estimated expected values and variances. From left to right: Corrupted image …

**Figure 4.** Denoising: L2-error of the estimated expected value (left) and variance (right) of the proposed explicit scheme and MYULA, each compared to BP results for the peppers image. We use a burn-in phase of 5e5. The symbols x̄_k, σ_k denote the empirical expected value and variance using k successive iterates; x̄, σ denote the estimates from BP.

**Figure 5.** Deconvolution: estimated expected values and variances. From left to right: Corrupted …

**Figure 6.** Denoising: L2-error between the estimated expected value (left) and variance (right) of the proposed explicit scheme and MYULA, each compared to BP results for the peppers image. We use a burn-in phase of 5e5. The symbols x̄_k, σ_k denote the empirical expected value and variance using k successive iterates; x̄, σ denote the estimates from BP.

Original abstract

This article is concerned with sampling from Gibbs distributions $\pi(x)\propto e^{-U(x)}$ using Markov chain Monte Carlo methods. In particular, we investigate Langevin dynamics in the continuous- and the discrete-time setting for such distributions with potentials $U(x)$ which are strongly-convex but possibly non-differentiable. We show that the corresponding subgradient Langevin dynamics are exponentially ergodic to the target density $\pi$ in the continuous setting and that certain explicit as well as semi-implicit discretizations are geometrically ergodic and approximate $\pi$ for vanishing discretization step size. Moreover, we prove that the discrete schemes satisfy the law of large numbers allowing to use consecutive iterates of a Markov chain in order to compute statistics of the stationary distribution posing a significant reduction of computational complexity in practice. Numerical experiments are provided confirming the theoretical findings and showcasing the practical relevance of the proposed methods in imaging applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends ergodicity results for subgradient Langevin dynamics to non-smooth strongly convex potentials, with geometric rates and LLN for some discretizations.

The punchline is that this work proves exponential ergodicity for the continuous subgradient Langevin dynamics and geometric ergodicity plus the law of large numbers for explicit and semi-implicit discretizations, all under strong convexity but allowing non-differentiability. This is new in the sense that it combines these properties for the non-smooth case, building on prior Langevin analysis. It does well by addressing a practically relevant class of potentials used in imaging, with numerical experiments to support the theory. The LLN result stands out because it lets practitioners use consecutive iterates from the Markov chain to estimate statistics without extra steps. The main limitation is the dependence on strong convexity for the contraction arguments; the results do not extend to non-strongly convex cases. The paper needs to ensure the subdifferential is handled correctly to maintain strong monotonicity and well-posedness of the dynamics. If the proofs are complete and the error bounds are explicit, that would strengthen it. This paper is for specialists in sampling methods and numerical analysis of SDEs. A reader interested in MCMC for convex optimization would find value in the guarantees and the practical LLN aspect. It deserves a serious referee because the claims are specific and the topic has clear applications, even though the advance is incremental within the field.

Referee Report

3 major / 2 minor

Summary. The paper proves that for strongly convex but possibly non-differentiable potentials U, the continuous-time subgradient Langevin dynamics are exponentially ergodic to the target Gibbs measure π; explicit and semi-implicit discretizations are geometrically ergodic, converge to π as the step size vanishes, and satisfy a law of large numbers that permits using consecutive chain iterates for Monte Carlo estimation. Numerical experiments on imaging problems are included to illustrate the results.

Significance. If the proofs are correct, the work provides a rigorous extension of ergodicity theory to non-smooth convex potentials that arise in total-variation or indicator-constrained sampling problems. The LLN result is practically useful because it justifies estimating expectations from consecutive iterates of a single chain, without thinning or independent restarts. The combination of continuous and discrete analysis plus numerical experiments strengthens the contribution.

major comments (3)

[§2.2, Theorem 3.1] §2.2 and Theorem 3.1: the strong monotonicity of the subdifferential ∂U is invoked to obtain the contraction in Wasserstein distance, but the argument requires verifying that the multi-valued inclusion remains well-posed and that the resolvent is single-valued or appropriately measurable; the current sketch does not explicitly cite the required measurability or selection theorem.
[Theorem 4.3] Theorem 4.3 (geometric ergodicity of the semi-implicit scheme): the step-size restriction appears to depend on the strong-convexity modulus μ and the Lipschitz constant of the smooth part; it is unclear whether the bound remains uniform when the non-smooth part is only convex (e.g., an indicator function) rather than strongly convex.
[§5] §5 (LLN): the proof of the law of large numbers for the discrete chain relies on geometric ergodicity plus a moment bound; the moment bound is stated to follow from the same Lyapunov function used for ergodicity, but the verification that the Lyapunov function works uniformly in the discretization parameter h is only sketched.

minor comments (2)

[§2] Notation for the subdifferential is introduced in §2, but the precise definition of the subgradient Langevin SDE (equation (2.3)) should explicitly indicate the measurable selection of the subgradient used in the drift term.
[Numerical experiments] Figure 1 and the imaging experiments: the captions should state the precise value of the regularization parameter and the discretization step size h used, to allow direct comparison with the theoretical rates.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments highlight areas where additional rigor and clarification will strengthen the presentation. We address each major comment point-by-point below and will revise the manuscript accordingly.

Referee: [§2.2, Theorem 3.1] §2.2 and Theorem 3.1: the strong monotonicity of the subdifferential ∂U is invoked to obtain the contraction in Wasserstein distance, but the argument requires verifying that the multi-valued inclusion remains well-posed and that the resolvent is single-valued or appropriately measurable; the current sketch does not explicitly cite the required measurability or selection theorem.

Authors: We agree that the sketch in §2.2 can be made fully rigorous by citing the appropriate background. Strong monotonicity of ∂U (which follows from μ-strong convexity of U) implies that the resolvent is single-valued. Well-posedness of the differential inclusion and measurability of selections are standard consequences of convex analysis. In the revision we will add a short paragraph in §2.2 referencing Theorem 8.1.3 of Aubin–Frankowska (Set-Valued Analysis) for measurable selection and noting that the strong-monotonicity assumption guarantees a unique absolutely continuous solution. This does not change the statement or proof of Theorem 3.1. revision: yes
Referee: [Theorem 4.3] Theorem 4.3 (geometric ergodicity of the semi-implicit scheme): the step-size restriction appears to depend on the strong-convexity modulus μ and the Lipschitz constant of the smooth part; it is unclear whether the bound remains uniform when the non-smooth part is only convex (e.g., an indicator function) rather than strongly convex.

Authors: The step-size condition in Theorem 4.3 is derived from the μ-strong convexity of the full potential U = f + g, where f is smooth with L-Lipschitz gradient and g is convex (possibly not strongly convex). When g is the indicator of a convex set, strong convexity is supplied entirely by f; the same algebraic estimates used in the proof continue to hold with the same μ and L, so the bound remains uniform in this case. We will insert a remark after Theorem 4.3 explicitly stating that the result applies verbatim when g is merely convex (including indicators) provided U itself is μ-strongly convex, and we will verify that no additional restriction on h arises. revision: yes
Referee: [§5] §5 (LLN): the proof of the law of large numbers for the discrete chain relies on geometric ergodicity plus a moment bound; the moment bound is stated to follow from the same Lyapunov function used for ergodicity, but the verification that the Lyapunov function works uniformly in the discretization parameter h is only sketched.

Authors: We acknowledge that the uniformity of the moment bound with respect to h deserves a more detailed argument. The Lyapunov function V constructed for geometric ergodicity satisfies a uniform drift condition for all h ≤ h0 (with h0 depending only on μ and L). In the revision we will expand the proof of the moment bound in §5 by explicitly tracking the constants and showing that E[V(X_k)] remains bounded by a constant independent of h (for h small) and of the iteration index k. This will make the application of the ergodic theorem for the LLN fully rigorous. revision: yes
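
For orientation, the kind of drift condition referred to in the last response typically has the following generic shape. The constants λ, b, c, C and the specific dependence on h shown in the last line are hypothetical and chosen only to illustrate how a bound uniform in h can arise; they are not taken from the paper.

```latex
\begin{align*}
  &\text{Drift condition:}\quad
    \mathbb{E}\big[ V(X_{k+1}) \,\big|\, X_k \big] \le \lambda\, V(X_k) + b,
    \qquad \lambda \in [0, 1),\ b < \infty . \\
  &\text{Iterating from } X_0 = x_0:\quad
    \mathbb{E}\big[ V(X_k) \big] \le \lambda^k V(x_0) + b\,\frac{1 - \lambda^k}{1 - \lambda}
    \le V(x_0) + \frac{b}{1 - \lambda} \quad \text{for all } k . \\
  &\text{Hypothetical } h\text{-dependence } \lambda_h = 1 - c\,h,\ b_h = C\,h:\quad
    \frac{b_h}{1 - \lambda_h} = \frac{C}{c}, \ \text{independent of } h .
\end{align*}
```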

Circularity Check

0 steps flagged

No circularity: direct mathematical proofs of ergodicity

The paper establishes exponential ergodicity for the continuous subgradient Langevin dynamics and geometric ergodicity plus LLN for explicit/semi-implicit discretizations via contraction arguments in Wasserstein distance that rely on the strong-convexity assumption and subdifferential monotonicity. These are standard first-principles derivations from the SDE/inclusion properties and do not reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations. The derivation chain is self-contained against external benchmarks such as existing ergodicity theory for convex potentials.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the domain assumption of strong convexity of U together with standard properties of subdifferentials and stochastic processes; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: U is strongly convex.
Invoked to obtain contraction and ergodicity rates for both continuous and discrete dynamics.
standard math: the subdifferential of U exists and satisfies standard convex analysis properties.
Required to define the subgradient Langevin dynamics when U is non-differentiable.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1]

doi: 10.1070/SM2002v193n07ABEH000665. F. Bolley, I. Gentil, and A. Guillin. Convergence to equilibrium in Wasserstein distance for Fokker–Planck equations. Journal of Functional Analysis, 263(8):2430–2457,

work page doi:10.1070/sm2002v193n07abeh000665
[2]

M. Burger, M. J. Ehrhardt, L. Kuger, and L. Weigand. Coupling analysis of the asymptotic behaviour of a primal-dual Langevin algorithm. arXiv preprint arXiv:2405.18098,

work page arXiv
[3]

doi: 10.1007/s10851-010-0251-1. Y. Chen, S. Chewi, A. Salim, and A. Wibisono. Improved analysis for a proximal algorithm for sampling. In Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 2984–3014. PMLR, 02–05 Jul

work page doi:10.1007/s10851-010-0251-1
[4]

doi: https://doi.org/10.1016/j.media.2022.102479

ISSN 1361-8415. doi: https://doi.org/10.1016/j.media.2022.102479. A. S. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):651–676,

work page doi:10.1016/j.media.2022.102479 2022
[5]

M. J. Ehrhardt, L. Kuger, and C.-B. Schönlieb. Proximal Langevin sampling with inexact proximal mapping. arXiv preprint arXiv:2306.17737,

work page arXiv
[6]

doi: 10.3150/bj/1066223276. L. Hodgkinson, R. Salomone, and F. Roosta. Implicit Langevin algorithms for sampling from log-concave densities. Journal of Machine Learning Research, 22(136):1–30,

work page doi:10.3150/bj/1066223276
[7]

doi: 10.1080/10618600.2020.1811105. J. Liang and Y. Chen. A proximal algorithm for sampling from non-smooth potentials. In 2022 Winter Simulation Conference (WSC), pages 3229–3240,

work page doi:10.1080/10618600.2020 2020
[8]

doi: 10.1109/WSC57314.2022.10015293. G. Luo, M. Blumenthal, M. Heide, and M. Uecker. Bayesian mri reconstruction with joint uncertainty estimation using diffusion models. Magnetic Resonance in Medicine , 90(1):295–311,

work page doi:10.1109/wsc57314.2022.10015293 2022
[9]

doi: https://doi.org/10.1002/mrm.29624. T. D. Luu, J. Fadili, and C. Chesneau. Sampling from non-smooth distributions through Langevin diffusion. Methodology and Computing in Applied Probability, 23(4):1173–1201,

work page doi:10.1002/mrm.29624
[10]

URL https://proceedings.neurips.cc/paper_files/paper/2020/file/2779fda014fbadb761f67dd708c1325e-Paper.pdf. Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456,

work page internal anchor Pith review Pith/arXiv arXiv 2020
[11]

doi: 10.1109/TSP.2019.2894825. M. Vono, N. Dobigeon, and P. Chainais. Asymptotically exact data augmentation: Models, properties, and algorithms. Journal of Computational and Graphical Statistics, 30(2):335–348,

work page doi:10.1109/tsp 2019
[12]

1080/10618600.2020.1826954. M. Vono, D. Paulin, and A. Doucet. Efficient MCMC sampling with dimension-free convergence rate using ADMM-type splitting. J. Mach. Learn. Res., 23(1), jan

work page arXiv 2020
[13]

M. Zach, E. Kobler, and T. Pock. Computed tomography reconstruction using generative energy-based priors. arXiv preprint arXiv:2203.12658,

work page arXiv
[14]

doi: 10.1109/TMI.2023.3311345. 26

work page doi:10.1109/tmi.2023.3311345 2023
