Speeding Up Nonsmooth Bayesian MCMC Sampling via Inexact Proximal Unadjusted Langevin Algorithm

Alireza Kabgani; Masoud Ahookhosh; Susan Ghaderi; Yves Moreau

arxiv: 2605.17306 · v1 · pith:OP6HBND7new · submitted 2026-05-17 · 🧮 math.OC

Speeding Up Nonsmooth Bayesian MCMC Sampling via Inexact Proximal Unadjusted Langevin Algorithm

Susan Ghaderi , Alireza Kabgani , Yves Moreau , Masoud Ahookhosh This is my paper

Pith reviewed 2026-05-19 22:48 UTC · model grok-4.3

classification 🧮 math.OC

keywords nonsmooth Bayesian samplingproximal Langevin algorithminexact proximal methodsMCMC samplingMoreau envelopecomposite potentialsmedical image reconstructionnon-asymptotic convergence

0 comments

The pith

An inexact proximal unadjusted Langevin algorithm samples from nonsmooth composite posteriors by using controlled approximations in place of exact proximal operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces iPULA to sample from posterior distributions with nonsmooth composite potentials where exact proximal operators are unavailable. It replaces exact proximal steps with approximations that are controlled through the Moreau envelope smoothing of the potential. Non-asymptotic convergence guarantees are established that explicitly bound the sampling error in terms of the inexactness level and show that convergence rates are preserved up to a quantifiable bias. This matters for practical Bayesian inference tasks such as medical image reconstruction, where closed-form proximal operators do not exist. Experiments on image reconstruction tasks confirm that the method works effectively and aligns with the derived error bounds.

Core claim

The central claim is that iPULA achieves non-asymptotic convergence for sampling from posteriors with nonsmooth composite potentials by smoothing via the Moreau envelope and replacing exact proximal steps with inexact computations, with the sampling error bounded explicitly by a term that grows with the level of inexactness while the overall convergence rate order remains unchanged up to this added bias.

What carries the argument

The iPULA iteration that applies the Moreau envelope to smooth the composite potential and then uses inexact proximal evaluations to approximate the required gradient steps inside an unadjusted Langevin update.

If this is right

The total sampling error decomposes into the usual discretization and bias terms plus an additive term controlled by the proximal approximation tolerance.
The same convergence order as the exact proximal unadjusted Langevin algorithm is retained provided the inexactness tolerance is chosen to decay appropriately with the step size.
The method extends the practical reach of proximal Langevin sampling to composite potentials whose proximal operators lack closed forms, such as those arising in medical imaging.
If the inexactness level is held below a threshold proportional to the step size, the stationary distribution remains close to the true posterior in Wasserstein distance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same controlled-inexactness strategy could be transferred to other proximal MCMC schemes such as proximal MALA or proximal Hamiltonian Monte Carlo.
In very high dimensions the computational saving from loose proximal tolerances may outweigh the added bias, but this trade-off remains to be quantified.
The framework suggests a route to adaptive step-size and tolerance schedules that balance bias and variance without requiring exact proximal oracles.

Load-bearing premise

The inexact proximal computations can be carried out with enough accuracy control that the resulting bias stays measurable and does not erase the convergence rate.

What would settle it

Implement iPULA on a test problem with known exact proximal operator, deliberately vary the accuracy of the proximal approximation across runs, and check whether the observed distance to the target posterior grows exactly as predicted by the bias term in the non-asymptotic bound.

Figures

Figures reproduced from arXiv: 2605.17306 by Alireza Kabgani, Masoud Ahookhosh, Susan Ghaderi, Yves Moreau.

**Figure 2.** Figure 2: Convergence and sampling behavior of iPULA, Grad-sub, Prox-sub, and MYULA on the [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗

read the original abstract

We study sampling from posterior distributions with nonsmooth composite potentials, a setting in which proximal-based Langevin methods are theoretically appealing but in practice limited to simple functions with closed-form proximal operators. We introduce iPULA for composite potentials, an inexact proximal unadjusted Langevin algorithm that replaces exact proximal steps with controlled approximations. Our approach leverages the Moreau envelope to smooth the potential, while allowing inexact evaluation of its gradient through inexact proximal computations. We establish non-asymptotic convergence guarantees for iPULA, explicitly characterizing the impact of inexactness on the sampling error and showing that the inexactness preserves convergence rates up to a quantifiable bias. We demonstrate the practical relevance of iPULA on a medical image reconstruction task, where proximal operators cannot be computed exactly. Experiments demonstrate the effectiveness of iPULA and support our theoretical results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an inexact proximal ULA that tracks bias from approximate proximal steps in nonsmooth composite sampling, with non-asymptotic bounds and an imaging demo.

read the letter

This paper introduces iPULA to run unadjusted Langevin sampling on posteriors with nonsmooth composite potentials when exact proximal operators are unavailable. It smooths via the Moreau envelope and substitutes controlled inexact proximal evaluations, then derives non-asymptotic convergence guarantees that quantify the extra bias from those approximations. The medical imaging reconstruction example shows the method in a setting where closed-form proxes do not exist. That combination of explicit bias tracking and a realistic application is the concrete advance over earlier exact proximal Langevin results. The bounds appear to preserve the usual rates up to an additive term that depends on the inexactness level, which is useful if the error can be driven down reliably. The experiments support that the sampler produces usable reconstructions, at least on the reported task. The soft spot is the requirement that proximal errors decay fast enough for the accumulated bias to stay controlled. The stress-test note flags that any fixed or slowly decaying tolerance could inject a persistent perturbation whose sum over steps may exceed the target 1/sqrt(T) or 1/T rate, especially when the Moreau gradient interacts with the ULA discretization on composite potentials. Without the full proof it is unclear whether the paper supplies an extra assumption on the error sequence that practical solvers automatically satisfy. The citation pattern looks standard and the central claim does not collapse into fitted parameters. This is for people working on MCMC for inverse problems and imaging who need to move beyond closed-form proxes. A reader focused on proximal sampling methods would get value from the bias analysis and the application, even if more validation is needed. It deserves a serious referee because the algorithmic idea and the non-asymptotic treatment of inexactness are grounded enough to warrant detailed review.

Referee Report

2 major / 2 minor

Summary. The paper introduces iPULA, an inexact proximal unadjusted Langevin algorithm for sampling from posterior distributions with nonsmooth composite potentials. It replaces exact proximal steps with controlled approximations via the Moreau envelope, derives non-asymptotic convergence guarantees that characterize the sampling error impact of inexactness as a quantifiable bias while preserving rates, and demonstrates the method on a medical image reconstruction task where exact proximal operators are unavailable.

Significance. If the non-asymptotic bounds hold, the work supplies a practical and theoretically supported route to MCMC sampling for composite nonsmooth potentials that arise in imaging and inverse problems, where exact proximal maps are intractable. The explicit bias characterization from inexact proximal evaluations is a useful contribution that lets users trade computational effort against convergence without losing the underlying 1/sqrt(T) or 1/T scaling.

major comments (2)

[§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).
[§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.

minor comments (2)

[§3 and §4] Notation for the proximal error tolerance is introduced in §3 but reused with different symbols in the convergence statements of §4; a single consistent symbol (e.g., ε_k throughout) would improve readability.
[Figure 2] Figure 2 caption states that the plotted curves correspond to “different inexactness levels,” yet the legend only labels the exact-proximal baseline; adding the numerical values of the tolerance sequence used for each curve would make the experiment reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments have prompted us to clarify key aspects of the convergence analysis. We address each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses

Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).

Authors: We appreciate this observation on the error propagation. The proof of Theorem 4.1 proceeds from the one-step recursion W_{k+1} ≤ (1 - c η) W_k + O(η²) + C ε_k, where the contraction factor arises from the strong convexity and smoothness properties after Moreau smoothing. Unrolling the recursion produces a bias contribution bounded by sum_{k=1}^T (1 - c η)^{T-k} ε_k. With the standard choice η ∼ 1/√T that yields the target 1/√T rate, the accumulated bias is o(1/√T) whenever ε_k = O(k^{-1-δ}) for any δ > 0 (or, more weakly, whenever the Cesàro mean of the weighted ε_k vanishes at the required rate). We have inserted the explicit recursion as Lemma 4.4 and added the minimal decay condition to the statement of Theorem 4.1. This schedule is attainable by standard first-order solvers for the proximal subproblems with a tolerance that decreases polynomially in the outer iteration index. revision: yes
Referee: [§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.

Authors: The gradient of the Moreau envelope of any proper lower-semicontinuous function is always 1/λ-Lipschitz, independently of dimension and of whether the original potential is composite. In our composite setting the proximal step is applied solely to the nonsmooth summand; the resulting gradient perturbation is controlled by ε_k and enters the analysis as an additive term whose size is independent of the Lipschitz constant of the smooth summand. The overall bias in the Wasserstein bound scales with λ through the standard Moreau approximation error O(λ Lip(f_nonsmooth)). We have expanded the discussion following Eq. (14) to state these facts explicitly, to note that λ may be held fixed or adapted slowly without altering the asymptotic rate, and to confirm uniformity with respect to dimension under the standing assumptions on the potential. revision: yes

Circularity Check

0 steps flagged

Convergence analysis for iPULA derives independent bounds on inexactness bias without reducing to self-definition or fitted inputs

full rationale

The paper introduces iPULA as an algorithmic construction that replaces exact proximal steps with controlled approximations via the Moreau envelope, then derives non-asymptotic convergence guarantees by propagating per-step proximal errors through Wasserstein or KL contractions. This derivation relies on standard assumptions about error decay (e.g., summable tolerances) and standard smoothing properties of the Moreau envelope; these are not shown to be equivalent to the target rates by the paper's own equations. No self-citation is load-bearing for the central claim, and the analysis does not rename known results or smuggle ansatzes. The result remains self-contained against external benchmarks for proximal Langevin methods.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper rests on standard domain assumptions from nonsmooth optimization and sampling theory; no free parameters or new invented entities are introduced beyond the algorithm itself.

axioms (1)

domain assumption The composite potential admits a proximal operator that can be approximated to controllable accuracy.
Invoked to justify replacing exact proximal steps with inexact ones while retaining convergence.

invented entities (1)

iPULA (inexact proximal unadjusted Langevin algorithm) no independent evidence
purpose: To enable sampling from nonsmooth composite posteriors when exact proximal operators are unavailable.
New algorithmic construction presented in the paper.

pith-pipeline@v0.9.0 · 5687 in / 1301 out tokens · 25211 ms · 2026-05-19T22:48:11.619405+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce iPULA … leverages the Moreau envelope to smooth the potential, while allowing inexact evaluation of its gradient through inexact proximal computations … non-asymptotic convergence guarantees … impact of inexactness on the sampling error
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

U_γ(x) := inf_y U(y) + ½γ‖x−y‖² … ∇U_γ(x) = 1/γ (x − Prox_γU(x))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Ahookhosh, A

M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algo- rithm for nonconvex composite optimization: Superlinear convergence to nonisolated local min- ima.SIAM Journal on Optimization, 31(1):653–685, 2021

work page 2021
[2]

H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2nd edition, 2017

work page 2017
[3]

Brooks, A

S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors.Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, 2011

work page 2011
[4]

N. S. Chatterji, J. Diakonikolas, M. I. Jordan, and P. L. Bartlett. Langevin Monte Carlo without smoothness. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 ofProceedings of Machine Learning Research, pages 1716–1726, 2020

work page 2020
[5]

F. R. Crucinio, A. Durmus, P. Jim´ enez, and G. O. Roberts. Optimal scaling results for Moreau- Yosida Metropolis-adjusted Langevin algorithms.Bernoulli, 31(3):1889–1907, 2025

work page 1907
[6]

A. S. Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. InProceedings of the 2017 Conference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 678–689, 2017

work page 2017
[7]

A. S. Dalalyan and A. G. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019

work page 2019
[8]

A. S. Dalalyan, A. Karagulyan, and L. Riou-Durand. Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets.Journal of Machine Learning Research, 23(235):1– 38, 2022

work page 2022
[9]

Davis, D

D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179(3):962–982, 2018

work page 2018
[10]

K. A. Dubey, S. J. Reddi, S. A. Williamson, B. P´ oczos, A. J. Smola, and E. P. Xing. Variance reduction in stochastic gradient Langevin dynamics. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016
[11]

Durmus and E

A. Durmus and E. Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019

work page 2019
[12]

Durmus, E

A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau.SIAM Journal on Imaging Sciences, 11(1):473–506, 2018

work page 2018
[13]

Eftekhari, L

A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward-backward envelope for sampling with the overdamped Langevin algorithm.Statistics and Computing, 33(4):85, 2023

work page 2023
[14]

Ghaderi, M

S. Ghaderi, M. Ahookhosh, A. Arany, A. Skupin, P. Patrinos, and Y. Moreau. Smoothing unad- justed Langevin algorithms for nonsmooth composite potential functions.Applied Mathematics and Computation, 464:128377, 2024. 18

work page 2024
[15]

Habring, M

A. Habring, M. Holler, and T. Pock. Subgradient Langevin methods for sampling from nonsmooth potentials.SIAM Journal on Mathematics of Data Science, 6(4):897–925, 2024

work page 2024
[16]

F. Han, S. Osher, and W. Li. Splitting regularized Wasserstein proximal algorithms for nonsmooth sampling problems.arXiv preprint arXiv:2502.16773, 2025

work page arXiv 2025
[17]

Johnston, I

T. Johnston, I. Lytras, N. Makras, and S. Sabanis. The performance of the unadjusted Langevin algorithm without smoothness assumptions.arXiv preprint arXiv:2502.03458, 2025

work page arXiv 2025
[18]

Kabgani and M

A. Kabgani and M. Ahookhosh. Moreau envelope and proximal-point methods under the lens of high-order regularization.Set-Valued and Variational Analysis, 33:47, 2025

work page 2025
[19]

Kabgani and M

A. Kabgani and M. Ahookhosh. ItsOPT: An inexact two-level smoothing framework for nonconvex optimization via high-order Moreau envelope.arXiv preprint arXiv:2410.19928, 2024

work page arXiv 2024
[20]

Kabgani and M

A. Kabgani and M. Ahookhosh. ItsDEAL: Inexact two-level smoothing descent algorithms for weakly convex optimization.arXiv preprint arXiv:2501.02155, 2025

work page arXiv 2025
[21]

Klatzer, P

T. Klatzer, P. Dobson, Y. Altmann, M. Pereyra, J. M. Sanz-Serna, and K. C. Zygalakis. Acceler- ated Bayesian imaging by relaxed proximal-point Langevin sampling.SIAM Journal on Imaging Sciences, 17(2):1078–1117, 2024

work page 2024
[22]

T. T.-K. Lau and H. Liu. Bregman proximal Langevin Monte Carlo via Bregman–Moreau en- velopes. InProceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 12049–12077, 2022

work page 2022
[23]

T. T.-K. Lau, H. Liu, and T. Pock. Non-log-concave and nonsmooth sampling via Langevin Monte Carlo algorithms. InAdvanced Techniques in Optimization for Machine Learning and Imaging, volume 61 ofSpringer INdAM Series, pages 83–149. Springer, 2024

work page 2024
[24]

Y.-A. Ma, T. Chen, and E. B. Fox. A complete recipe for stochastic gradient MCMC. InAdvances in Neural Information Processing Systems, volume 28, 2015

work page 2015
[25]

Mordukhovich and N.M

B.S. Mordukhovich and N.M. Nam. An Easy Path to Convex Analysis and Applications.Williston: Morgan & Claypool, 2014

work page 2014
[26]

J. J. Moreau. Propri´ et´ es des applications “prox”.Comptes Rendus Hebdomadaires des S´ eances de l’Acad´ emie des Sciences, 256:1069–1071, 1963

work page 1963
[27]

Nesterov and V

Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017

work page 2017
[28]

M. Pereyra. Proximal Markov chain Monte Carlo algorithms.Statistics and Computing, 26(4):745– 760, 2016

work page 2016
[29]

Planiden and X

C. Planiden and X. Wang. Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.SIAM Journal on Optimization, 26(2):1341–1364, 2016

work page 2016
[30]

Rahimi, S

M. Rahimi, S. Ghaderi, Y. Moreau, and M. Ahookhosh. Projected subgradient methods for paraconvex optimization: Application to robust low-rank matrix recovery.arXiv preprint arXiv:2501.00427, 2024

work page arXiv 2024
[31]

C. P. Robert and G. Casella.Monte Carlo Statistical Methods. Springer, 2nd edition, 2004

work page 2004
[32]

Salim, D

A. Salim, D. Kovalev, and P. Richt´ arik. Stochastic proximal Langevin algorithm: Potential split- ting and nonasymptotic rates. InAdvances in Neural Information Processing Systems, volume 32, 2019. 19

work page 2019
[33]

Shukla, D

A. Shukla, D. Vats, and E. C. Chi. MCMC importance sampling via Moreau–Yosida envelopes. arXiv preprint arXiv:2501.02228, 2025

work page arXiv 2025
[34]

Stella, A

L. Stella, A. Themelis, and P. Patrinos. Forward-backward quasi-Newton methods for nonsmooth optimization problems.Computational Optimization and Applications, 67(3):443–487, 2017

work page 2017
[35]

Themelis, M

A. Themelis, M. Ahookhosh, and P. Patrinos. On the acceleration of forward-backward splitting via an inexact Newton method. InSplitting Algorithms, Modern Operator Theory, and Applica- tions, pages 363–412. Springer, 2019. 20

work page 2019

[1] [1]

Ahookhosh, A

M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algo- rithm for nonconvex composite optimization: Superlinear convergence to nonisolated local min- ima.SIAM Journal on Optimization, 31(1):653–685, 2021

work page 2021

[2] [2]

H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2nd edition, 2017

work page 2017

[3] [3]

Brooks, A

S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors.Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, 2011

work page 2011

[4] [4]

N. S. Chatterji, J. Diakonikolas, M. I. Jordan, and P. L. Bartlett. Langevin Monte Carlo without smoothness. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 ofProceedings of Machine Learning Research, pages 1716–1726, 2020

work page 2020

[5] [5]

F. R. Crucinio, A. Durmus, P. Jim´ enez, and G. O. Roberts. Optimal scaling results for Moreau- Yosida Metropolis-adjusted Langevin algorithms.Bernoulli, 31(3):1889–1907, 2025

work page 1907

[6] [6]

A. S. Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. InProceedings of the 2017 Conference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 678–689, 2017

work page 2017

[7] [7]

A. S. Dalalyan and A. G. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019

work page 2019

[8] [8]

A. S. Dalalyan, A. Karagulyan, and L. Riou-Durand. Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets.Journal of Machine Learning Research, 23(235):1– 38, 2022

work page 2022

[9] [9]

Davis, D

D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179(3):962–982, 2018

work page 2018

[10] [10]

K. A. Dubey, S. J. Reddi, S. A. Williamson, B. P´ oczos, A. J. Smola, and E. P. Xing. Variance reduction in stochastic gradient Langevin dynamics. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016

[11] [11]

Durmus and E

A. Durmus and E. Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019

work page 2019

[12] [12]

Durmus, E

A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau.SIAM Journal on Imaging Sciences, 11(1):473–506, 2018

work page 2018

[13] [13]

Eftekhari, L

A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward-backward envelope for sampling with the overdamped Langevin algorithm.Statistics and Computing, 33(4):85, 2023

work page 2023

[14] [14]

Ghaderi, M

S. Ghaderi, M. Ahookhosh, A. Arany, A. Skupin, P. Patrinos, and Y. Moreau. Smoothing unad- justed Langevin algorithms for nonsmooth composite potential functions.Applied Mathematics and Computation, 464:128377, 2024. 18

work page 2024

[15] [15]

Habring, M

A. Habring, M. Holler, and T. Pock. Subgradient Langevin methods for sampling from nonsmooth potentials.SIAM Journal on Mathematics of Data Science, 6(4):897–925, 2024

work page 2024

[16] [16]

F. Han, S. Osher, and W. Li. Splitting regularized Wasserstein proximal algorithms for nonsmooth sampling problems.arXiv preprint arXiv:2502.16773, 2025

work page arXiv 2025

[17] [17]

Johnston, I

T. Johnston, I. Lytras, N. Makras, and S. Sabanis. The performance of the unadjusted Langevin algorithm without smoothness assumptions.arXiv preprint arXiv:2502.03458, 2025

work page arXiv 2025

[18] [18]

Kabgani and M

A. Kabgani and M. Ahookhosh. Moreau envelope and proximal-point methods under the lens of high-order regularization.Set-Valued and Variational Analysis, 33:47, 2025

work page 2025

[19] [19]

Kabgani and M

A. Kabgani and M. Ahookhosh. ItsOPT: An inexact two-level smoothing framework for nonconvex optimization via high-order Moreau envelope.arXiv preprint arXiv:2410.19928, 2024

work page arXiv 2024

[20] [20]

Kabgani and M

A. Kabgani and M. Ahookhosh. ItsDEAL: Inexact two-level smoothing descent algorithms for weakly convex optimization.arXiv preprint arXiv:2501.02155, 2025

work page arXiv 2025

[21] [21]

Klatzer, P

T. Klatzer, P. Dobson, Y. Altmann, M. Pereyra, J. M. Sanz-Serna, and K. C. Zygalakis. Acceler- ated Bayesian imaging by relaxed proximal-point Langevin sampling.SIAM Journal on Imaging Sciences, 17(2):1078–1117, 2024

work page 2024

[22] [22]

T. T.-K. Lau and H. Liu. Bregman proximal Langevin Monte Carlo via Bregman–Moreau en- velopes. InProceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 12049–12077, 2022

work page 2022

[23] [23]

T. T.-K. Lau, H. Liu, and T. Pock. Non-log-concave and nonsmooth sampling via Langevin Monte Carlo algorithms. InAdvanced Techniques in Optimization for Machine Learning and Imaging, volume 61 ofSpringer INdAM Series, pages 83–149. Springer, 2024

work page 2024

[24] [24]

Y.-A. Ma, T. Chen, and E. B. Fox. A complete recipe for stochastic gradient MCMC. InAdvances in Neural Information Processing Systems, volume 28, 2015

work page 2015

[25] [25]

Mordukhovich and N.M

B.S. Mordukhovich and N.M. Nam. An Easy Path to Convex Analysis and Applications.Williston: Morgan & Claypool, 2014

work page 2014

[26] [26]

J. J. Moreau. Propri´ et´ es des applications “prox”.Comptes Rendus Hebdomadaires des S´ eances de l’Acad´ emie des Sciences, 256:1069–1071, 1963

work page 1963

[27] [27]

Nesterov and V

Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017

work page 2017

[28] [28]

M. Pereyra. Proximal Markov chain Monte Carlo algorithms.Statistics and Computing, 26(4):745– 760, 2016

work page 2016

[29] [29]

Planiden and X

C. Planiden and X. Wang. Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.SIAM Journal on Optimization, 26(2):1341–1364, 2016

work page 2016

[30] [30]

Rahimi, S

M. Rahimi, S. Ghaderi, Y. Moreau, and M. Ahookhosh. Projected subgradient methods for paraconvex optimization: Application to robust low-rank matrix recovery.arXiv preprint arXiv:2501.00427, 2024

work page arXiv 2024

[31] [31]

C. P. Robert and G. Casella.Monte Carlo Statistical Methods. Springer, 2nd edition, 2004

work page 2004

[32] [32]

Salim, D

A. Salim, D. Kovalev, and P. Richt´ arik. Stochastic proximal Langevin algorithm: Potential split- ting and nonasymptotic rates. InAdvances in Neural Information Processing Systems, volume 32, 2019. 19

work page 2019

[33] [33]

Shukla, D

A. Shukla, D. Vats, and E. C. Chi. MCMC importance sampling via Moreau–Yosida envelopes. arXiv preprint arXiv:2501.02228, 2025

work page arXiv 2025

[34] [34]

Stella, A

L. Stella, A. Themelis, and P. Patrinos. Forward-backward quasi-Newton methods for nonsmooth optimization problems.Computational Optimization and Applications, 67(3):443–487, 2017

work page 2017

[35] [35]

Themelis, M

A. Themelis, M. Ahookhosh, and P. Patrinos. On the acceleration of forward-backward splitting via an inexact Newton method. InSplitting Algorithms, Modern Operator Theory, and Applica- tions, pages 363–412. Springer, 2019. 20

work page 2019