pith. sign in

arxiv: 2605.17306 · v1 · pith:OP6HBND7new · submitted 2026-05-17 · 🧮 math.OC

Speeding Up Nonsmooth Bayesian MCMC Sampling via Inexact Proximal Unadjusted Langevin Algorithm

Pith reviewed 2026-05-19 22:48 UTC · model grok-4.3

classification 🧮 math.OC
keywords nonsmooth Bayesian samplingproximal Langevin algorithminexact proximal methodsMCMC samplingMoreau envelopecomposite potentialsmedical image reconstructionnon-asymptotic convergence
0
0 comments X

The pith

An inexact proximal unadjusted Langevin algorithm samples from nonsmooth composite posteriors by using controlled approximations in place of exact proximal operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces iPULA to sample from posterior distributions with nonsmooth composite potentials where exact proximal operators are unavailable. It replaces exact proximal steps with approximations that are controlled through the Moreau envelope smoothing of the potential. Non-asymptotic convergence guarantees are established that explicitly bound the sampling error in terms of the inexactness level and show that convergence rates are preserved up to a quantifiable bias. This matters for practical Bayesian inference tasks such as medical image reconstruction, where closed-form proximal operators do not exist. Experiments on image reconstruction tasks confirm that the method works effectively and aligns with the derived error bounds.

Core claim

The central claim is that iPULA achieves non-asymptotic convergence for sampling from posteriors with nonsmooth composite potentials by smoothing via the Moreau envelope and replacing exact proximal steps with inexact computations, with the sampling error bounded explicitly by a term that grows with the level of inexactness while the overall convergence rate order remains unchanged up to this added bias.

What carries the argument

The iPULA iteration that applies the Moreau envelope to smooth the composite potential and then uses inexact proximal evaluations to approximate the required gradient steps inside an unadjusted Langevin update.

If this is right

  • The total sampling error decomposes into the usual discretization and bias terms plus an additive term controlled by the proximal approximation tolerance.
  • The same convergence order as the exact proximal unadjusted Langevin algorithm is retained provided the inexactness tolerance is chosen to decay appropriately with the step size.
  • The method extends the practical reach of proximal Langevin sampling to composite potentials whose proximal operators lack closed forms, such as those arising in medical imaging.
  • If the inexactness level is held below a threshold proportional to the step size, the stationary distribution remains close to the true posterior in Wasserstein distance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same controlled-inexactness strategy could be transferred to other proximal MCMC schemes such as proximal MALA or proximal Hamiltonian Monte Carlo.
  • In very high dimensions the computational saving from loose proximal tolerances may outweigh the added bias, but this trade-off remains to be quantified.
  • The framework suggests a route to adaptive step-size and tolerance schedules that balance bias and variance without requiring exact proximal oracles.

Load-bearing premise

The inexact proximal computations can be carried out with enough accuracy control that the resulting bias stays measurable and does not erase the convergence rate.

What would settle it

Implement iPULA on a test problem with known exact proximal operator, deliberately vary the accuracy of the proximal approximation across runs, and check whether the observed distance to the target posterior grows exactly as predicted by the bias term in the non-asymptotic bound.

Figures

Figures reproduced from arXiv: 2605.17306 by Alireza Kabgani, Masoud Ahookhosh, Susan Ghaderi, Yves Moreau.

Figure 1
Figure 1. Figure 1: Brain MRI deblurring results for MYULA, iPULA, Grad-sub, and Prox-sub. [PITH_FULL_IMAGE:figures/full_fig_p016_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Convergence and sampling behavior of iPULA, Grad-sub, Prox-sub, and MYULA on the [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
read the original abstract

We study sampling from posterior distributions with nonsmooth composite potentials, a setting in which proximal-based Langevin methods are theoretically appealing but in practice limited to simple functions with closed-form proximal operators. We introduce iPULA for composite potentials, an inexact proximal unadjusted Langevin algorithm that replaces exact proximal steps with controlled approximations. Our approach leverages the Moreau envelope to smooth the potential, while allowing inexact evaluation of its gradient through inexact proximal computations. We establish non-asymptotic convergence guarantees for iPULA, explicitly characterizing the impact of inexactness on the sampling error and showing that the inexactness preserves convergence rates up to a quantifiable bias. We demonstrate the practical relevance of iPULA on a medical image reconstruction task, where proximal operators cannot be computed exactly. Experiments demonstrate the effectiveness of iPULA and support our theoretical results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces iPULA, an inexact proximal unadjusted Langevin algorithm for sampling from posterior distributions with nonsmooth composite potentials. It replaces exact proximal steps with controlled approximations via the Moreau envelope, derives non-asymptotic convergence guarantees that characterize the sampling error impact of inexactness as a quantifiable bias while preserving rates, and demonstrates the method on a medical image reconstruction task where exact proximal operators are unavailable.

Significance. If the non-asymptotic bounds hold, the work supplies a practical and theoretically supported route to MCMC sampling for composite nonsmooth potentials that arise in imaging and inverse problems, where exact proximal maps are intractable. The explicit bias characterization from inexact proximal evaluations is a useful contribution that lets users trade computational effort against convergence without losing the underlying 1/sqrt(T) or 1/T scaling.

major comments (2)
  1. [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).
  2. [§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.
minor comments (2)
  1. [§3 and §4] Notation for the proximal error tolerance is introduced in §3 but reused with different symbols in the convergence statements of §4; a single consistent symbol (e.g., ε_k throughout) would improve readability.
  2. [Figure 2] Figure 2 caption states that the plotted curves correspond to “different inexactness levels,” yet the legend only labels the exact-proximal baseline; adding the numerical values of the tolerance sequence used for each curve would make the experiment reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments have prompted us to clarify key aspects of the convergence analysis. We address each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses
  1. Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).

    Authors: We appreciate this observation on the error propagation. The proof of Theorem 4.1 proceeds from the one-step recursion W_{k+1} ≤ (1 - c η) W_k + O(η²) + C ε_k, where the contraction factor arises from the strong convexity and smoothness properties after Moreau smoothing. Unrolling the recursion produces a bias contribution bounded by sum_{k=1}^T (1 - c η)^{T-k} ε_k. With the standard choice η ∼ 1/√T that yields the target 1/√T rate, the accumulated bias is o(1/√T) whenever ε_k = O(k^{-1-δ}) for any δ > 0 (or, more weakly, whenever the Cesàro mean of the weighted ε_k vanishes at the required rate). We have inserted the explicit recursion as Lemma 4.4 and added the minimal decay condition to the statement of Theorem 4.1. This schedule is attainable by standard first-order solvers for the proximal subproblems with a tolerance that decreases polynomially in the outer iteration index. revision: yes

  2. Referee: [§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.

    Authors: The gradient of the Moreau envelope of any proper lower-semicontinuous function is always 1/λ-Lipschitz, independently of dimension and of whether the original potential is composite. In our composite setting the proximal step is applied solely to the nonsmooth summand; the resulting gradient perturbation is controlled by ε_k and enters the analysis as an additive term whose size is independent of the Lipschitz constant of the smooth summand. The overall bias in the Wasserstein bound scales with λ through the standard Moreau approximation error O(λ Lip(f_nonsmooth)). We have expanded the discussion following Eq. (14) to state these facts explicitly, to note that λ may be held fixed or adapted slowly without altering the asymptotic rate, and to confirm uniformity with respect to dimension under the standing assumptions on the potential. revision: yes

Circularity Check

0 steps flagged

Convergence analysis for iPULA derives independent bounds on inexactness bias without reducing to self-definition or fitted inputs

full rationale

The paper introduces iPULA as an algorithmic construction that replaces exact proximal steps with controlled approximations via the Moreau envelope, then derives non-asymptotic convergence guarantees by propagating per-step proximal errors through Wasserstein or KL contractions. This derivation relies on standard assumptions about error decay (e.g., summable tolerances) and standard smoothing properties of the Moreau envelope; these are not shown to be equivalent to the target rates by the paper's own equations. No self-citation is load-bearing for the central claim, and the analysis does not rename known results or smuggle ansatzes. The result remains self-contained against external benchmarks for proximal Langevin methods.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper rests on standard domain assumptions from nonsmooth optimization and sampling theory; no free parameters or new invented entities are introduced beyond the algorithm itself.

axioms (1)
  • domain assumption The composite potential admits a proximal operator that can be approximated to controllable accuracy.
    Invoked to justify replacing exact proximal steps with inexact ones while retaining convergence.
invented entities (1)
  • iPULA (inexact proximal unadjusted Langevin algorithm) no independent evidence
    purpose: To enable sampling from nonsmooth composite posteriors when exact proximal operators are unavailable.
    New algorithmic construction presented in the paper.

pith-pipeline@v0.9.0 · 5687 in / 1301 out tokens · 25211 ms · 2026-05-19T22:48:11.619405+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Ahookhosh, A

    M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algo- rithm for nonconvex composite optimization: Superlinear convergence to nonisolated local min- ima.SIAM Journal on Optimization, 31(1):653–685, 2021

  2. [2]

    H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2nd edition, 2017

  3. [3]

    Brooks, A

    S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors.Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC, 2011

  4. [4]

    N. S. Chatterji, J. Diakonikolas, M. I. Jordan, and P. L. Bartlett. Langevin Monte Carlo without smoothness. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 ofProceedings of Machine Learning Research, pages 1716–1726, 2020

  5. [5]

    F. R. Crucinio, A. Durmus, P. Jim´ enez, and G. O. Roberts. Optimal scaling results for Moreau- Yosida Metropolis-adjusted Langevin algorithms.Bernoulli, 31(3):1889–1907, 2025

  6. [6]

    A. S. Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. InProceedings of the 2017 Conference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 678–689, 2017

  7. [7]

    A. S. Dalalyan and A. G. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019

  8. [8]

    A. S. Dalalyan, A. Karagulyan, and L. Riou-Durand. Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets.Journal of Machine Learning Research, 23(235):1– 38, 2022

  9. [9]

    Davis, D

    D. Davis, D. Drusvyatskiy, K. J. MacPhee, and C. Paquette. Subgradient methods for sharp weakly convex functions.Journal of Optimization Theory and Applications, 179(3):962–982, 2018

  10. [10]

    K. A. Dubey, S. J. Reddi, S. A. Williamson, B. P´ oczos, A. J. Smola, and E. P. Xing. Variance reduction in stochastic gradient Langevin dynamics. InAdvances in Neural Information Processing Systems, volume 29, 2016

  11. [11]

    Durmus and E

    A. Durmus and E. Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019

  12. [12]

    Durmus, E

    A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau.SIAM Journal on Imaging Sciences, 11(1):473–506, 2018

  13. [13]

    Eftekhari, L

    A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward-backward envelope for sampling with the overdamped Langevin algorithm.Statistics and Computing, 33(4):85, 2023

  14. [14]

    Ghaderi, M

    S. Ghaderi, M. Ahookhosh, A. Arany, A. Skupin, P. Patrinos, and Y. Moreau. Smoothing unad- justed Langevin algorithms for nonsmooth composite potential functions.Applied Mathematics and Computation, 464:128377, 2024. 18

  15. [15]

    Habring, M

    A. Habring, M. Holler, and T. Pock. Subgradient Langevin methods for sampling from nonsmooth potentials.SIAM Journal on Mathematics of Data Science, 6(4):897–925, 2024

  16. [16]

    F. Han, S. Osher, and W. Li. Splitting regularized Wasserstein proximal algorithms for nonsmooth sampling problems.arXiv preprint arXiv:2502.16773, 2025

  17. [17]

    Johnston, I

    T. Johnston, I. Lytras, N. Makras, and S. Sabanis. The performance of the unadjusted Langevin algorithm without smoothness assumptions.arXiv preprint arXiv:2502.03458, 2025

  18. [18]

    Kabgani and M

    A. Kabgani and M. Ahookhosh. Moreau envelope and proximal-point methods under the lens of high-order regularization.Set-Valued and Variational Analysis, 33:47, 2025

  19. [19]

    Kabgani and M

    A. Kabgani and M. Ahookhosh. ItsOPT: An inexact two-level smoothing framework for nonconvex optimization via high-order Moreau envelope.arXiv preprint arXiv:2410.19928, 2024

  20. [20]

    Kabgani and M

    A. Kabgani and M. Ahookhosh. ItsDEAL: Inexact two-level smoothing descent algorithms for weakly convex optimization.arXiv preprint arXiv:2501.02155, 2025

  21. [21]

    Klatzer, P

    T. Klatzer, P. Dobson, Y. Altmann, M. Pereyra, J. M. Sanz-Serna, and K. C. Zygalakis. Acceler- ated Bayesian imaging by relaxed proximal-point Langevin sampling.SIAM Journal on Imaging Sciences, 17(2):1078–1117, 2024

  22. [22]

    T. T.-K. Lau and H. Liu. Bregman proximal Langevin Monte Carlo via Bregman–Moreau en- velopes. InProceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 12049–12077, 2022

  23. [23]

    T. T.-K. Lau, H. Liu, and T. Pock. Non-log-concave and nonsmooth sampling via Langevin Monte Carlo algorithms. InAdvanced Techniques in Optimization for Machine Learning and Imaging, volume 61 ofSpringer INdAM Series, pages 83–149. Springer, 2024

  24. [24]

    Y.-A. Ma, T. Chen, and E. B. Fox. A complete recipe for stochastic gradient MCMC. InAdvances in Neural Information Processing Systems, volume 28, 2015

  25. [25]

    Mordukhovich and N.M

    B.S. Mordukhovich and N.M. Nam. An Easy Path to Convex Analysis and Applications.Williston: Morgan & Claypool, 2014

  26. [26]

    J. J. Moreau. Propri´ et´ es des applications “prox”.Comptes Rendus Hebdomadaires des S´ eances de l’Acad´ emie des Sciences, 256:1069–1071, 1963

  27. [27]

    Nesterov and V

    Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017

  28. [28]

    M. Pereyra. Proximal Markov chain Monte Carlo algorithms.Statistics and Computing, 26(4):745– 760, 2016

  29. [29]

    Planiden and X

    C. Planiden and X. Wang. Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.SIAM Journal on Optimization, 26(2):1341–1364, 2016

  30. [30]

    Rahimi, S

    M. Rahimi, S. Ghaderi, Y. Moreau, and M. Ahookhosh. Projected subgradient methods for paraconvex optimization: Application to robust low-rank matrix recovery.arXiv preprint arXiv:2501.00427, 2024

  31. [31]

    C. P. Robert and G. Casella.Monte Carlo Statistical Methods. Springer, 2nd edition, 2004

  32. [32]

    Salim, D

    A. Salim, D. Kovalev, and P. Richt´ arik. Stochastic proximal Langevin algorithm: Potential split- ting and nonasymptotic rates. InAdvances in Neural Information Processing Systems, volume 32, 2019. 19

  33. [33]

    Shukla, D

    A. Shukla, D. Vats, and E. C. Chi. MCMC importance sampling via Moreau–Yosida envelopes. arXiv preprint arXiv:2501.02228, 2025

  34. [34]

    Stella, A

    L. Stella, A. Themelis, and P. Patrinos. Forward-backward quasi-Newton methods for nonsmooth optimization problems.Computational Optimization and Applications, 67(3):443–487, 2017

  35. [35]

    Themelis, M

    A. Themelis, M. Ahookhosh, and P. Patrinos. On the acceleration of forward-backward splitting via an inexact Newton method. InSplitting Algorithms, Modern Operator Theory, and Applica- tions, pages 363–412. Springer, 2019. 20