Speeding Up Nonsmooth Bayesian MCMC Sampling via Inexact Proximal Unadjusted Langevin Algorithm
Pith reviewed 2026-05-19 22:48 UTC · model grok-4.3
The pith
An inexact proximal unadjusted Langevin algorithm samples from nonsmooth composite posteriors by using controlled approximations in place of exact proximal operators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that iPULA achieves non-asymptotic convergence for sampling from posteriors with nonsmooth composite potentials by smoothing via the Moreau envelope and replacing exact proximal steps with inexact computations, with the sampling error bounded explicitly by a term that grows with the level of inexactness while the overall convergence rate order remains unchanged up to this added bias.
What carries the argument
The iPULA iteration that applies the Moreau envelope to smooth the composite potential and then uses inexact proximal evaluations to approximate the required gradient steps inside an unadjusted Langevin update.
If this is right
- The total sampling error decomposes into the usual discretization and bias terms plus an additive term controlled by the proximal approximation tolerance.
- The same convergence order as the exact proximal unadjusted Langevin algorithm is retained provided the inexactness tolerance is chosen to decay appropriately with the step size.
- The method extends the practical reach of proximal Langevin sampling to composite potentials whose proximal operators lack closed forms, such as those arising in medical imaging.
- If the inexactness level is held below a threshold proportional to the step size, the stationary distribution remains close to the true posterior in Wasserstein distance.
Where Pith is reading between the lines
- The same controlled-inexactness strategy could be transferred to other proximal MCMC schemes such as proximal MALA or proximal Hamiltonian Monte Carlo.
- In very high dimensions the computational saving from loose proximal tolerances may outweigh the added bias, but this trade-off remains to be quantified.
- The framework suggests a route to adaptive step-size and tolerance schedules that balance bias and variance without requiring exact proximal oracles.
Load-bearing premise
The inexact proximal computations can be carried out with enough accuracy control that the resulting bias stays measurable and does not erase the convergence rate.
What would settle it
Implement iPULA on a test problem with known exact proximal operator, deliberately vary the accuracy of the proximal approximation across runs, and check whether the observed distance to the target posterior grows exactly as predicted by the bias term in the non-asymptotic bound.
Figures
read the original abstract
We study sampling from posterior distributions with nonsmooth composite potentials, a setting in which proximal-based Langevin methods are theoretically appealing but in practice limited to simple functions with closed-form proximal operators. We introduce iPULA for composite potentials, an inexact proximal unadjusted Langevin algorithm that replaces exact proximal steps with controlled approximations. Our approach leverages the Moreau envelope to smooth the potential, while allowing inexact evaluation of its gradient through inexact proximal computations. We establish non-asymptotic convergence guarantees for iPULA, explicitly characterizing the impact of inexactness on the sampling error and showing that the inexactness preserves convergence rates up to a quantifiable bias. We demonstrate the practical relevance of iPULA on a medical image reconstruction task, where proximal operators cannot be computed exactly. Experiments demonstrate the effectiveness of iPULA and support our theoretical results.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces iPULA, an inexact proximal unadjusted Langevin algorithm for sampling from posterior distributions with nonsmooth composite potentials. It replaces exact proximal steps with controlled approximations via the Moreau envelope, derives non-asymptotic convergence guarantees that characterize the sampling error impact of inexactness as a quantifiable bias while preserving rates, and demonstrates the method on a medical image reconstruction task where exact proximal operators are unavailable.
Significance. If the non-asymptotic bounds hold, the work supplies a practical and theoretically supported route to MCMC sampling for composite nonsmooth potentials that arise in imaging and inverse problems, where exact proximal maps are intractable. The explicit bias characterization from inexact proximal evaluations is a useful contribution that lets users trade computational effort against convergence without losing the underlying 1/sqrt(T) or 1/T scaling.
major comments (2)
- [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).
- [§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.
minor comments (2)
- [§3 and §4] Notation for the proximal error tolerance is introduced in §3 but reused with different symbols in the convergence statements of §4; a single consistent symbol (e.g., ε_k throughout) would improve readability.
- [Figure 2] Figure 2 caption states that the plotted curves correspond to “different inexactness levels,” yet the legend only labels the exact-proximal baseline; adding the numerical values of the tolerance sequence used for each curve would make the experiment reproducible.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. The comments have prompted us to clarify key aspects of the convergence analysis. We address each major comment below and indicate the revisions made to the manuscript.
read point-by-point responses
-
Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (and the supporting Lemma 4.3): the non-asymptotic Wasserstein bound claims that inexact proximal errors contribute only an additive bias term whose sum remains controlled. However, the propagation step from the per-iteration proximal error ε_k through the Moreau-gradient perturbation into the ULA contraction appears to require that ∑ ε_k < ∞ (or a comparable summability condition). For composite potentials this assumption is not automatically inherited from standard practical solvers; a fixed or slowly decaying tolerance would inject a persistent perturbation that can dominate the target rate. Please supply the explicit error-recursion inequality and the minimal decay condition on {ε_k} that makes the bias term o(1/sqrt(T)).
Authors: We appreciate this observation on the error propagation. The proof of Theorem 4.1 proceeds from the one-step recursion W_{k+1} ≤ (1 - c η) W_k + O(η²) + C ε_k, where the contraction factor arises from the strong convexity and smoothness properties after Moreau smoothing. Unrolling the recursion produces a bias contribution bounded by sum_{k=1}^T (1 - c η)^{T-k} ε_k. With the standard choice η ∼ 1/√T that yields the target 1/√T rate, the accumulated bias is o(1/√T) whenever ε_k = O(k^{-1-δ}) for any δ > 0 (or, more weakly, whenever the Cesàro mean of the weighted ε_k vanishes at the required rate). We have inserted the explicit recursion as Lemma 4.4 and added the minimal decay condition to the statement of Theorem 4.1. This schedule is attainable by standard first-order solvers for the proximal subproblems with a tolerance that decreases polynomially in the outer iteration index. revision: yes
-
Referee: [§3.2, Eq. (12)–(14)] §3.2, Eq. (12)–(14): the definition of the inexact proximal map and its relation to the Moreau envelope gradient is used to replace the exact proximal step. The subsequent analysis treats the resulting gradient error as an additive perturbation whose magnitude is bounded by ε_k, but the Lipschitz constant of the Moreau envelope depends on the smoothing parameter λ. It is unclear whether the stated bound remains uniform when λ is chosen adaptively or when the composite structure (smooth + nonsmooth) makes the effective Lipschitz constant grow with dimension. Clarify the dependence of the bias term on λ and on the composite splitting.
Authors: The gradient of the Moreau envelope of any proper lower-semicontinuous function is always 1/λ-Lipschitz, independently of dimension and of whether the original potential is composite. In our composite setting the proximal step is applied solely to the nonsmooth summand; the resulting gradient perturbation is controlled by ε_k and enters the analysis as an additive term whose size is independent of the Lipschitz constant of the smooth summand. The overall bias in the Wasserstein bound scales with λ through the standard Moreau approximation error O(λ Lip(f_nonsmooth)). We have expanded the discussion following Eq. (14) to state these facts explicitly, to note that λ may be held fixed or adapted slowly without altering the asymptotic rate, and to confirm uniformity with respect to dimension under the standing assumptions on the potential. revision: yes
Circularity Check
Convergence analysis for iPULA derives independent bounds on inexactness bias without reducing to self-definition or fitted inputs
full rationale
The paper introduces iPULA as an algorithmic construction that replaces exact proximal steps with controlled approximations via the Moreau envelope, then derives non-asymptotic convergence guarantees by propagating per-step proximal errors through Wasserstein or KL contractions. This derivation relies on standard assumptions about error decay (e.g., summable tolerances) and standard smoothing properties of the Moreau envelope; these are not shown to be equivalent to the target rates by the paper's own equations. No self-citation is load-bearing for the central claim, and the analysis does not rename known results or smuggle ansatzes. The result remains self-contained against external benchmarks for proximal Langevin methods.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The composite potential admits a proximal operator that can be approximated to controllable accuracy.
invented entities (1)
-
iPULA (inexact proximal unadjusted Langevin algorithm)
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce iPULA … leverages the Moreau envelope to smooth the potential, while allowing inexact evaluation of its gradient through inexact proximal computations … non-asymptotic convergence guarantees … impact of inexactness on the sampling error
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leancostAlphaLog_high_calibrated_iff unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
U_γ(x) := inf_y U(y) + ½γ‖x−y‖² … ∇U_γ(x) = 1/γ (x − Prox_γU(x))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
M. Ahookhosh, A. Themelis, and P. Patrinos. A Bregman forward-backward linesearch algo- rithm for nonconvex composite optimization: Superlinear convergence to nonisolated local min- ima.SIAM Journal on Optimization, 31(1):653–685, 2021
work page 2021
-
[2]
H. H. Bauschke and P. L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2nd edition, 2017
work page 2017
- [3]
-
[4]
N. S. Chatterji, J. Diakonikolas, M. I. Jordan, and P. L. Bartlett. Langevin Monte Carlo without smoothness. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 ofProceedings of Machine Learning Research, pages 1716–1726, 2020
work page 2020
-
[5]
F. R. Crucinio, A. Durmus, P. Jim´ enez, and G. O. Roberts. Optimal scaling results for Moreau- Yosida Metropolis-adjusted Langevin algorithms.Bernoulli, 31(3):1889–1907, 2025
work page 1907
-
[6]
A. S. Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. InProceedings of the 2017 Conference on Learning Theory, volume 65 ofProceedings of Machine Learning Research, pages 678–689, 2017
work page 2017
-
[7]
A. S. Dalalyan and A. G. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient.Stochastic Processes and their Applications, 129(12):5278–5311, 2019
work page 2019
-
[8]
A. S. Dalalyan, A. Karagulyan, and L. Riou-Durand. Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets.Journal of Machine Learning Research, 23(235):1– 38, 2022
work page 2022
- [9]
-
[10]
K. A. Dubey, S. J. Reddi, S. A. Williamson, B. P´ oczos, A. J. Smola, and E. P. Xing. Variance reduction in stochastic gradient Langevin dynamics. InAdvances in Neural Information Processing Systems, volume 29, 2016
work page 2016
-
[11]
A. Durmus and E. Moulines. High-dimensional Bayesian inference via the unadjusted Langevin algorithm.Bernoulli, 25(4A):2854–2882, 2019
work page 2019
- [12]
-
[13]
A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward-backward envelope for sampling with the overdamped Langevin algorithm.Statistics and Computing, 33(4):85, 2023
work page 2023
-
[14]
S. Ghaderi, M. Ahookhosh, A. Arany, A. Skupin, P. Patrinos, and Y. Moreau. Smoothing unad- justed Langevin algorithms for nonsmooth composite potential functions.Applied Mathematics and Computation, 464:128377, 2024. 18
work page 2024
-
[15]
A. Habring, M. Holler, and T. Pock. Subgradient Langevin methods for sampling from nonsmooth potentials.SIAM Journal on Mathematics of Data Science, 6(4):897–925, 2024
work page 2024
- [16]
-
[17]
T. Johnston, I. Lytras, N. Makras, and S. Sabanis. The performance of the unadjusted Langevin algorithm without smoothness assumptions.arXiv preprint arXiv:2502.03458, 2025
-
[18]
A. Kabgani and M. Ahookhosh. Moreau envelope and proximal-point methods under the lens of high-order regularization.Set-Valued and Variational Analysis, 33:47, 2025
work page 2025
-
[19]
A. Kabgani and M. Ahookhosh. ItsOPT: An inexact two-level smoothing framework for nonconvex optimization via high-order Moreau envelope.arXiv preprint arXiv:2410.19928, 2024
-
[20]
A. Kabgani and M. Ahookhosh. ItsDEAL: Inexact two-level smoothing descent algorithms for weakly convex optimization.arXiv preprint arXiv:2501.02155, 2025
-
[21]
T. Klatzer, P. Dobson, Y. Altmann, M. Pereyra, J. M. Sanz-Serna, and K. C. Zygalakis. Acceler- ated Bayesian imaging by relaxed proximal-point Langevin sampling.SIAM Journal on Imaging Sciences, 17(2):1078–1117, 2024
work page 2024
-
[22]
T. T.-K. Lau and H. Liu. Bregman proximal Langevin Monte Carlo via Bregman–Moreau en- velopes. InProceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 12049–12077, 2022
work page 2022
-
[23]
T. T.-K. Lau, H. Liu, and T. Pock. Non-log-concave and nonsmooth sampling via Langevin Monte Carlo algorithms. InAdvanced Techniques in Optimization for Machine Learning and Imaging, volume 61 ofSpringer INdAM Series, pages 83–149. Springer, 2024
work page 2024
-
[24]
Y.-A. Ma, T. Chen, and E. B. Fox. A complete recipe for stochastic gradient MCMC. InAdvances in Neural Information Processing Systems, volume 28, 2015
work page 2015
-
[25]
B.S. Mordukhovich and N.M. Nam. An Easy Path to Convex Analysis and Applications.Williston: Morgan & Claypool, 2014
work page 2014
-
[26]
J. J. Moreau. Propri´ et´ es des applications “prox”.Comptes Rendus Hebdomadaires des S´ eances de l’Acad´ emie des Sciences, 256:1069–1071, 1963
work page 1963
-
[27]
Y. Nesterov and V. Spokoiny. Random gradient-free minimization of convex functions.Founda- tions of Computational Mathematics, 17(2):527–566, 2017
work page 2017
-
[28]
M. Pereyra. Proximal Markov chain Monte Carlo algorithms.Statistics and Computing, 26(4):745– 760, 2016
work page 2016
-
[29]
C. Planiden and X. Wang. Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.SIAM Journal on Optimization, 26(2):1341–1364, 2016
work page 2016
- [30]
-
[31]
C. P. Robert and G. Casella.Monte Carlo Statistical Methods. Springer, 2nd edition, 2004
work page 2004
- [32]
- [33]
- [34]
-
[35]
A. Themelis, M. Ahookhosh, and P. Patrinos. On the acceleration of forward-backward splitting via an inexact Newton method. InSplitting Algorithms, Modern Operator Theory, and Applica- tions, pages 363–412. Springer, 2019. 20
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.