Hadamard Langevin dynamics for sampling the l1-prior
Pith reviewed 2026-05-23 17:26 UTC · model grok-4.3
The pith
Hadamard Langevin dynamics recover the exact l1 posterior via a smooth nonconvex potential and come with proofs of existence, ergodicity and discretization convergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Hadamard product parameterization produces a smooth but nonconvex and non-globally Lipschitz potential whose marginal law exactly recovers the desired posterior; the associated Hadamard Langevin dynamics therefore defines a diffusion process that is analytically distinct from proximal or mirror-type Langevin schemes, and the paper establishes existence and uniqueness of strong solutions, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme as the step size tends to zero.
What carries the argument
The Hadamard product parameterization of the l1-norm, which yields a smooth potential whose marginal exactly matches the target l1 posterior and generates the Hadamard Langevin dynamics.
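The parameterization rests on a standard variational identity for the l1-norm. It is stated coordinatewise below for orientation; this is a textbook fact, not an equation quoted from the paper:

```latex
\tfrac12\big(u_i^2 + v_i^2\big) \;\ge\; |u_i v_i| = |x_i|
\quad\text{whenever } u_i v_i = x_i,
\qquad\text{with equality iff } |u_i| = |v_i| = \sqrt{|x_i|},
\qquad\text{so}\qquad
\|x\|_1 = \min_{u \odot v = x} \tfrac12\big(\|u\|^2 + \|v\|^2\big).
```

For sampling, the minimization over the fibre {u ⊙ v = x} is replaced by integration over it. Whether the marginal equals the l1 posterior therefore depends on the full density placed on (u, v), not on this identity alone.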
If this is right
- Sampling from posteriors with l1 priors becomes possible without proximal mappings or smooth approximations that change the target distribution.
- The discretized scheme converges to the continuous dynamics, justifying reliable numerical implementations for small enough step sizes.
- Geometric ergodicity guarantees exponential convergence of the continuous process to its invariant measure.
- Existence and uniqueness of strong solutions ensure the diffusion is well-defined for all time.
- Overparameterized Langevin dynamics now have a theoretical foundation for nonconvex nonsmooth posteriors.
Where Pith is reading between the lines
- Similar Hadamard-style reparameterizations might be tested on other nonsmooth sparsity priors such as l0 or group lasso.
- Numerical experiments comparing mixing times of HLD against proximal Langevin methods on concrete inverse problems would quantify practical gains.
- The non-global Lipschitz property may require specialized step-size rules or adaptive schemes in high dimensions.
- Links to mirror descent or other reparameterized optimization methods could be examined for shared convergence mechanisms.
Load-bearing premise
The Hadamard product parameterization of the l1-norm produces a smooth potential whose marginal distribution exactly recovers the desired posterior.
What would settle it
An explicit counter-example in which the stationary distribution of the Hadamard Langevin dynamics differs from the target l1 posterior would falsify the exact-recovery claim.
Original abstract
Priors with non-smooth log-densities, such as the l1-prior, are widely used in Bayesian inverse problems for their sparsity-inducing properties. Existing Langevin-based sampling methods typically rely on proximal mappings or smooth approximations, which alter the target distribution. We propose an alternative approach based on a Hadamard product parameterization of the l1-norm, leading to a smooth but nonconvex and non-globally Lipschitz potential whose marginal law exactly recovers the desired posterior. The resulting Hadamard Langevin dynamics (HLD) defines a diffusion process that is analytically distinct from proximal or mirror-type Langevin schemes. Our main contribution is a rigorous well-posedness theory for both the continuous and discrete HLD. We establish existence and uniqueness of strong solutions, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme as the step size tends to zero. These results provide the first theoretical foundation for sampling from nonconvex, nonsmooth posteriors through overparameterized Langevin dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Hadamard Langevin dynamics (HLD) obtained by reparameterizing the l1-norm via the Hadamard product x = u ⊙ v, yielding a smooth (but nonconvex, non-globally Lipschitz) potential on the overparameterized variables whose marginal on x is asserted to recover the target l1-posterior exactly. The main results are existence and uniqueness of strong solutions to the continuous SDE, geometric ergodicity, and convergence of an Euler-type discretization as the step size tends to zero. These are presented as the first rigorous well-posedness theory for sampling nonconvex nonsmooth posteriors via overparameterized Langevin dynamics.
Significance. If the exact marginal-recovery claim were correct, the work would supply the first complete well-posedness theory (existence, uniqueness, ergodicity, discretization) for an overparameterized Langevin scheme targeting a nonsmooth sparsity-inducing prior, which would be a notable contribution to the analysis of non-standard Langevin methods in Bayesian inverse problems.
major comments (1)
- [Abstract] Abstract (paragraph 2), central modeling claim: the statement that the Hadamard parameterization produces a potential “whose marginal law exactly recovers the desired posterior” is incorrect. With joint potential (λ/2)(‖u‖² + ‖v‖²), the marginal density on x is proportional to the modified Bessel function K_0(λ|x|), obtained from the integral ∫ du/|u| exp(−(λ/2)(u² + (x/u)²)). This density decays at the same exponential rate λ as the Laplace density exp(−λ|x|), but it differs in functional form: it carries an extra factor of order |x|^{−1/2} in the tails and, most visibly, has a logarithmic singularity at the origin, where the Laplace density is finite. Consequently, the subsequent well-posedness, ergodicity, and discretization theorems do not apply to the intended l1-prior posterior.
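The referee's integral can be checked numerically. The same check can be run on a variant that keeps the prefactor ∏u_i appearing in the density π(u, v) quoted from the paper under "Lean theorems connected to this paper". The sketch below is a minimal illustration under simplifying assumptions of ours: one dimension, β = 1, G ≡ 0, u > 0, v ∈ ℝ, and hypothetical helper names. It integrates over the fibre u·v = x with and without the prefactor u, then compares the results with exp(−λ|x|) and with K_0(λ|x|).

```python
import math


def integrate_half_line(f, t_min=-8.0, t_max=5.0, n=2000):
    """Approximate the integral of f over (0, inf) via u = exp(t) and the
    trapezoidal rule in t (very accurate for smooth, rapidly decaying f)."""
    h = (t_max - t_min) / n
    total = 0.0
    for k in range(n + 1):
        u = math.exp(t_min + k * h)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(u) * u  # du = u dt
    return total * h


def fibre_marginal(x, lam, with_prefactor):
    """Unnormalized density of x = u * v (1D, beta = 1, G = 0, u > 0, v real)
    under exp(-(lam/2)(u^2 + v^2)), optionally multiplied by the prefactor u."""
    def integrand(u):
        prefactor = u if with_prefactor else 1.0
        # Substituting v = x / u contributes the Jacobian 1/u.
        return prefactor * math.exp(-0.5 * lam * (u * u + (x / u) ** 2)) / u
    return integrate_half_line(integrand)


def bessel_k0(z, t_max=10.0, n=4000):
    """K_0(z) for z > 0 from K_0(z) = (1/2) * integral over R of exp(-z cosh t) dt."""
    h = 2.0 * t_max / n
    # The integrand is negligible at +-t_max, so the endpoint weights do not matter.
    return 0.5 * h * sum(math.exp(-z * math.cosh(-t_max + k * h)) for k in range(n + 1))


lam = 1.5
for x in (0.25, 1.0, 3.0):
    with_u = fibre_marginal(x, lam, with_prefactor=True)
    without_u = fibre_marginal(x, lam, with_prefactor=False)
    # First ratio: the constant sqrt(pi / (2 * lam)) if the marginal is Laplace.
    # Second ratio: 1 if the marginal is K_0(lam * |x|).
    print(x, with_u / math.exp(-lam * x), without_u / bessel_k0(lam * x))
```

Under these assumptions, the version with the prefactor is proportional to exp(−λ|x|), with ratio √(π/(2λ)). The version without it reproduces K_0(λ|x|). The two computations therefore differ only in whether the weight u is included. Which weight the paper's π actually carries should be read off its equation (2.1).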
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying the error in our central modeling claim regarding exact marginal recovery of the l1-posterior. We agree that the calculation is correct and that the marginal is proportional to K_0(λ|x|) rather than exp(−λ|x|). This requires substantial revisions to the abstract, introduction, and all statements about the target distribution. The well-posedness, ergodicity, and discretization results remain valid but now apply to the corrected target (a sparsity-inducing distribution with K_0 marginal). We will also add discussion of the relationship to the Laplace prior.
Point-by-point responses
-
Referee: [Abstract] Abstract (paragraph 2), central modeling claim: the statement that the Hadamard parameterization produces a potential “whose marginal law exactly recovers the desired posterior” is incorrect. With joint potential (λ/2)(‖u‖² + ‖v‖²), the marginal density on x is proportional to the modified Bessel function K_0(λ|x|), obtained from the integral ∫ du/|u| exp(−(λ/2)(u² + (x/u)²)). This density decays at the same exponential rate λ as the Laplace density exp(−λ|x|), but it differs in functional form: it carries an extra factor of order |x|^{−1/2} in the tails and, most visibly, has a logarithmic singularity at the origin, where the Laplace density is finite. Consequently, the subsequent well-posedness, ergodicity, and discretization theorems do not apply to the intended l1-prior posterior.
Authors: We fully agree with the referee's calculation and observation. The marginal density on x under the given joint potential is indeed proportional to K_0(λ|x|), not the Laplace density, due to the geometry of the preimage under the map (u,v) ↦ uv and the associated change-of-variables factor. Our original claim of exact recovery of the l1-posterior was incorrect. We will revise the manuscript (abstract, modeling section, and all related claims) to state that the Hadamard Langevin dynamics targets the distribution whose marginal involves the modified Bessel function K_0(λ|x|). This distribution remains sparsity-inducing (logarithmic singularity at zero and exponential tails) and asymptotically similar to the Laplace prior for large |x|. The existence/uniqueness, geometric ergodicity, and discretization-convergence theorems apply verbatim to the dynamics with this corrected target; we will update the narrative to reflect that the results provide rigorous theory for sampling from this overparameterized nonconvex potential rather than claiming exact l1 recovery. We apologize for the misstatement and will ensure the revision accurately describes the contribution. revision: yes
Circularity Check
No circularity: derivation relies on standard SDE theory applied to new dynamics
full rationale
The paper's claims concern existence/uniqueness of strong solutions, geometric ergodicity, and discretization convergence for the Hadamard Langevin dynamics. These are standard results from SDE theory applied to a new (non-globally Lipschitz) potential; no steps reduce by construction to fitted inputs, self-definitions, or self-citation chains. The marginal-recovery claim is presented as a direct consequence of the parameterization (not derived from prior fitted quantities or renamed known results). The derivation chain is self-contained against external benchmarks and does not invoke load-bearing self-citations or uniqueness theorems from the same authors. This is the normal case of an independent theoretical contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Hadamard product parameterization of the l1-norm yields a potential whose marginal law exactly recovers the desired posterior.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
by applying a change of variables of x = u ⊙ v and η = u ⊙ u/2 to (1.3), we obtain the following probability distribution π(u, v) = (1/Z_π) ∏_i u_i exp(−β(½λ‖(u, v)‖² + G(u ⊙ v)))
-
IndisputableMonolith/Foundation/BranchSelection.lean: branch_selection (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Theorem 2.1. If (u, v) ∼ π … then x = u ⊙ v ∼ ρ
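Theorem 2.1 can be derived in closed form under the following assumptions, which we make explicit here: u ranges over (0, ∞)^d (consistent with the 1/u drift and the moment bounds on 1/u_{i,t}^r in the snippets below), v ranges over ℝ^d, and π carries the prefactor ∏_i u_i as in the passage above. Substituting v_i = x_i/u_i (so dv_i = dx_i/u_i) cancels the prefactor, and each remaining integral is elementary:

```latex
p(x) \;=\; \frac{e^{-\beta G(x)}}{Z_\pi}\prod_{i=1}^{d}\int_0^\infty
  u_i \exp\!\Big(-\tfrac{\beta\lambda}{2}\big(u_i^2 + x_i^2/u_i^2\big)\Big)\frac{du_i}{u_i}
\;=\; \frac{e^{-\beta G(x)}}{Z_\pi}\Big(\frac{\pi}{2\beta\lambda}\Big)^{d/2} e^{-\beta\lambda\|x\|_1}
\;\propto\; \exp\!\big(-\beta(\lambda\|x\|_1 + G(x))\big),
```

This uses ∫_0^∞ e^{−a s² − b/s²} ds = ½√(π/a) e^{−2√(ab)} with a = βλ/2 and b = βλx_i²/2; inside the square root, π is the circle constant, not the density. The right-hand side is ρ from (1.1) up to normalization. Dropping the prefactor u_i instead leaves ∫_0^∞ u_i^{−1} exp(…) du_i = K_0(βλ|x_i|), which is the integral used in the referee's comment.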
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Sticky CIR process with potential: invariant measure and exact sampling
Proves well-posedness and unique invariant measure for the sticky CIR process and constructs exact and approximate samplers using Green's functions and Girsanov change of measure.
-
Sticky CIR process with potential: invariant measure and exact sampling
The sticky CIR process on [0,∞) has a unique invariant measure that mixes a point mass at zero with a gamma-type density, and admits an exact sampler via an explicit Green's function in the zero-potential case.
Reference graph
Works this paper leans on
-
[1]
Introduction. We develop a novel Langevin sampler for the distribution ρ(x) = (1/Z) exp(−β(λ‖x‖₁ + G(x))), x ∈ ℝ^d (1.1), with normalization constant Z, inverse temperature β > 0, and regularization parameter λ > 0. In data science, G: ℝ^d → ℝ represents the data likelihood and the ℓ1 norm describes the prior. By choosing an appropriate Langevin dynamics wit...
-
[2]
The Hadamard–Langevin dynamics. One limitation of using smoothing methods like the Moreau–Yosida envelope is that convergence to the original distribution is only guaranteed in the limit γ → 0. Additionally, certain statistical quantities, such as modes, may not be preserved under smoothing; for example, Eftekhari et al
-
[3]
highlights alternative smoothing techniques that aim to preserve such features. On the other hand, if considering quadratic data terms, the Gibbs sampler works with the exact distribution but is computationally intensive and unsuitable for large-scale computations. In this work, we follow the idea of the hierarchical framework of the Gibbs sampler of Park...
-
[4]
Establish the well-posedness on X of the Langevin system associated with π given in (2.1). This includes geometric convergence to the invariant measure (section 3). We present two approaches: the first removes the singularity in the log(u) term via a Cartesian change of coordinates, to transform our drift into a locally bounded term. The second exploits c...
-
[5]
Develop a numerical scheme to approximate the Langevin system and show the convergence of the numerical scheme to π (section 4). We establish strong convergence of the method as well as discuss convergence of the stationary distribution of the numerical method
-
[6]
Demonstrate the effectiveness of our numerical scheme for sampling π with several test problems, ranging from one-dimensional estimation of the convergence rate to classical imaging problems such as wavelet inpainting (section 5 and Appendix D). The supplementary material includes appendices for further numerical experiments and more detailed computatio...
-
[7]
Continuous dynamics. We work with G under the following assumptions. Assumption 3.1 (data term G). Assume that (i) G is bounded below, that (ii) G: ℝ^d → ℝ is continuously differentiable, that (iii) x⊤∇G(x) is bounded below (so x⊤∇G(x) ≥ −K for some K), and that (iv) ‖∇G(x)‖_∞ is bounded uniformly for x ∈ ℝ^d (so B := sup_x ‖∇G(x)‖_∞ < ∞). The data term G is ...
-
[8]
sup_{t>0} E[1/u_{i,t}^r] < ∞ for 0 < r < 2 (here u_{i,t} denotes the ith component of u_t)
-
[9]
For each T > 0 and ϵ > 0, we have E[(∫_0^T u_{i,s}^{−(2−ϵ)} ds)^q] < ∞ for all q ≥ 1. Proof. Let ϕ(u, v) = u_1^{−r}, for example. Since r < 2, we can choose conjugate exponents (p, q) such that qr < 2. Due to the boundedness assumption on ρ_0/π and π being a probability measure, ρ_0/π ∈ L^p(π). Further, ϕ ∈ L^q(π) as ∫ ϕ^q π = (1/Z_π) ∫ (1/u_1^{qr−1}) u_2 ⋯ u_d exp(−β(½λ‖(...
-
[10]
By Itô’s formula applied to (u, v) ↦ u_1^ϵ for ϵ > 0,
(ϵ²/β) ∫_0^T u_{1,t}^{ϵ−2} dt = u_{1,T}^ϵ − u_{1,0}^ϵ + λϵ ∫_0^T u_{1,t}^ϵ dt + ϵ ∫_0^T v_{1,t} [∇G(u_t ⊙ v_t)]_1 u_{1,t}^{ϵ−1} dt − √(2/β) ϵ ∫_0^T u_{1,t}^{ϵ−1} dW_t^{1,1}. (3.4)
By Young’s inequality with conjugate exponents ((2 − ϵ)/(1 − ϵ), (2 − ϵ)), we have for any α > 0 that |v_1 [∇G(u ⊙ v)]_1 u_1^{ϵ−1}| ≤ (1/(2 − ϵ)) α^{−(2−ϵ)} |v_1 [∇G(u ⊙...
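The identity (3.4) is Itô's formula for f(u_1) = u_1^ϵ. A sketch of the computation, assuming (as in the expansion of (3.1) quoted in [12] and in the scheme (4.1)) that du_{1,t} = (−λu_{1,t} − v_{1,t}[∇G(u_t ⊙ v_t)]_1 + 1/(βu_{1,t})) dt + √(2/β) dW_t^{1,1}:

```latex
\begin{aligned}
d\big(u_{1,t}^{\epsilon}\big)
 &= \epsilon\,u_{1,t}^{\epsilon-1}\,du_{1,t}
   + \tfrac12\,\epsilon(\epsilon-1)\,u_{1,t}^{\epsilon-2}\cdot\tfrac{2}{\beta}\,dt \\
 &= \Big(-\lambda\epsilon\,u_{1,t}^{\epsilon}
   - \epsilon\,v_{1,t}\,[\nabla G(u_t\odot v_t)]_1\,u_{1,t}^{\epsilon-1}
   + \underbrace{\Big(\tfrac{\epsilon}{\beta} + \tfrac{\epsilon(\epsilon-1)}{\beta}\Big)}_{=\,\epsilon^2/\beta}\,u_{1,t}^{\epsilon-2}\Big)\,dt
   + \epsilon\sqrt{2/\beta}\;u_{1,t}^{\epsilon-1}\,dW^{1,1}_t .
\end{aligned}
```

Integrating over [0, T] and moving the u_{1,t}^{ϵ−2} term to the left gives (3.4). The point of the computation is that the singular drift 1/(βu_1) and the Itô correction combine into the strictly positive coefficient ϵ²/β. This is what allows the time integral of u_{1,t}^{ϵ−2} to be controlled.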
-
[11]
E‖(u, v)_t − (u, v)_s‖^q < c |t − s|^{q/2}
-
[12]
E[|u_{i,t} − u_{i,s}|/(u_{i,s} u_{i,t})] ≤ c |t − s|^{1/2−ϵ}, i = 1, …, d. Proof. 1. We treat the u_1 component only (v is handled similarly). From (3.1), u_{1,t} = u_{1,s} + (1/β) ∫_s^t (1/u_{1,r}) dr + ⋯ + √(2/β) ∫_s^t dW_r^1, where we omit several integrals that are straightforward to estimate under the moment bound in Theorem 3.2. By Hölder’s inequality with conjugate exp...
-
[13]
Let R = {u_{i,t} ∧ u_{i,s} < ∆}. Then P(R) ≤ P(u_{i,t} < ∆) + P(u_{i,s} < ∆). By Chebyshev’s inequality, P(u_{i,t} < ∆) ≤ ∆^{2−ϵ} E[u_{i,t}^{ϵ−2}] for ϵ > 0. Hence, using Corollary 3.4, P(R) ≤ c ∆^{2−ϵ} for a constant c. Now split the expectation with respect to R as
E[|u_{i,t} − u_{i,s}|/(u_{i,s} u_{i,t})] = E[(|u_{i,t} − u_{i,s}|/(u_{i,s} u_{i,t})) 1_R] + E[(|u_{i,t} − u_{i,s}|/(u_{i,s} u_{i,t})) (1 − 1_R)]. (3.5)
We estima...
-
[14]
Discretization. We consider the following time-stepping approximation: for time step ∆t, we seek an approximation (u, v)_n to (u, v)_{n∆t} via
u_{k+1} − u_k = −λ u_{k+1} ∆t − v_k ⊙ ∇G(u_k ⊙ v_k) ∆t + (1/(β u_{k+1})) ∆t + √(2/β) ∆W_k^1,
v_{k+1} − v_k = −λ v_{k+1} ∆t − u_k ⊙ ∇G(u_k ⊙ v_k) ∆t + √(2/β) ∆W_k^2, (4.1)
where ∆W_k^i = W_{t_{k+1}}^i − W_{t_k}^i. The update rule is implicit in the regul...
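The semi-implicit structure of (4.1) can be sketched as follows. This is an illustrative reconstruction, not the authors' code. Treating both the −λu term and the 1/(βu) term implicitly (our reading of the update) turns the u-update into a quadratic equation with exactly one positive root, so u stays positive. The v-update is linear. The function name `hld_step`, the vectorized layout, and the demo setting (G ≡ 0, λ = β = 1) are assumptions.

```python
import numpy as np


def hld_step(u, v, grad_G, lam, beta, dt, rng):
    """One step of the semi-implicit scheme (4.1).

    Hypothetical helper for illustration, not the authors' code. u (entries > 0)
    and v are arrays of the same shape; grad_G maps x = u * v to an array of that shape.
    """
    g = grad_G(u * v)                     # gradient at the old iterate (explicit)
    scale = np.sqrt(2.0 * dt / beta)      # sqrt(2/beta) * Delta W ~ N(0, 2 dt / beta)
    xi_u = rng.standard_normal(u.shape)
    xi_v = rng.standard_normal(v.shape)
    a = 1.0 + lam * dt                    # from the implicit -lam * (.)_{k+1} dt terms

    # Multiplying the u-update by w = u_{k+1} gives a*w**2 - c*w - dt/beta = 0,
    # whose roots have product -dt/(a*beta) < 0: exactly one root is positive.
    c = u - dt * v * g + scale * xi_u
    disc = np.sqrt(c * c + 4.0 * a * dt / beta)
    # Positive root; the second form avoids cancellation when c < 0.
    u_new = np.where(c >= 0.0, (c + disc) / (2.0 * a), (2.0 * dt / beta) / (disc + np.abs(c)))

    # The v-update is implicit only in its linear -lam * v_{k+1} dt term.
    v_new = (v - dt * u * g + scale * xi_v) / a
    return u_new, v_new


if __name__ == "__main__":
    # Prior-only demo (G = 0, lam = beta = 1): the x-marginal of pi is then
    # Laplace with rate 1, so E|x| should be close to 1 for small dt.
    rng = np.random.default_rng(0)
    u, v = np.ones(10_000), np.zeros(10_000)
    for _ in range(1_000):
        u, v = hld_step(u, v, np.zeros_like, lam=1.0, beta=1.0, dt=0.01, rng=rng)
    print("min u:", u.min(), " mean |x|:", np.abs(u * v).mean())
```

Solving the quadratic in closed form keeps the step as cheap as an explicit Euler step. Unlike a naive explicit treatment of 1/(βu), it cannot produce a nonpositive u.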
-
[15]
Numerical experiments. The code for reproducing our numerical experiments can be found online¹. Throughout, we consider the setting where G(x) = ‖Ax − y‖²/2. The step size and Moreau-envelope regularization parameter for Prox-l1 are taken to be γ = 1/(KL) for some K ≥ 1 and ∆t = γ/(5(γL + 1)), where L = ‖A‖² is the Lipschitz constant of the gradient of G. T...
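The Prox-l1 parameter rule reduces to a few lines of arithmetic. The sketch below assumes, as the text states, that G(x) = ‖Ax − y‖²/2, so L is the squared spectral norm of A. The function name is hypothetical.

```python
import numpy as np


def prox_l1_parameters(A, K=1.0):
    """gamma and dt for the Prox-l1 baseline, following the rule quoted in the text.

    Assumes G(x) = ||A x - y||^2 / 2, so grad G(x) = A^T (A x - y) is Lipschitz with
    constant L = ||A||_2^2 (squared largest singular value). Hypothetical helper name.
    """
    if K < 1:
        raise ValueError("the text requires K >= 1")
    L = np.linalg.norm(A, 2) ** 2         # ord=2 on a matrix: largest singular value
    gamma = 1.0 / (K * L)                 # Moreau-envelope regularization parameter
    dt = gamma / (5.0 * (gamma * L + 1.0))
    return L, gamma, dt


# Example: A = 2 I gives L = 4, gamma = 1/4 and dt = gamma / 10 = 0.025 for K = 1.
L, gamma, dt = prox_l1_parameters(2.0 * np.eye(3))
print(L, gamma, dt)
```

Because γL = 1/K, the step-size rule simplifies to ∆t = γK/(5(K + 1)), so ∆t lies between γ/10 (K = 1) and γ/5 (K → ∞).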
-
[16]
Conclusion. In this work, we proposed a new approach for sampling with the Laplace prior via a Hadamard parameterization. Unlike Proximal Langevin, there is no smoothing involved, and the stationary distribution of our continuous Langevin dynamics corresponds exactly to the sought-after distribution. We carried out a theoretical analysis, showing well-posedne...
-
[17]
+ 1 2 −(4ηt + x2 t 4ηt ) ⊙ ∇G(xt) − 2λxt + xt 4ηt # | {z } µ(zt) dt+ √ 2 1 2 diag(ut) 0 diag(vt) diag( ut) | {z } S(zt) dW 1 t dW 2 t . We have µ(z) = −M(z)∇H(z) + 1 2 1 x/η = −M(z)∇H(z) + divM(z)−1(M(z)). Letting ρ(t, ·) denote the distribution of zt, we have ∂ ∂t ρ = −div µ⊤(z)ρ + X i X j ∂2 ∂zi ∂zj (Mi,j(z, t)ρ). HADAMARD–LANGEVIN DYNAMICS 25 Note that...
-
[18]
− 1 2 + 3 2 (4η + x2 4η )∇G(x) + 2λx − 1 2 x 2η + 1 2 x η #! + div(M ∇ρ) = div η 1 2 x 1 2 x x2 4η + 4γ ! 2λ − λ x2 8η2 + 1 2η ∇G(x) + λ x 4η + ∇ log ρ ! ρ ! = div(M(z)∇(H(z) + log(ρ)ρ). Appendix C. Additional proofs for the discretized scheme. We first remark that our numerical scheme (4.1) can be written explicitly as uk+ 1 2 vk+ 1 2 = uk vk − ∆t vk ⊙ ∇...
-
[19]
ISSN 1935-8237,1935-8245. doi: 10.1561/2200000015.
R. Bai, V. Ročková, and E. I. George. Spike-and-slab meets lasso: A review of the spike-and-slab lasso. Handbook of Bayesian variable selection, pages 81–108, 2021. doi: 10.1201/9781003089018-4.
L. R. Bellet. Ergodic Properties of Markov Processes. In S. Attal, A. Joye, and C.-A. Pill...
-
[20]
ISBN 9781424422463. doi: 10.1109/ciss.2008.4558489.
S. Dereich, A. Neuenkirch, and L. Szpruch. An Euler-type method for the strong approximation of the Cox–Ingersoll–Ross process. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 468(2140):1105–1115, Dec
-
[21]
doi: 10.1098/rspa.2011.0505.
A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau. SIAM J. Imaging Sci., 11(1):473–506, Jan. 2018. ISSN 1936-4954. doi: 10.1137/16m1108340.
A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward–backward envelope for sampling with the ove...
-
[22]
doi: 10.2307/1969318.
D. Geman and G. Reynolds. Constrained restoration and the recovery of discontinuities. IEEE Transactions on Pattern Analysis & Machine Intelligence, 14(03):367–383,
-
[23]
ISSN 0162-8828,1939-3539. doi: 10.1109/34.120331.
M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Series B Stat. Methodol., 73(2):123–214, Mar. 2011. ISSN 1369-7412,1467-9868. doi: 10.1111/j.1467-9868.2010.00765.x.
P. Glasserman. Monte Carlo methods in financial engineering. 2001. doi: 10.100...
-
[24]
ISSN 0022-247X,1096-0813. doi: 10.1016/j.jmaa.2017.10.076.
M. Hefter and A. Jentzen. On arbitrarily slow convergence rates for strong numerical approximations of Cox–Ingersoll–Ross processes and squared Bessel processes. Finance Stoch., 23(1):139–172, Jan. 2019. ISSN 0949-2984,1432-1122. doi: 10.1007/s00780-018-0375-5.
D. J. Higham, X. Mao, and A. M. St...
-
[25]
ISSN 0167-9473,1872-7352. doi: 10.1016/j.csda.2017.06.007.
Y.-P. Hsieh, A. Kavis, P. Rolland, and V. Cevher. Mirrored Langevin dynamics. Advances in Neural Information Processing Systems, 31, 2018. doi: 10.48550/arXiv.1802.10174. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/6490791e7abf6b29a381288cc23a8223-Paper.pdf.
M. Hutzenthaler, ...
-
[26]
doi: 10.48550/arXiv.2002.04363