Hadamard Langevin dynamics for sampling the l1-prior

Clarice Poon; Federico Cornalba; Ivan Cheltsov; Tony Shardlow

arxiv: 2411.11403 · v4 · submitted 2024-11-18 · 🧮 math.NA · cs.NA

Pith reviewed 2026-05-23 17:26 UTC · model grok-4.3

keywords Hadamard Langevin dynamics · l1 prior · Bayesian sampling · geometric ergodicity · nonconvex potential · well-posedness · diffusion process · inverse problems

The pith

Hadamard Langevin dynamics recover the exact l1 posterior via a smooth nonconvex potential and come with proofs of existence, ergodicity and discretization convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a Hadamard product parameterization of the l1 norm that turns the nonsmooth prior into a smooth but nonconvex potential whose marginal distribution matches the target posterior exactly. From this parameterization they define Hadamard Langevin dynamics, a diffusion process distinct from proximal or mirror Langevin schemes. They prove existence and uniqueness of strong solutions, geometric ergodicity of the continuous process, and convergence of the Euler discretization to the continuous dynamics as the step size vanishes. A sympathetic reader would care because the results supply the first rigorous justification for applying overparameterized Langevin sampling to nonconvex nonsmooth Bayesian inverse problems without altering the target law.

Core claim

The Hadamard product parameterization produces a smooth but nonconvex and non-globally Lipschitz potential whose marginal law exactly recovers the desired posterior; the associated Hadamard Langevin dynamics therefore defines a diffusion process that is analytically distinct from proximal or mirror-type Langevin schemes, and the paper establishes existence and uniqueness of strong solutions, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme as the step size tends to zero.

What carries the argument

The Hadamard product parameterization of the l1-norm, which yields a smooth potential whose marginal exactly matches the target l1 posterior and generates the Hadamard Langevin dynamics.
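
For orientation, the l1 norm has a well-known variational form in terms of a Hadamard product. The identity below is standard background rather than a quotation from the paper. It is offered on the assumption that this is the sense in which a quadratic term in (u, v) can stand in for the l1 norm of x = u ⊙ v.

```latex
\|x\|_1 \;=\; \min_{u \odot v = x} \tfrac{1}{2}\bigl(\|u\|_2^2 + \|v\|_2^2\bigr),
\qquad\text{because}\qquad
\tfrac{1}{2}\bigl(u_i^2 + v_i^2\bigr) \;\ge\; |u_i v_i| \;=\; |x_i|
\quad\text{with equality iff } |u_i| = |v_i| = \sqrt{|x_i|}.
```

The right-hand side is smooth in (u, v) but not convex on the constraint set, which is consistent with the "smooth but nonconvex" description of the potential.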

If this is right

Sampling from posteriors with l1 priors becomes possible without proximal mappings or smooth approximations that change the target distribution.
The discretized scheme converges to the continuous dynamics, justifying reliable numerical implementations for small enough step sizes.
Geometric ergodicity guarantees exponential convergence of the continuous process to its invariant measure.
Existence and uniqueness of strong solutions ensure the diffusion is well-defined for all time.
Overparameterized Langevin dynamics now have a theoretical foundation for nonconvex nonsmooth posteriors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar Hadamard-style reparameterizations might be tested on other nonsmooth sparsity priors such as l0 or group lasso.
Numerical experiments comparing mixing times of HLD against proximal Langevin methods on concrete inverse problems would quantify practical gains.
The non-global Lipschitz property may require specialized step-size rules or adaptive schemes in high dimensions.
Links to mirror descent or other reparameterized optimization methods could be examined for shared convergence mechanisms.

Load-bearing premise

The Hadamard product parameterization of the l1-norm produces a smooth potential whose marginal distribution exactly recovers the desired posterior.

What would settle it

An explicit counter-example in which the stationary distribution of the Hadamard Langevin dynamics differs from the target l1 posterior would falsify the exact-recovery claim.

Original abstract

Priors with non-smooth log-densities, such as the l1-prior, are widely used in Bayesian inverse problems for their sparsity-inducing properties. Existing Langevin-based sampling methods typically rely on proximal mappings or smooth approximations, which alter the target distribution. We propose an alternative approach based on a Hadamard product parameterization of the l1-norm, leading to a smooth but nonconvex and non-globally Lipschitz potential whose marginal law exactly recovers the desired posterior. The resulting Hadamard Langevin dynamics (HLD) defines a diffusion process that is analytically distinct from proximal or mirror-type Langevin schemes. Our main contribution is a rigorous well-posedness theory for both the continuous and discrete HLD. We establish existence and uniqueness of strong solutions, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme as the step size tends to zero. These results provide the first theoretical foundation for sampling from nonconvex, nonsmooth posteriors through overparameterized Langevin dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The Hadamard parameterization fails to recover the exact l1 posterior, so the well-posedness theory applies to a different distribution than advertised.

The Hadamard parameterization does not recover the exact l1 posterior. The joint potential on the auxiliary variables leads to a marginal density proportional to the modified Bessel function of the second kind K_0(λ|x|), which differs from exp(-λ|x|) both near the origin and in its overall form. This undercuts the paper's central motivation for sampling from sparsity-inducing priors in Bayesian inverse problems.

The paper does introduce a new diffusion called Hadamard Langevin dynamics that is distinct from proximal and mirror schemes. It provides rigorous proofs for existence and uniqueness of strong solutions to the SDE, geometric ergodicity of the continuous dynamics, and convergence of the discretized scheme to the continuous one as the step size goes to zero. These results cover a nonconvex, non-globally Lipschitz potential arising from the overparameterization, which is a setting not directly covered by standard Langevin theory. The work appears self-contained and does not depend on unverified reductions or fitted parameters.

The soft spot is the mismatch between the claimed target distribution and the actual marginal. The abstract states that the marginal law exactly recovers the desired posterior, but the change-of-variable calculation shows otherwise. This is not a minor technical detail; it affects whether the results apply to the l1-prior as advertised. The large argument asymptotics are similar, but that is not enough for exact sampling.

The citation pattern is clean with no self-citation issues. The authors engage with the literature on Langevin methods for inverse problems in a straightforward way.

This paper is mainly for researchers in numerical analysis and computational statistics who are exploring alternative parameterizations for sampling non-smooth posteriors. A reader focused on methods that truly target the l1 distribution would not get what is promised here. I would not bring this to the next reading group. I would not cite this work. It does not deserve peer review because the error in the target distribution is fundamental to the claimed contribution.

Referee Report

1 major / 0 minor

Summary. The paper introduces Hadamard Langevin dynamics (HLD) obtained by reparameterizing the l1-norm via the Hadamard product x = u ⊙ v, yielding a smooth (but nonconvex, non-globally Lipschitz) potential on the overparameterized variables whose marginal on x is asserted to recover the target l1-posterior exactly. The main results are existence and uniqueness of strong solutions to the continuous SDE, geometric ergodicity, and convergence of an Euler-type discretization as the step size tends to zero. These are presented as the first rigorous well-posedness theory for sampling nonconvex nonsmooth posteriors via overparameterized Langevin dynamics.

Significance. If the exact marginal-recovery claim were correct, the work would supply the first complete well-posedness theory (existence, uniqueness, ergodicity, discretization) for an overparameterized Langevin scheme targeting a nonsmooth sparsity-inducing prior, which would be a notable contribution to the analysis of non-standard Langevin methods in Bayesian inverse problems.

major comments (1)

[Abstract] Abstract (paragraph 2) and the central modeling claim: the statement that the Hadamard parameterization produces a potential “whose marginal law exactly recovers the desired posterior” is incorrect. With joint potential (λ/2)(‖u‖² + ‖v‖²) the marginal density on x is proportional to the modified Bessel function K_0(λ|x|), obtained from the integral ∫ du/|u| exp(−(λ/2)(u² + (x/u)²)), which is asymptotically similar to but distinct from the Laplace density exp(−λ|x|) both near the origin and in its precise functional form. Consequently the subsequent well-posedness, ergodicity, and discretization theorems do not apply to the intended l1-prior posterior.
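
The integral in this comment can be checked numerically in one dimension. The sketch below assumes d = 1, G = 0, β = 1 and u > 0, and uses only the Python standard library. It evaluates two candidate marginals of x = uv by quadrature: the unweighted integrand with the factor 1/|u| written above, and a weighted variant that also carries the factor u from the density π(u, v) ∝ ∏ u_i exp(−β(½λ‖(u, v)‖² + G(u ⊙ v))) quoted from the paper further down this page. The function names (`midpoint`, `marginal`, `bessel_k0`, `laplace_shape`) are hypothetical helpers, not part of the paper or of any library.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def marginal(x, lam, weighted):
    """Unnormalized marginal of x = u*v over u > 0, with v = x/u (Jacobian 1/u).

    weighted=True keeps the extra factor u coming from prod_i u_i in the density;
    weighted=False uses the factor 1/|u| only, as in the referee's integral.
    """
    def integrand(s):
        u = math.exp(s)  # substitution u = e^s, so du = u ds
        e = math.exp(-0.5 * lam * (u * u + (x / u) ** 2))
        return e * (u if weighted else 1.0)
    return midpoint(integrand, -12.0, 4.0)

def bessel_k0(z):
    """K_0(z) for z > 0 via the representation int_0^inf exp(-z cosh t) dt."""
    return midpoint(lambda t: math.exp(-z * math.cosh(t)), 0.0, 10.0)

def laplace_shape(x, lam):
    """Closed form sqrt(pi / (2 lam)) * exp(-lam |x|)."""
    return math.sqrt(math.pi / (2.0 * lam)) * math.exp(-lam * abs(x))

# Columns: x, weighted integral, Laplace shape, unweighted integral, K_0(lam |x|)
for x in (0.25, 0.5, 1.0, 2.0):
    print(x, marginal(x, 1.0, True), laplace_shape(x, 1.0),
          marginal(x, 1.0, False), bessel_k0(1.0 * x))
```

By the classical identity ∫_0^∞ exp(−a u² − b/u²) du = ½ √(π/a) exp(−2√(ab)), the weighted column should agree with √(π/(2λ)) exp(−λ|x|), whereas the unweighted column agrees with K_0(λ|x|). In this one-dimensional setting the factor ∏ u_i is therefore what separates a Laplace marginal from a K_0 marginal.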

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying the error in our central modeling claim regarding exact marginal recovery of the l1-posterior. We agree that the calculation is correct and that the marginal is proportional to K_0(λ|x|) rather than exp(−λ|x|). This requires substantial revisions to the abstract, introduction, and all statements about the target distribution. The well-posedness, ergodicity, and discretization results remain valid but now apply to the corrected target (a sparsity-inducing distribution with K_0 marginal). We will also add discussion of the relationship to the Laplace prior.

Point-by-point responses

Referee: [Abstract] Abstract (paragraph 2) and the central modeling claim: the statement that the Hadamard parameterization produces a potential “whose marginal law exactly recovers the desired posterior” is incorrect. With joint potential (λ/2)(‖u‖² + ‖v‖²) the marginal density on x is proportional to the modified Bessel function K_0(λ|x|), obtained from the integral ∫ du/|u| exp(−(λ/2)(u² + (x/u)²)), which is asymptotically similar to but distinct from the Laplace density exp(−λ|x|) both near the origin and in its precise functional form. Consequently the subsequent well-posedness, ergodicity, and discretization theorems do not apply to the intended l1-prior posterior.

Authors: We fully agree with the referee's calculation and observation. The marginal density on x under the given joint potential is indeed proportional to K_0(λ|x|), not the Laplace density, due to the geometry of the preimage under the map (u,v) ↦ uv and the associated change-of-variables factor. Our original claim of exact recovery of the l1-posterior was incorrect. We will revise the manuscript (abstract, modeling section, and all related claims) to state that the Hadamard Langevin dynamics targets the distribution whose marginal involves the modified Bessel function K_0(λ|x|). This distribution remains sparsity-inducing (logarithmic singularity at zero and exponential tails) and asymptotically similar to the Laplace prior for large |x|. The existence/uniqueness, geometric ergodicity, and discretization-convergence theorems apply verbatim to the dynamics with this corrected target; we will update the narrative to reflect that the results provide rigorous theory for sampling from this overparameterized nonconvex potential rather than claiming exact l1 recovery. We apologize for the misstatement and will ensure the revision accurately describes the contribution.
revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on standard SDE theory applied to new dynamics

The paper's claims concern existence/uniqueness of strong solutions, geometric ergodicity, and discretization convergence for the Hadamard Langevin dynamics. These are standard results from SDE theory applied to a new (non-globally Lipschitz) potential; no steps reduce by construction to fitted inputs, self-definitions, or self-citation chains. The marginal-recovery claim is presented as a direct consequence of the parameterization (not derived from prior fitted quantities or renamed known results). The derivation chain is self-contained against external benchmarks and does not invoke load-bearing self-citations or uniqueness theorems from the same authors. This is the normal case of an independent theoretical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on one central domain assumption that the chosen parameterization exactly preserves the target marginal; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Hadamard product parameterization of the l1-norm yields a potential whose marginal law exactly recovers the desired posterior.
Explicitly stated in abstract as the property that makes the approach valid without altering the target distribution.

pith-pipeline@v0.9.0 · 5711 in / 1356 out tokens · 44932 ms · 2026-05-23T17:26:49.654696+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

by applying a change of variables of x = u ⊙ v and η = u ⊙ u/2 to (1.3), we obtain the following probability distribution π(u, v) = (1/Z_π) ∏_i u_i exp(−β(½λ‖(u, v)‖² + G(u ⊙ v)))

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

Theorem 2.1. If (u, v) ∼ π … then x = u ⊙ v ∼ ρ

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sticky CIR process with potential: invariant measure and exact sampling
math.PR 2026-05 unverdicted novelty 7.0

Proves well-posedness and unique invariant measure for the sticky CIR process and constructs exact and approximate samplers using Green's functions and Girsanov change of measure.

Sticky CIR process with potential: invariant measure and exact sampling
math.PR 2026-05 accept novelty 7.0

The sticky CIR process on [0,∞) has a unique invariant measure that mixes a point mass at zero with a gamma-type density, and admits an exact sampler via an explicit Green's function in the zero-potential case.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Introduction. We develop a novel Langevin sampler for the distribution ρ(x) = (1/Z) exp(−β(λ‖x‖_1 + G(x))), x ∈ R^d, (1.1) with normalization constant Z, inverse temperature β > 0, and regularization parameter λ > 0. In data science, G: R^d → R represents the data likelihood and the ℓ1 norm describes the prior. By choosing an appropriate Langevin dynamics wit...

work page internal anchor Pith review Pith/arXiv arXiv 1996
[2]

One limitation of using smoothing methods like the Moreau–Yosida envelope is that convergence to the original distribution is only guaranteed in the limit γ → 0

The Hadamard–Langevin dynamics. One limitation of using smoothing methods like the Moreau–Yosida envelope is that convergence to the original distribution is only guaranteed in the limit γ → 0. Additionally, certain statistical quantities, such as modes, may not be preserved under smoothing; for example, Eftekhari et al

work page
[3]

highlights alternative smoothing techniques that aim to preserve such features. On the other hand, if considering quadratic data terms, the Gibbs sampler works with the exact distribution but is computationally intensive and unsuitable for large-scale computations. In this work, we follow the idea of the hierarchical framework of the Gibbs sampler of Park...

work page 2008
[4]

This includes geometric convergence to the invariant measure (section 3)

Establish the well-posedness on X of the Langevin system associated with π given in (2.1). This includes geometric convergence to the invariant measure (section 3). We present two approaches: the first removes the singularity in the log(u) term via a Cartesian change of coordinates, to transform our drift into a locally bounded term. The second exploits c...

work page
[5]

We establish strong convergence of the method as well as discuss convergence of the stationary distribution of the numerical method

Develop a numerical scheme to approximate the Langevin system and show the convergence of the numerical scheme to π (section 4). We establish strong convergence of the method as well as discuss convergence of the stationary distribution of the numerical method

work page
[6]

The supplementary material includes appendices for further numerical experiments and more detailed computations

Demonstrate the effectiveness of our numerical scheme for sampling π with several test problems, ranging from one-dimensional estimation of the convergence rate to classical imaging problems such as wavelet inpainting (section 5 and Appendix D). The supplementary material includes appendices for further numerical experiments and more detailed computatio...

work page 2012
[7]

We work with G under the following assumptions

Continuous dynamics. We work with G under the following assumptions. Assumption 3.1 (data term G). Assume that (i) G is bounded below, that (ii) G: R^d → R is continuously differentiable, that (iii) x^⊤∇G(x) is bounded below (so x^⊤∇G(x) ≥ −K for some K), and that (iv) ‖∇G(x)‖_∞ is bounded uniformly for x ∈ R^d (so B := sup_x ‖∇G(x)‖_∞ < ∞). The data term G is ...

work page 2013
[8]

sup_{t>0} E[1/u_{i,t}^r] < ∞ for 0 < r < 2 (here u_{i,t} denotes the ith component of u_t)

work page
[9]

For each T > 0 and ϵ > 0, we have E[(∫_0^T 1/u_{i,s}^{2−ϵ} ds)^q] < ∞ for all q ≥ 1. Proof. Let ϕ(u, v) = u_1^{−r} for example. Since r < 2, we can choose conjugate exponents (p, q) such that q r < 2. Due to the boundedness assumption on ρ_0/π and π being a probability measure, ρ_0/π ∈ L^p(π). Further, ϕ ∈ L^q(π) as ∫ ϕ^q π = (1/Z_π) ∫ (1/u_1^{qr−1}) u_2 ··· u_d exp(−β(½λ‖(...

work page
[10]

By Ito's formula applied to (u, v) ↦ u_1^ϵ for ϵ > 0, (1/β) ϵ² ∫_0^T u_{1,t}^{ϵ−2} dt = u_{1,T}^ϵ − u_{1,0}^ϵ + λϵ ∫_0^T u_{1,t}^ϵ dt + ϵ ∫_0^T v_{1,t} [∇G(u_t ⊙ v_t)]_1 u_{1,t}^{ϵ−1} dt − √(2/β) ϵ ∫_0^T u_{1,t}^{ϵ−1} dW_t^{1,1}. (3.4) By Young's inequality with conjugate exponents ((2 − ϵ)/(1 − ϵ), (2 − ϵ)), we have for any α > 0 that |v_1 ∇G(u ⊙ v) u_1^{ϵ−1}| ≤ (1/(2 − ϵ)) α^{−(2−ϵ)} |v_1 [∇G(u ⊙...

work page 1991
[11]

E‖(u, v)_t − (u, v)_s‖^q < c |t − s|^{q/2}

work page
[12]

E[|u_{i,t} − u_{i,s}| / (u_{i,s} u_{i,t})] ≤ c |t − s|^{1/2−ϵ}, i = 1, …, d

E[|u_{i,t} − u_{i,s}| / (u_{i,s} u_{i,t})] ≤ c |t − s|^{1/2−ϵ}, i = 1, …, d. Proof. 1. We treat the u_1 component only (v is handled similarly). From (3.1), u_{1,t} = u_{1,s} + (1/β) ∫_s^t (1/u_{1,r}) dr + ··· + √(2/β) ∫_s^t dW_r^1, where we omit several integrals that are straightforward to estimate under the moment bound in Theorem 3.2. By Hölder's inequality with conjugate exp...

work page
[13]

Then, P(R) ≤ P(u_{i,t} < ∆) + P(u_{i,s} < ∆)

Let R = {(u_{i,t} ∧ u_{i,s}) < ∆}. Then, P(R) ≤ P(u_{i,t} < ∆) + P(u_{i,s} < ∆). By Chebyshev's inequality, P(u_{i,t} < ∆) ≤ ∆^{2−ϵ} E[u_{i,t}^{ϵ−2}] for ϵ > 0. Hence, using Corollary 3.4, for a constant c, P(R) ≤ c ∆^{2−ϵ}. Now, split the expectation with respect to R as E[|u_{i,t} − u_{i,s}| / (u_{i,s} u_{i,t})] = E[(|u_{i,t} − u_{i,s}| / (u_{i,s} u_{i,t})) 1_R] + E[(|u_{i,t} − u_{i,s}| / (u_{i,s} u_{i,t})) (1 − 1_R)]. (3.5) We estima...

work page 1951
[14]

Discretization. We consider the following time-stepping approximation: for time step ∆t, we seek an approximation (u, v)_n to (u, v)_{n∆t} via u_{k+1} − u_k = −λ u_{k+1} ∆t − v_k ⊙ ∇G(u_k ⊙ v_k) ∆t + ∆t/(β u_{k+1}) + √(2/β) ∆W_k^1, v_{k+1} − v_k = −λ v_{k+1} ∆t − u_k ⊙ ∇G(u_k ⊙ v_k) ∆t + √(2/β) ∆W_k^2, (4.1) where ∆W_k^i = W^i_{t_{k+1}} − W^i_{t_k}. The update rule is implicit in the regul...

work page 2011
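
A minimal scalar sketch of one step of scheme (4.1) as excerpted above. It assumes that the implicit drift term is ∆t/(β u_{k+1}) and that the noise coefficient is √(2/β); both are readings of the extracted text, not confirmed against the paper. Under that reading the u-update is a quadratic in u_{k+1} whose positive root keeps u > 0, and the v-update is linear. The names `solve_implicit_u` and `hld_step`, and the choice G = 0 in the usage loop, are hypothetical.

```python
import math
import random

def solve_implicit_u(a, lam, beta, dt):
    """Positive root u of (1 + lam*dt)*u - dt/(beta*u) = a."""
    c = 1.0 + lam * dt
    # Equivalent quadratic: c*u**2 - a*u - dt/beta = 0; take the positive root.
    return (a + math.sqrt(a * a + 4.0 * c * dt / beta)) / (2.0 * c)

def hld_step(u, v, grad_G, lam, beta, dt, rng):
    """One semi-implicit step for scalar (u, v) with u > 0 (assumed form of (4.1))."""
    g = grad_G(u * v)                       # gradient of the data term at x = u*v
    s = math.sqrt(2.0 * dt / beta)          # std of sqrt(2/beta) times a Brownian increment
    a = u - dt * v * g + s * rng.gauss(0.0, 1.0)   # explicit part of the u-update
    b = v - dt * u * g + s * rng.gauss(0.0, 1.0)   # explicit part of the v-update
    u_new = solve_implicit_u(a, lam, beta, dt)     # implicit in lam*u and 1/(beta*u)
    v_new = b / (1.0 + lam * dt)                   # implicit in lam*v
    return u_new, v_new

# Usage: run a short chain with G = 0 (an assumption made only for illustration).
rng = random.Random(0)
u, v = 1.0, 0.0
for _ in range(1000):
    u, v = hld_step(u, v, lambda x: 0.0, lam=1.0, beta=1.0, dt=0.01, rng=rng)
```

Taking the positive root is a design choice that enforces the constraint u > 0 at every step without clipping or rejection.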
[15]

The code for reproducing our numerical experiments can be found online

Numerical experiments. The code for reproducing our numerical experiments can be found online. Throughout, we consider the setting where G(x) = ‖Ax − y‖²/2. The stepsize and Moreau-envelope regularization choice for Prox-l1 are taken to be γ = 1/(KL) for some K ≥ 1 and ∆t = γ/(5(γL + 1)), where L = ‖A‖² is the Lipschitz constant of the gradient of G. T...

work page 2018
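
The step-size rule quoted for the Prox-l1 baseline can be written out as a short worked computation. The sketch assumes the formula reads ∆t = γ/(5(γL + 1)), since the excerpt's closing parenthesis is missing. It takes the Lipschitz constant L as an input rather than computing it from A. The function name `prox_l1_step_sizes` is hypothetical.

```python
def prox_l1_step_sizes(L, K=1.0):
    """Return (gamma, dt) with gamma = 1/(K*L) and dt = gamma / (5*(gamma*L + 1)).

    L is the Lipschitz constant of the gradient of G, and K >= 1 is a free factor.
    """
    if L <= 0.0 or K < 1.0:
        raise ValueError("expected L > 0 and K >= 1")
    gamma = 1.0 / (K * L)
    dt = gamma / (5.0 * (gamma * L + 1.0))
    return gamma, dt

# With K = 1 the rule reduces to gamma = 1/L and dt = 1/(10*L).
gamma, dt = prox_l1_step_sizes(L=4.0, K=1.0)
```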
[16]

In this work, we proposed a new approach for sampling with the Laplace prior via a Hadamard parameterization

Conclusion. In this work, we proposed a new approach for sampling with the Laplace prior via a Hadamard parameterization. Unlike Proximal Langevin, there is no smoothing involved, and the stationary distribution of our continuous Langevin dynamics corresponds exactly to the sought-after distribution. We carried out a theoretical analysis, showing well-posedne...

work page 2016
[17]

We have µ(z) = −M(z)∇H(z) + 1 2 1 x/η = −M(z)∇H(z) + divM(z)−1(M(z))

+ 1 2 −(4ηt + x2 t 4ηt ) ⊙ ∇G(xt) − 2λxt + xt 4ηt # | {z } µ(zt) dt+ √ 2 1 2 diag(ut) 0 diag(vt) diag( ut) | {z } S(zt) dW 1 t dW 2 t . We have µ(z) = −M(z)∇H(z) + 1 2 1 x/η = −M(z)∇H(z) + divM(z)−1(M(z)). Letting ρ(t, ·) denote the distribution of z_t, we have ∂ρ/∂t = −div(µ^⊤(z)ρ) + Σ_i Σ_j ∂²/(∂z_i ∂z_j) (M_{i,j}(z, t)ρ). Note that...

work page
[18]

Appendix C

− 1 2 + 3 2 (4η + x2 4η )∇G(x) + 2λx − 1 2 x 2η + 1 2 x η #! + div(M ∇ρ) = div η 1 2 x 1 2 x x2 4η + 4γ ! 2λ − λ x2 8η2 + 1 2η ∇G(x) + λ x 4η + ∇ log ρ ! ρ ! = div(M(z)∇(H(z) + log(ρ)ρ). Appendix C. Additional proofs for the discretized scheme. We first remark that our numerical scheme (4.1) can be written explicitly as uk+ 1 2 vk+ 1 2 = uk vk − ∆t vk ⊙ ∇...

work page 2018
[19]

doi: 10.1561/2200000015

ISSN 1935-8237,1935-8245. doi: 10.1561/2200000015. R. Bai, V. Ročková, and E. I. George. Spike-and-slab meets lasso: A review of the spike-and-slab lasso. Handbook of Bayesian variable selection, pages 81–108, 2021. doi: https://doi.org/10.1201/9781003089018-4. L. R. Bellet. Ergodic Properties of Markov Processes. In S. Attal, A. Joye, and C.-A. Pill...

work page doi:10.1561/2200000015 1935
[20]

doi: 10.1109/ciss.2008.4558489

ISBN 9781424422463. doi: 10.1109/ciss.2008.4558489. S. Dereich, A. Neuenkirch, and L. Szpruch. An Euler-type method for the strong approximation of the Cox–Ingersoll–Ross process. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 468(2140):1105–1115, Dec

work page doi:10.1109/ciss.2008.4558489 2008
[21]

doi: 10.1098/rspa.2011.0505. A. Durmus, E. Moulines, and M. Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: When Langevin meets Moreau. SIAM J. Imaging Sci., 11(1):473–506, Jan. 2018. ISSN 1936-4954. doi: 10.1137/16m1108340. A. Eftekhari, L. Vargas, and K. C. Zygalakis. The forward–backward envelope for sampling with the ove...

work page doi:10.1098/rspa.2011.0505 2011
[22]

doi: 10.2307/1969318. D. Geman and G. Reynolds. Constrained restoration and the recovery of discontinuities. IEEE Transactions on Pattern Analysis & Machine Intelligence, 14(03):367–383,

work page doi:10.2307/1969318
[23]

doi: 10.1109/34.120331

ISSN 0162-8828,1939-3539. doi: 10.1109/34.120331. M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Series B Stat. Methodol., 73(2):123–214, Mar. 2011. ISSN 1369-7412,1467-9868. doi: 10.1111/j.1467-9868.2010.00765.x. P. Glasserman. Monte Carlo methods in financial engineering. 2001. doi: 10.100...

work page doi:10.1109/34.120331 1939
[24]

doi: 10.1016/j.jmaa.2017.10.076

ISSN 0022-247X,1096-0813. doi: 10.1016/j.jmaa.2017.10.076. M. Hefter and A. Jentzen. On arbitrarily slow convergence rates for strong numerical approximations of Cox–Ingersoll–Ross processes and squared Bessel processes. Finance Stoch., 23(1):139–172, Jan. 2019. ISSN 0949-2984,1432-1122. doi: 10.1007/s00780-018-0375-5. D. J. Higham, X. Mao, and A. M. St...

work page doi:10.1016/j.jmaa.2017.10.076 2017
[25]

doi: 10.1016/j.csda.2017.06.007

ISSN 0167-9473,1872-7352. doi: 10.1016/j.csda.2017.06.007. Y.-P. Hsieh, A. Kavis, P. Rolland, and V. Cevher. Mirrored Langevin dynamics. Advances in Neural Information Processing Systems, 31, 2018. doi: 10.48550/arXiv.1802.10174. URL https://proceedings.neurips.cc/paper_files/paper/2018/file/6490791e7abf6b29a381288cc23a8223-Paper.pdf. M. Hutzenthaler, ...

work page doi:10.1016/j.csda.2017.06.007 2017
[26]

doi: 10.48550/arXiv.2002.04363

work page doi:10.48550/arxiv.2002.04363 2002
