Parallel computations for Metropolis Markov chains with Picard maps

Giacomo Zanella; Sebastiano Grazzi

arxiv: 2506.09762 · v2 · submitted 2025-06-11 · 📊 stat.CO · stat.ME

Parallel computations for Metropolis Markov chains with Picard maps

Sebastiano Grazzi , Giacomo Zanella This is my paper

Pith reviewed 2026-05-19 10:07 UTC · model grok-4.3

classification 📊 stat.CO stat.ME

keywords parallel MCMCPicard mapsRandom Walk Metropolislog-concave distributionsgradient-free samplinghigh-dimensional samplingMarkov chain Monte Carlo

0 comments

The pith

Parallel Picard iteration speeds up Random Walk Metropolis sampling for log-concave targets by a factor of sqrt(d) using O(sqrt(d)) processors

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops parallel algorithms for running zeroth-order Metropolis Markov chains by using Picard maps to coordinate updates across processors. For Random Walk Metropolis chains targeting log-concave distributions on R^d, the method produces samples close to the target after O(sqrt(d)) parallel iterations with O(sqrt(d)) processors. This yields a sqrt(d) reduction in the number of sequential steps compared with the standard implementation. The same framework also yields a variant that draws from an approximate distribution in O(1) parallel iterations using O(d) processors. The algorithms require only pointwise evaluations of the log-density and are demonstrated on regression, epidemic, and precision-medicine examples.

Core claim

For Random Walk Metropolis Markov chains targeting log-concave distributions π on R^d, the Picard-map parallelization generates samples close to π in O(sqrt(d)) parallel iterations with O(sqrt(d)) processors, thereby accelerating the sequential chain by a factor sqrt(d). A modification produces samples from an approximate measure π_r in O(1) parallel iterations with O(d) processors.

What carries the argument

The Picard map, an iterative scheme that parallelizes the sequential updates of the Random Walk Metropolis chain by distributing the computation of acceptance probabilities and proposals across multiple processors while preserving the correct stationary distribution.

If this is right

Wall-clock time to equilibrium is reduced by a factor of sqrt(d) relative to the sequential Random Walk Metropolis algorithm when O(sqrt(d)) processors are available.
The method works in settings where the gradient of log π is unavailable, relying solely on pointwise density evaluations.
A cheap approximate sampler from a related measure π_r becomes available in constant parallel time using linear number of processors.
The approach is directly applicable to high-dimensional regression, epidemic models, and precision-medicine tasks that lack closed-form gradients.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same parallelization idea could be tested on other zeroth-order MCMC kernels such as Metropolis-adjusted Langevin or Hamiltonian Monte Carlo without gradients.
For dimensions where O(d) processors are feasible, the O(1)-iteration approximate sampler might serve as a fast initializer for more accurate chains.
Combining the Picard parallelization with existing variance-reduction or preconditioning techniques might produce further practical speed-ups in real applications.

Load-bearing premise

The target distribution is log-concave on R^d, which is used both to guarantee convergence of the underlying Random Walk Metropolis chain and to bound the error introduced by the parallel Picard iteration.

What would settle it

Run the parallel algorithm on a high-dimensional isotropic Gaussian target with known mixing behavior and check whether the total-variation distance to stationarity drops below a fixed threshold after roughly sqrt(d) parallel iterations.

Figures

Figures reproduced from arXiv: 2506.09762 by Giacomo Zanella, Sebastiano Grazzi.

**Figure 1.** Figure 1: Traces on the (x1, x2)-plane of the Picard recursion X(j) for K = 1000 and j = 1, 2, 10, 13 (top left - bottom right). The time-index of each sequence is shown with a yellow-red gradient color. The underlying Markov chain is a d = 100 dimensional Random Walk Metropolis with step-size ξ = 2/ √ d targeting a standard Gaussian distribution. The gray line is the fixed point (i.e. the output of the sequential a… view at source ↗

**Figure 2.** Figure 2: Illustration of the Picard recursion applied sequentially to every block of [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Performance of Online Picard algorithm ( [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: Same as Figure 3 for the logistic regression model [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: L (1) (y-axis) for X0 ∼ Unif(x ∈ R d : ∥x − x ⋆∥ = s), s ranging from 0 to 2000 (x-axis) and target given by logistic regression E2 with d = K = 200. person units of infectious pressure exerted during the course of the epidemic, see [Neal and Roberts, 2005, Section 5] for details of the likelihood and posterior measure in the set-up considered here. We set two independent Gamma priors for γ and β with prio… view at source ↗

**Figure 6.** Figure 6: (A(Xi , x◦ ), B(Xi , x◦ )) (left panel) and independent samples of (γ, β) | (X10 i , x◦ ) (right panel), from i = 0 (yellow) to i = 105 (blue) and Xi being the ith step of the RWM. The regions R ≥ 1 is displayed in sky-blue. The true values from which we simulated the data are shown with red points. started at the true values x [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

read the original abstract

We develop parallel algorithms for simulating zeroth-order (aka gradient-free) Metropolis Markov chains based on the Picard map. For Random Walk Metropolis Markov chains targeting log-concave distributions $\pi$ on $\mathbb{R}^d$, our algorithm generates samples close to $\pi$ in $\mathcal{O}(\sqrt{d})$ parallel iterations with $\mathcal{O}(\sqrt{d})$ processors, therefore speeding up the convergence of the corresponding sequential implementation by a factor $\sqrt{d}$. Furthermore, a modification of our algorithm generates samples from an approximate measure $ \pi_r$ in $\mathcal{O}(1)$ parallel iterations and $\mathcal{O}(d)$ processors. We empirically assess the performance of the proposed algorithms in high-dimensional regression problems, an epidemic model where the gradient is unavailable and a real-word application in precision medicine. Our algorithms are straightforward to implement and may constitute a useful tool for practitioners seeking to sample from a prescribed distribution $\pi$ using only point-wise evaluations of $\log\pi$ and parallel computing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper develops parallel algorithms for zeroth-order Metropolis Markov chains based on the Picard map. For Random Walk Metropolis chains targeting log-concave distributions π on R^d, the algorithm generates samples close to π in O(√d) parallel iterations with O(√d) processors, yielding a √d speedup over the sequential implementation. A modification generates samples from an approximate measure π_r in O(1) parallel iterations with O(d) processors. The methods are empirically assessed on high-dimensional regression problems, an epidemic model without gradient access, and a precision medicine application.

Significance. If the theoretical claims hold, the work provides a useful tool for parallelizing gradient-free MCMC on log-concave targets, with potential practical impact in settings where only pointwise log-density evaluations are available. The combination of complexity bounds and empirical tests in applied problems is a strength; credit is given for the novel use of Picard maps to achieve the stated parallel speedups.

major comments (2)

[§4] §4 (Parallel Picard iteration): The O(√d) parallel iteration bound is derived from the sequential RWM mixing time of O(d) under log-concavity; however, the contraction factor of the Picard map must be shown to be independent of d (or explicitly bounded) in the chosen distance, otherwise the claimed complexity may contain hidden dimension-dependent terms that undermine the √d speedup.
[§6.2] §6.2 (Epidemic model experiments): The reported wall-clock times and effective sample sizes for the parallel algorithm are compared only to a naive sequential baseline; without reporting communication costs or processor utilization, it is unclear whether the observed speedup matches the theoretical √d factor or is limited by implementation overhead.

minor comments (3)

[Abstract] The definition of the approximate measure π_r and the role of the parameter r should be introduced with a forward reference in the abstract or introduction for reader clarity.
[§6.1] In the high-dimensional regression experiments, include the specific values of d used and the number of independent runs to allow assessment of variability in the reported convergence diagnostics.
[§3] Notation for the Picard map operator should be consistently defined before its first use in the algorithm description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript, positive assessment of its significance, and recommendation for minor revision. We address each major comment below and have revised the manuscript accordingly to improve clarity and completeness.

read point-by-point responses

Referee: [§4] §4 (Parallel Picard iteration): The O(√d) parallel iteration bound is derived from the sequential RWM mixing time of O(d) under log-concavity; however, the contraction factor of the Picard map must be shown to be independent of d (or explicitly bounded) in the chosen distance, otherwise the claimed complexity may contain hidden dimension-dependent terms that undermine the √d speedup.

Authors: We thank the referee for this precise observation on the analysis. The contraction factor of the Picard map is independent of dimension d. This follows directly from the log-concavity of the target and the fixed step-size scaling used in the zeroth-order RWM proposal, which yields a uniform contraction rate in the Wasserstein-1 distance employed in the proof of Theorem 4.1. To address the concern explicitly, we have inserted a new remark immediately after the statement of the parallel complexity bound that recalls the relevant contraction lemma from the appendix and confirms the absence of hidden d-dependent factors in the contraction constant. revision: yes
Referee: [§6.2] §6.2 (Epidemic model experiments): The reported wall-clock times and effective sample sizes for the parallel algorithm are compared only to a naive sequential baseline; without reporting communication costs or processor utilization, it is unclear whether the observed speedup matches the theoretical √d factor or is limited by implementation overhead.

Authors: We agree that additional implementation details would strengthen the empirical evaluation. In the revised Section 6.2 we now report processor utilization (near 100 % for the parallel iterations) and note that communication consists only of a single all-reduce of O(√d) scalar values per parallel step. For the epidemic model dimensions considered, this overhead is negligible relative to the log-density evaluations, and the measured wall-clock speedups remain consistent with the theoretical √d factor. We have also added a brief paragraph comparing the observed scaling to the predicted complexity. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via standard theory

full rationale

The claimed O(√d) parallel speedup for RWM on log-concave targets follows from applying the Picard map to accelerate the standard sequential mixing analysis under the log-concavity assumption, which is an external property used to bound both chain convergence and iteration contraction. No equations reduce a prediction to a fitted input by construction, no self-citations form the load-bearing justification, and the argument invokes no uniqueness theorems or ansatzes imported from the authors' prior work. The development remains independent of the target result and aligns with external MCMC benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests primarily on the domain assumption of log-concavity for theoretical guarantees and introduces the Picard map as the core algorithmic device; no free parameters or new physical entities are evident from the abstract.

axioms (1)

domain assumption Target distributions are log-concave on R^d
Invoked to ensure good mixing of the Random Walk Metropolis chain and to support the convergence analysis of the parallel Picard iteration.

invented entities (1)

Picard map for parallel MCMC no independent evidence
purpose: To enable parallel generation of chain iterates while preserving closeness to the target distribution
The map is the central technical device that allows the parallel speedup; no independent empirical evidence for its properties is provided in the abstract.

pith-pipeline@v0.9.0 · 5693 in / 1308 out tokens · 46442 ms · 2026-05-19T10:07:02.691280+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Corollary 2 … outputs a random variable X with ∥L(X)−π∥TV≤ϵ after J=O(L/m √d polylog(ϵ−1)) parallel iterations … K=⌊√(d/(8c0))⌋
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 2 (Smooth and Hessian Lipschitz target) … −L Id ≼ ∇²V(x) ≼ L Id … Tr(M)≤γ L^{3/2} d^{1/2}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

Anari, S

N. Anari, S. Chewi, and T.-D. Vuong. Fast parallel sampling under isoperimetry. arXiv preprint arXiv:2401.09016,

work page arXiv
[2]

URL http://www.jstor.org/stable/30243645

ISSN 00905364, 21688966. URL http://www.jstor.org/stable/30243645. C. Andrieu, A. Lee, S. Power, and A. Q. Wang. Explicit convergence bounds for metropolis markov chains: isoperimetry, spectral gaps and profiles. The Annals of Applied Probability , 34(4):4022–4071,

work page arXiv
[3]

Ascolani, H

F. Ascolani, H. Lavenant, and G. Zanella. Entropy contraction of the gibbs sampler under log-concavity. arXiv preprint arXiv:2410.00858 ,

work page arXiv
[4]

URL https://proceedings.neurips.cc/paper_files/paper/2024/ file/f162fa05675e3db4a733aafc081653cf-Paper-Conference.pdf. Y. Chen and K. Gatmiry. When does metropolized hamiltonian monte carlo provably outper- form metropolis-adjusted langevin algorithm? arXiv preprint arXiv:2304.04724 ,

work page arXiv 2024
[5]

A. S. Dalalyan and L. Riou-Durand. On sampling from a log-concave density using ki- netic Langevin diffusions. Bernoulli, 26(3):1956–1988,

work page 1956
[6]

doi: 10.3150/19-BEJ1178

ISSN 1350-7265. doi: 10.3150/19-BEJ1178. URL https://doi.org/10.3150/19-BEJ1178. R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu. Log-concave sampling: Metropolis-Hastings algorithms are fast. J. Mach. Learn. Res. , 20:Paper No. 183, 42,

work page doi:10.3150/19-bej1178
[7]

doi: 10.1063/1.881812. A. Gelman, G. O. Roberts, W. R. Gilks, et al. Efficient metropolis jumping rules. Bayesian statistics, 5(599-608):42,

work page doi:10.1063/1.881812
[8]

doi: 10.1109/TPAMI.1984.4767596. S. Gupta, L. Cai, and S. Chen. Faster diffusion-based sampling with randomized midpoints: Sequential and parallel. arXiv preprint arXiv:2406.00924 ,

work page doi:10.1109/tpami.1984.4767596 1984
[9]

URL http://www.jstor.org/stable/2669532

ISSN 01621459, 1537274X. URL http://www.jstor.org/stable/2669532. P. Marjoram, J. Molitor, V. Plagnol, and S. Tavar´ e. Markov chain monte carlo without likeli- hoods. Proceedings of the National Academy of Sciences , 100(26):15324–15328,

work page arXiv
[10]

Pozza and G

18 F. Pozza and G. Zanella. On the fundamental limitations of multiproposal markov chain monte carlo algorithms. arXiv preprint arXiv:2410.23174 ,

work page arXiv
[11]

URL https://doi.org/ 10.1214/154957804100000024

doi: 10.1214/154957804100000024. URL https://doi.org/ 10.1214/154957804100000024. G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996a. ISSN 1350-7265. doi: 10.2307/ 3318418. URL https://doi.org/10.2307/3318418. G. O. Roberts and R. L. Tweedie. Exponential conv...

work page doi:10.1214/154957804100000024
[12]

Yu and A

L. Yu and A. Dalalyana. Parallelized midpoint randomization for langevin monte carlo. arXiv preprint arXiv:2402.14434,

work page arXiv
[13]

Zhou and M

H. Zhou and M. Sugiyama. Parallel simulation for sampling under isoperimetry and score-based diffusion models. arXiv preprint arXiv:2412.07435 ,

work page arXiv
[14]

Then E[Y T W Y ] ≤ 15E[∥Y ∥2] + i25 exp(−3d/2)

Let W ∼ Wishart(Ii, d) be a Wishart matrix on Ri×i with d degrees of freedom and identity scale matrix, and Y be a random vector on Ri defined on the same probability space, with ∥Y ∥2 ≤ i almost surely. Then E[Y T W Y ] ≤ 15E[∥Y ∥2] + i25 exp(−3d/2) . Proof. By Wainwright [2019, Theorem 6.1], for all t ≥ 0, i ≤ d, P λmax(W ) > (1 + p i/d + t)2 ≤ e−t2d/2 ...

work page 2019
[15]

[2024, Rmk.50]

to deduce p χ2(L(Xnmix), π) ≤ ϵ with nmix = O( L m d), where nmix depends on ϵ, d, L/m and is explicitly provided in Andrieu et al. [2024, Rmk.50]. Consider running the Online Picard algorithm with K = p d/(8c0) for J = ⌈log(d) max{4nmix/K, 2 log(2/ϵ)} parallel iterations and returning Y = X(J) min{nmix,L(J)} as output. When nmix ≤ L(J), we have Y = X(J) ...

work page 2024
[16]

+ 11 exp(−d/10)) . Proof. By Laurent and Massart [2000, Lemma 1], for every t ≥ 2, P(X > t ) ≤ e−td/20. Then, Combining this result and Lemma 4, we have that E[1(X ≥ 2)X] = Z ∞ 2 P(X ≥ u) du + 2P(X >

work page 2000

[1] [1]

Anari, S

N. Anari, S. Chewi, and T.-D. Vuong. Fast parallel sampling under isoperimetry. arXiv preprint arXiv:2401.09016,

work page arXiv

[2] [2]

URL http://www.jstor.org/stable/30243645

ISSN 00905364, 21688966. URL http://www.jstor.org/stable/30243645. C. Andrieu, A. Lee, S. Power, and A. Q. Wang. Explicit convergence bounds for metropolis markov chains: isoperimetry, spectral gaps and profiles. The Annals of Applied Probability , 34(4):4022–4071,

work page arXiv

[3] [3]

Ascolani, H

F. Ascolani, H. Lavenant, and G. Zanella. Entropy contraction of the gibbs sampler under log-concavity. arXiv preprint arXiv:2410.00858 ,

work page arXiv

[4] [4]

URL https://proceedings.neurips.cc/paper_files/paper/2024/ file/f162fa05675e3db4a733aafc081653cf-Paper-Conference.pdf. Y. Chen and K. Gatmiry. When does metropolized hamiltonian monte carlo provably outper- form metropolis-adjusted langevin algorithm? arXiv preprint arXiv:2304.04724 ,

work page arXiv 2024

[5] [5]

A. S. Dalalyan and L. Riou-Durand. On sampling from a log-concave density using ki- netic Langevin diffusions. Bernoulli, 26(3):1956–1988,

work page 1956

[6] [6]

doi: 10.3150/19-BEJ1178

ISSN 1350-7265. doi: 10.3150/19-BEJ1178. URL https://doi.org/10.3150/19-BEJ1178. R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu. Log-concave sampling: Metropolis-Hastings algorithms are fast. J. Mach. Learn. Res. , 20:Paper No. 183, 42,

work page doi:10.3150/19-bej1178

[7] [7]

doi: 10.1063/1.881812. A. Gelman, G. O. Roberts, W. R. Gilks, et al. Efficient metropolis jumping rules. Bayesian statistics, 5(599-608):42,

work page doi:10.1063/1.881812

[8] [8]

doi: 10.1109/TPAMI.1984.4767596. S. Gupta, L. Cai, and S. Chen. Faster diffusion-based sampling with randomized midpoints: Sequential and parallel. arXiv preprint arXiv:2406.00924 ,

work page doi:10.1109/tpami.1984.4767596 1984

[9] [9]

URL http://www.jstor.org/stable/2669532

ISSN 01621459, 1537274X. URL http://www.jstor.org/stable/2669532. P. Marjoram, J. Molitor, V. Plagnol, and S. Tavar´ e. Markov chain monte carlo without likeli- hoods. Proceedings of the National Academy of Sciences , 100(26):15324–15328,

work page arXiv

[10] [10]

Pozza and G

18 F. Pozza and G. Zanella. On the fundamental limitations of multiproposal markov chain monte carlo algorithms. arXiv preprint arXiv:2410.23174 ,

work page arXiv

[11] [11]

URL https://doi.org/ 10.1214/154957804100000024

doi: 10.1214/154957804100000024. URL https://doi.org/ 10.1214/154957804100000024. G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996a. ISSN 1350-7265. doi: 10.2307/ 3318418. URL https://doi.org/10.2307/3318418. G. O. Roberts and R. L. Tweedie. Exponential conv...

work page doi:10.1214/154957804100000024

[12] [12]

Yu and A

L. Yu and A. Dalalyana. Parallelized midpoint randomization for langevin monte carlo. arXiv preprint arXiv:2402.14434,

work page arXiv

[13] [13]

Zhou and M

H. Zhou and M. Sugiyama. Parallel simulation for sampling under isoperimetry and score-based diffusion models. arXiv preprint arXiv:2412.07435 ,

work page arXiv

[14] [14]

Then E[Y T W Y ] ≤ 15E[∥Y ∥2] + i25 exp(−3d/2)

Let W ∼ Wishart(Ii, d) be a Wishart matrix on Ri×i with d degrees of freedom and identity scale matrix, and Y be a random vector on Ri defined on the same probability space, with ∥Y ∥2 ≤ i almost surely. Then E[Y T W Y ] ≤ 15E[∥Y ∥2] + i25 exp(−3d/2) . Proof. By Wainwright [2019, Theorem 6.1], for all t ≥ 0, i ≤ d, P λmax(W ) > (1 + p i/d + t)2 ≤ e−t2d/2 ...

work page 2019

[15] [15]

[2024, Rmk.50]

to deduce p χ2(L(Xnmix), π) ≤ ϵ with nmix = O( L m d), where nmix depends on ϵ, d, L/m and is explicitly provided in Andrieu et al. [2024, Rmk.50]. Consider running the Online Picard algorithm with K = p d/(8c0) for J = ⌈log(d) max{4nmix/K, 2 log(2/ϵ)} parallel iterations and returning Y = X(J) min{nmix,L(J)} as output. When nmix ≤ L(J), we have Y = X(J) ...

work page 2024

[16] [16]

+ 11 exp(−d/10)) . Proof. By Laurent and Massart [2000, Lemma 1], for every t ≥ 2, P(X > t ) ≤ e−td/20. Then, Combining this result and Lemma 4, we have that E[1(X ≥ 2)X] = Z ∞ 2 P(X ≥ u) du + 2P(X >

work page 2000