Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators

George Biros; Youguang Chen

arxiv: 2602.21426 · v2 · pith:RONGUT73new · submitted 2026-02-24 · 💻 cs.LG · stat.CO

Pith reviewed 2026-05-21 11:32 UTC · model grok-4.3

keywords: independent Metropolis-Hastings · proximal optimization · Bayesian inverse problems · posterior sampling · approximate operators · Markov chain Monte Carlo · proposal correction

The pith

Proximal correction of samples from a biased approximate posterior tightens the proposal and raises acceptance rates in independent Metropolis-Hastings sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Proximal-IMH to draw samples from exact posteriors in Bayesian inverse problems by starting with draws from a cheaper but biased approximate posterior. Each draw is then adjusted through an auxiliary proximal optimization that trades off fidelity to the exact model against stability near the approximate reference. In idealized settings the authors prove that this local correction brings the proposal distribution closer to the target, which increases acceptance probabilities and speeds mixing. The scheme applies to both linear and nonlinear operators and is demonstrated on multimodal and data-driven prior examples where exact sampling is expensive.

Core claim

Proximal-IMH removes bias from samples of an approximate posterior by solving an auxiliary optimization problem. This yields a local adjustment that trades off adherence to the exact model against stability around the approximate reference point. For idealized settings, the proximal correction tightens the match between approximate and exact posteriors, thereby improving acceptance rates and mixing. The method applies to both linear and nonlinear input-output operators and is particularly suitable for inverse problems where exact posterior sampling is too expensive.
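
For reference, the link between a tighter proposal and a higher acceptance rate can be read off the standard independence Metropolis-Hastings rule. The block below states that textbook rule; it is not quoted from the paper, and the symbols \pi (exact posterior density), q (proposal density), and w (importance weight) are notation assumed here.

```latex
% Standard IMH acceptance rule (textbook form, notation assumed here):
% \pi is the exact posterior density, q the independence proposal density,
% x the current state and x' the proposed state.
\begin{align*}
  w(x) &= \frac{\pi(x)}{q(x)}, \\
  \alpha(x, x') &= \min\left\{ 1, \frac{w(x')}{w(x)} \right\}.
\end{align*}
% If q = \pi then w is constant and every proposal is accepted;
% the more w varies over the support of \pi, the more proposals are rejected.
```

Read this way, any correction that flattens w over the region where the posterior has mass should raise the average acceptance probability, which is the effect the core claim describes.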

What carries the argument

The proximal correction, an auxiliary optimization problem that performs a local adjustment on each sample drawn from the approximate posterior to improve alignment with the exact posterior.
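
For linear operators, the proximal objective quoted further down this page (Eq. 3b) has a closed-form minimizer, so the correction can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `proximal_correct_linear`, the matrices `A` and `A_tilde`, the sample `x_tilde`, and the value of `beta` are all hypothetical.

```python
import numpy as np

def proximal_correct_linear(A, A_tilde, x_tilde, beta):
    """Minimize ||A x - A_tilde x_tilde||^2 + beta ||x - x_tilde||^2 over x.

    For linear operators the minimizer solves the normal equations
    (A^T A + beta I) x = A^T A_tilde x_tilde + beta x_tilde.
    """
    d = A.shape[1]
    lhs = A.T @ A + beta * np.eye(d)
    rhs = A.T @ (A_tilde @ x_tilde) + beta * x_tilde
    return np.linalg.solve(lhs, rhs)

# Hypothetical 3x2 operators and sample, chosen only for illustration.
A = np.array([[1.0, 0.2], [0.0, 1.5], [0.3, 0.1]])
A_tilde = A + 0.1 * np.array([[1.0, -1.0], [0.5, 0.0], [0.0, 1.0]])
x_tilde = np.array([0.7, -0.4])
beta = 0.5

x = proximal_correct_linear(A, A_tilde, x_tilde, beta)

# Gradient of the objective at the returned point; it should vanish
# up to round-off if x is the minimizer.
grad = 2 * A.T @ (A @ x - A_tilde @ x_tilde) + 2 * beta * (x - x_tilde)
print("corrected sample:", x)
print("gradient norm:", np.linalg.norm(grad))
```

When the approximate operator equals the exact one, the correction leaves the sample unchanged, which is a quick sanity check on any implementation.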

Load-bearing premise

The method relies on the existence of an approximate posterior distribution that is cheaper to sample from but may have significant bias, and that the auxiliary proximal optimization problem can be solved reliably for the given operators.

What would settle it

Run Proximal-IMH on a simple linear inverse problem with known exact posterior and compare acceptance rates and effective sample size against standard independent Metropolis-Hastings without the proximal step; no improvement would contradict the claimed tightening effect.
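
The check described above could be set up roughly as follows. This is a toy sketch under strong assumptions, none of which come from the paper: a standard normal prior, Gaussian noise, diagonal operators with `A_tilde = 1.3 * A`, and hand-picked `sigma` and `beta`. Because everything is linear and Gaussian, the corrected proposal is the Gaussian obtained by pushing the approximate posterior through the linear map K of Eq. 4, so its density is available in closed form.

```python
import numpy as np

def gaussian_posterior(A, y, sigma):
    """Mean and covariance for a N(0, I) prior and N(y | A x, sigma^2 I) likelihood."""
    d = A.shape[1]
    cov = np.linalg.inv(A.T @ A / sigma**2 + np.eye(d))
    return cov @ A.T @ y / sigma**2, cov

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at x."""
    r = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + len(mean) * np.log(2 * np.pi))

def imh_acceptance(target, prop_mean, prop_cov, n_steps, rng):
    """Run IMH with a Gaussian independence proposal; return the acceptance rate."""
    def log_weight(x):
        return target(x) - gaussian_logpdf(x, prop_mean, prop_cov)
    x = rng.multivariate_normal(prop_mean, prop_cov)
    log_w = log_weight(x)
    accepted = 0
    for _ in range(n_steps):
        x_new = rng.multivariate_normal(prop_mean, prop_cov)
        log_w_new = log_weight(x_new)
        # Accept with probability min(1, w(x_new) / w(x)).
        if np.log(rng.uniform()) < log_w_new - log_w:
            x, log_w = x_new, log_w_new
            accepted += 1
    return accepted / n_steps

def compare_acceptance(n_steps=2000, beta=1e-2, sigma=0.1, seed=0):
    """Acceptance rates of plain IMH and of IMH with the linear proximal correction."""
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 2.0])      # hypothetical exact operator
    A_tilde = 1.3 * A            # hypothetical biased approximation
    y = A @ np.ones(2)           # noise-free synthetic data
    m, C = gaussian_posterior(A, y, sigma)
    m_t, C_t = gaussian_posterior(A_tilde, y, sigma)
    # Linear correction map K = (A^T A + beta I)^{-1} (A^T A_tilde + beta I).
    K = np.linalg.solve(A.T @ A + beta * np.eye(2), A.T @ A_tilde + beta * np.eye(2))
    target = lambda x: gaussian_logpdf(x, m, C)
    acc_plain = imh_acceptance(target, m_t, C_t, n_steps, rng)
    acc_prox = imh_acceptance(target, K @ m_t, K @ C_t @ K.T, n_steps, rng)
    return acc_plain, acc_prox

acc_plain, acc_prox = compare_acceptance()
print(f"plain IMH acceptance:    {acc_plain:.3f}")
print(f"proximal IMH acceptance: {acc_prox:.3f}")
```

In this configuration the likelihood dominates the prior and the operator error is large, so the uncorrected proposal sits several posterior standard deviations away from the target; a real test would also vary `beta`, the noise level, and the operator error, and would report effective sample size alongside acceptance.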

Original abstract

We consider the problem of sampling from a posterior distribution arising in Bayesian inverse problems in science, engineering, and imaging. Our method belongs to the family of independence Metropolis-Hastings (IMH) sampling algorithms, which are common in Bayesian inference. Relying on the existence of an approximate posterior distribution that is cheaper to sample from but may have significant bias, we introduce Proximal-IMH, a scheme that removes this bias by correcting samples from the approximate posterior through an auxiliary optimization problem. This yields a local adjustment that trades off adherence to the exact model against stability around the approximate reference point. For idealized settings, we prove that the proximal correction tightens the match between approximate and exact posteriors, thereby improving acceptance rates and mixing. The method applies to both linear and nonlinear input-output operators and is particularly suitable for inverse problems where exact posterior sampling is too expensive. We present numerical experiments including multimodal and data-driven priors with nonlinear input-output operators. The results show that Proximal-IMH reliably outperforms existing IMH variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Proximal-IMH adds a proximal correction step to IMH proposals drawn from a biased approximate posterior, with an idealized proof and some numerical gains on nonlinear inverse problems.

The main point is that this paper takes samples from a cheap but biased approximate posterior and runs them through a proximal optimization problem to produce better proposals for independent Metropolis-Hastings. The correction trades off fidelity to the exact forward model against staying close to the approximate reference point, and they prove that in idealized linear settings this tightens the effective proposal distribution and raises acceptance rates and mixing speed. They also run experiments on multimodal and data-driven priors with nonlinear operators and report that the method beats standard IMH variants in those cases. That combination of a targeted fix plus some theory and tests is the concrete contribution here.

The proximal step itself does not appear in the earlier IMH papers they cite, so the technical move is new within this family of samplers. The work is aimed squarely at Bayesian inverse problems in imaging, engineering, and science where the forward operator is expensive and an approximate posterior is already available. A reader who already uses IMH and needs to reduce bias without switching to a full MCMC overhaul could try this out directly.

The soft spot is exactly the one flagged in the stress test: the proof assumes the proximal subproblem is solved exactly, but for nonlinear operators that subproblem is non-convex and any practical solver will stop at some tolerance. That gap means the acceptance-rate guarantee may degrade in real runs, and the paper does not appear to quantify how sensitive performance is to solver accuracy or to the choice of the proximal regularization parameter. The experiments show outperformance, yet without more detail on controls it is hard to know how general the improvement is. Still, the central idea is straightforward to implement and the claims are testable, so the paper has enough substance to warrant a serious referee rather than a desk reject. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Proximal-IMH, an independence Metropolis-Hastings sampler for Bayesian inverse problems. It draws proposals from a cheap but biased approximate posterior and applies a proximal correction via an auxiliary optimization problem to produce a local adjustment that trades off fidelity to the exact posterior against stability around the approximate reference. For idealized settings the authors prove that this correction tightens the match to the exact posterior and thereby improves acceptance rates and mixing; the method is claimed to apply to both linear and nonlinear forward operators. Numerical experiments on multimodal and data-driven priors with nonlinear operators are reported to show reliable outperformance over existing IMH variants.

Significance. If the idealized proof holds and the numerical gains are reproducible, the approach could supply a practical bias-correction mechanism for IMH when only approximate posteriors are cheaply available. The explicit proof for idealized cases together with the extension to nonlinear operators constitutes a clear strength; the method also avoids introducing new free parameters beyond the proximal regularization weight.

major comments (2)

[idealized-settings proof] The proof that the proximal correction tightens the match to the exact posterior (abstract and the idealized-settings analysis) is derived under the assumption of an exact solve of the auxiliary proximal optimization problem. For nonlinear input-output operators this problem is non-convex; any practical solver tolerance or early stopping therefore risks violating the exactness assumption used to derive the acceptance-rate improvement. Please state the required accuracy of the proximal solve explicitly and analyze the effect of approximate solves on the acceptance probability.
[numerical experiments] The central claim of improved mixing and acceptance rates rests on the proximal correction reducing bias relative to the approximate posterior. The manuscript should clarify whether the reported numerical gains remain when the proximal subproblem is solved only to moderate accuracy (e.g., 10^{-3} relative tolerance), as this directly tests the robustness of the theoretical guarantee for the nonlinear cases shown in the experiments.
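
The tolerance question raised in both major comments can be made concrete with a small inexact solver. The sketch below uses Gauss-Newton with a relative gradient-norm stopping rule; this is a technique swapped in for illustration, since the page does not say which solver the authors use. The elementwise cubic operator `forward`, the identity approximate operator, and the values of `beta` and the tolerances are all hypothetical.

```python
import numpy as np

def forward(x):
    """Hypothetical nonlinear 'exact' operator (elementwise cubic perturbation)."""
    return x + 0.1 * x**3

def forward_jac(x):
    """Jacobian of `forward`; diagonal because the map acts elementwise."""
    return np.diag(1.0 + 0.3 * x**2)

def proximal_gauss_newton(x_tilde, target, beta, rel_tol, max_iter=100):
    """Approximately minimize ||forward(x) - target||^2 + beta ||x - x_tilde||^2.

    Starts at x_tilde and stops once the gradient norm drops below
    rel_tol times its initial value (or after max_iter steps).
    Returns the final iterate and the number of Gauss-Newton steps taken.
    """
    x = x_tilde.copy()
    d = x.size
    g0 = None
    for it in range(max_iter + 1):
        J = forward_jac(x)
        # Half of the objective gradient; the factor 2 cancels in the relative test.
        grad = J.T @ (forward(x) - target) + beta * (x - x_tilde)
        gnorm = np.linalg.norm(grad)
        if g0 is None:
            g0 = gnorm
        if gnorm <= rel_tol * g0 or it == max_iter:
            return x, it
        # Gauss-Newton step for the stacked residual [forward(x) - target; sqrt(beta)(x - x_tilde)].
        x = x - np.linalg.solve(J.T @ J + beta * np.eye(d), grad)

# Hypothetical approximate sample; the approximate operator is taken to be
# the identity, so the data-fit target A_tilde(x_tilde) equals x_tilde.
x_tilde = np.array([1.0, -0.5, 2.0])
target = x_tilde.copy()

x_loose, it_loose = proximal_gauss_newton(x_tilde, target, beta=0.1, rel_tol=1e-3)
x_tight, it_tight = proximal_gauss_newton(x_tilde, target, beta=0.1, rel_tol=1e-8)
gap = np.linalg.norm(x_loose - x_tight)
print(f"steps at 1e-3: {it_loose}, steps at 1e-8: {it_tight}, gap: {gap:.2e}")
```

The quantity `gap` is the perturbation in the corrected sample caused by stopping early. A robustness study of the kind the referee requests would propagate that perturbation into the proposal density and the resulting acceptance rate, which this sketch does not attempt.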

minor comments (2)

[method description] The role of the proximal regularization parameter is introduced but its selection strategy across the reported experiments is not detailed; a brief discussion or default rule would improve reproducibility.
[algorithm] Notation for the approximate posterior and the proximal operator should be made consistent between the theoretical statements and the algorithmic pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comments point by point below and will revise the manuscript to incorporate the suggested clarifications and additional analysis.

Referee: [idealized-settings proof] The proof that the proximal correction tightens the match to the exact posterior (abstract and the idealized-settings analysis) is derived under the assumption of an exact solve of the auxiliary proximal optimization problem. For nonlinear input-output operators this problem is non-convex; any practical solver tolerance or early stopping therefore risks violating the exactness assumption used to derive the acceptance-rate improvement. Please state the required accuracy of the proximal solve explicitly and analyze the effect of approximate solves on the acceptance probability.

Authors: We agree that the idealized proof relies on an exact solution of the proximal optimization problem. In the revised manuscript we will explicitly state the solver accuracy (in terms of relative residual tolerance) required to preserve the theoretical guarantees on acceptance-rate improvement. We will also add a short perturbation analysis showing that, under a bounded error in the proximal solution, the acceptance probability remains strictly higher than that of the uncorrected approximate posterior, with the improvement degrading gracefully as a function of the tolerance. Remarks on the implications for non-convex nonlinear operators will be included in the idealized-settings section.
Revision: yes

Referee: [numerical experiments] The central claim of improved mixing and acceptance rates rests on the proximal correction reducing bias relative to the approximate posterior. The manuscript should clarify whether the reported numerical gains remain when the proximal subproblem is solved only to moderate accuracy (e.g., 10^{-3} relative tolerance), as this directly tests the robustness of the theoretical guarantee for the nonlinear cases shown in the experiments.

Authors: We thank the referee for this important robustness check. In the revised numerical experiments section we will report results for the nonlinear-operator examples in which the proximal subproblem is solved only to a moderate accuracy of 10^{-3} relative tolerance. These additional runs confirm that the gains in acceptance rates and mixing are largely retained, albeit with a modest reduction relative to high-accuracy solves, thereby supporting the practical applicability of the method.
Revision: yes

Circularity Check

0 steps flagged

Derivation self-contained via new proximal correction and idealized proof

The paper defines Proximal-IMH by introducing an auxiliary proximal optimization to correct samples from an approximate posterior, then proves in idealized settings that this correction tightens the match to the exact posterior (improving IMH acceptance). This proof and the supporting numerical experiments on linear/nonlinear operators constitute independent content; no parameter is fitted to data and then relabeled as a prediction, no self-citation chain bears the central claim, and no equation reduces to its own input by construction. The idealized proof's exact-solve assumption is a modeling choice, not a definitional loop.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method depends on the availability of a usable approximate posterior and on the solvability of the proximal subproblem; no new physical entities are postulated.

free parameters (1)

proximal regularization parameter
Controls the trade-off between adherence to the exact posterior and stability around the approximate reference point; its value must be chosen for each problem.
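
The trade-off can be illustrated in the linear case by taking limits of the correction map K quoted from Eq. 4 lower on this page. The block below is a sketch under the added assumption that A^\top A is invertible; \tilde{A} denotes the approximate operator and \beta the proximal regularization parameter.

```latex
% Limits of the linear correction map (A^\top A assumed invertible for the first limit).
\begin{align*}
  K(\beta) &= (A^\top A + \beta I)^{-1} (A^\top \tilde{A} + \beta I), \\
  \lim_{\beta \to 0} K(\beta) &= (A^\top A)^{-1} A^\top \tilde{A}
    && \text{(pure fit to the exact model)}, \\
  \lim_{\beta \to \infty} K(\beta) &= I
    && \text{(no correction: the sample stays at } \tilde{x}\text{)}.
\end{align*}
```

Small values of \beta therefore favor adherence to the exact operator, while large values keep the corrected sample close to the approximate reference point.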

axioms (1)

domain assumption: An approximate posterior that is cheaper to sample from exists and can be used to generate proposals.
Stated in the abstract as the starting point for the correction scheme.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: x = \arg\min \|A(x) - \tilde{A}(\tilde{x})\|^2 + \beta \|x - \tilde{x}\|^2 (Eq. 3b); K = (A^\top A + \beta I)^{-1} (A^\top \tilde{A} + \beta I) (Eq. 4)
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: mixing-time bounds via local Lipschitz constants of log-weights (Thm 3.3, A.3)
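
For readers checking the first quoted passage, the step from the proximal objective (Eq. 3b) to the matrix K (Eq. 4) can be reconstructed for linear operators as follows. This derivation is a sketch written for this page, not copied from the paper; \tilde{A} denotes the approximate operator and \tilde{x} a sample from the approximate posterior.

```latex
% Linear case: A(x) = A x and \tilde{A}(\tilde{x}) = \tilde{A} \tilde{x}.
\begin{align*}
  f(x) &= \|A x - \tilde{A}\tilde{x}\|^2 + \beta \|x - \tilde{x}\|^2, \\
  \tfrac{1}{2}\nabla f(x) &= A^\top (A x - \tilde{A}\tilde{x}) + \beta (x - \tilde{x}) = 0, \\
  (A^\top A + \beta I)\, x &= (A^\top \tilde{A} + \beta I)\, \tilde{x}, \\
  x &= K \tilde{x}, \qquad K = (A^\top A + \beta I)^{-1} (A^\top \tilde{A} + \beta I).
\end{align*}
% For \beta > 0 the matrix A^\top A + \beta I is positive definite,
% so the minimizer is unique.
```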

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.