Operator Splitting with Hamilton-Jacobi-based Proximals

Eric C. Chi; Nicholas Di; Samy Wu Fung

arxiv: 2601.22370 · v3 · pith:SICNB4GSnew · submitted 2026-01-29 · 🧮 math.OC

Operator Splitting with Hamilton-Jacobi-based Proximals

Nicholas Di , Eric C. Chi , Samy Wu Fung This is my paper

Pith reviewed 2026-05-22 11:39 UTC · model grok-4.3

classification 🧮 math.OC

keywords operator splittingproximal operatorsHamilton-Jacobi PDEMonte Carlo approximationfirst-order optimizationconvergence analysis

0 comments

The pith

Replacing exact proximal operators with HJ-Prox approximations preserves convergence in operator splitting algorithms under mild assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a unified framework that replaces exact proximal operators with numerical approximations called HJ-Prox in standard operator splitting methods. HJ-Prox uses Monte Carlo sampling based on Hamilton-Jacobi PDE theory to compute the proximal mapping without derivatives or closed-form expressions. The central result is that this substitution preserves the original convergence guarantees for algorithms including proximal point, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient, provided the approximation is accurate enough and the functions meet basic conditions. This matters because most objective functions in statistical learning lack closed-form proximal operators, so the framework removes a long-standing restriction on where splitting methods can be applied.

Core claim

The paper claims that a unified operator splitting framework based on HJ-Prox approximations can be deployed even when the functions involved are not proximable in closed form. It proves that the convergence guarantees of standard splitting methods carry over to this approximated setting for the proximal point algorithm, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient under mild assumptions.

What carries the argument

HJ-Prox, a derivative-free Monte Carlo technique based on Hamilton-Jacobi PDE theory that numerically approximates proximal operators.

If this is right

Proximal point, proximal gradient, Douglas-Rachford, Davis-Yin, and primal-dual hybrid gradient algorithms all retain their convergence properties when exact proximal steps are replaced by HJ-Prox.
Operator splitting methods become applicable to optimization problems whose component functions lack closed-form proximal operators.
Numerical experiments confirm that HJ-Prox versions remain competitive on statistical learning tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same PDE-based approximation idea could be adapted to other first-order operators used in optimization.
Efficiency gains might come from variance reduction or adaptive sampling in the Monte Carlo step.
The framework could serve as a template for handling nonconvex or stochastic settings with analogous error controls.

Load-bearing premise

The Monte Carlo approximation of the proximal operator must be accurate enough that accumulated error does not break convergence, and the problem functions must satisfy the mild convexity or continuity properties used in the proofs.

What would settle it

A test in which the HJ-Prox sample size is reduced on a convex problem with a known exact solution until the splitting algorithm diverges would falsify the claim that convergence is preserved.

read the original abstract

Operator splitting algorithms are a cornerstone of modern first-order optimization, decomposing complex problems into simpler subproblems solved via proximal operators. However, most functions lack closed-form proximal operators, which has long restricted these methods to a narrow set of problems. Hamilton-Jacobi-based proximal operator (HJ-Prox) is a recent derivative-free Monte Carlo technique based on Hamilton-Jacobi PDE theory, that approximates proximal operators numerically. In this work, we introduce a unified framework for operator splitting via HJ-Prox, which allows for deployment of operator splitting even when functions are not proximable. We prove that replacing exact proximal steps with HJ-Prox in algorithms such as proximal point, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient preserves convergence guarantees under mild assumptions. Numerical experiments demonstrate HJ-Prox is competitive and effective on a wide variety of statistical learning tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a unified framework for replacing exact proximal operators with the Hamilton-Jacobi-based proximal operator (HJ-Prox), a derivative-free Monte Carlo approximation derived from Hamilton-Jacobi PDE theory, within standard operator splitting methods. It claims that this substitution preserves convergence for the proximal point algorithm, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient under mild assumptions on the functions and approximation quality. Numerical experiments on statistical learning tasks are included to illustrate competitiveness.

Significance. If the convergence transfers are established with explicit controls on the Monte Carlo errors, the work would meaningfully expand the applicability of operator splitting to non-proximable functions, a long-standing limitation in first-order optimization. The unified treatment across five algorithms and the derivative-free character of HJ-Prox are strengths; credit is given for linking PDE-based approximation techniques to optimization convergence theory.

major comments (2)

[§4.2, Theorem 4.1] §4.2, Theorem 4.1 (inexact proximal point): the proof requires the HJ-Prox error sequence {ε_k} to satisfy ∑ ε_k < ∞ almost surely (or in expectation) for convergence to carry over from the exact case, yet no sample-size schedule is derived that guarantees this summability from the variance of the Monte Carlo HJ-PDE estimator across iterations.
[§5.1] §5.1 (Davis-Yin and PDHG extensions): the mild assumptions invoked for error tolerance are stated abstractly; it is unclear whether they remain sufficient when the per-iteration Monte Carlo variance depends on the local Lipschitz constants and sample count without an adaptive schedule.

minor comments (2)

[Algorithm boxes and §3] The distinction between the exact proximal mapping and its HJ-Prox approximation could be denoted more consistently (e.g., via a subscript or hat) to avoid occasional ambiguity in the algorithm statements.
[Figures 3-4] Figure 3 and 4: axis labels and legends do not indicate the number of Monte Carlo samples used per proximal step, which is relevant for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important practical aspects of implementing the error controls for the Monte Carlo approximations. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (inexact proximal point): the proof requires the HJ-Prox error sequence {ε_k} to satisfy ∑ ε_k < ∞ almost surely (or in expectation) for convergence to carry over from the exact case, yet no sample-size schedule is derived that guarantees this summability from the variance of the Monte Carlo HJ-PDE estimator across iterations.

Authors: We agree that the summability condition on {ε_k} is required for the convergence result in Theorem 4.1 to hold, consistent with standard analyses of inexact proximal point methods. While the theorem is stated under this general assumption on the approximation quality, we acknowledge that an explicit sample-size schedule linking the Monte Carlo variance to the required decay rate is not derived in the current version. In the revision, we will add a remark following Theorem 4.1 that provides a sufficient schedule: under the Lipschitz assumptions on the functions, the HJ-PDE Monte Carlo estimator has variance scaling as O(1/N_k), so choosing N_k ≥ C k^{2+δ} for δ > 0 and suitable constant C ensures E[ε_k] ≤ O(1/k^{1+δ/2}), which is summable. This addition will clarify practical implementation without changing the statement or proof of the theorem. revision: yes
Referee: [§5.1] §5.1 (Davis-Yin and PDHG extensions): the mild assumptions invoked for error tolerance are stated abstractly; it is unclear whether they remain sufficient when the per-iteration Monte Carlo variance depends on the local Lipschitz constants and sample count without an adaptive schedule.

Authors: The mild assumptions in §5.1 are formulated abstractly in terms of the per-iteration approximation errors satisfying summability (or boundedness) conditions in expectation or almost surely; these are sufficient for the convergence transfers in Davis-Yin and PDHG regardless of how the errors arise. To address the dependence on local Lipschitz constants, we will revise the discussion in §5.1 to explicitly connect the error tolerance to the Monte Carlo variance, noting that the variance of the HJ-Prox estimator scales with the local Lipschitz constant L and that sample sizes can be chosen proportionally to L^2 to meet the error bound. We will also add a note that a sufficiently large fixed sample size suffices under the global bounded-domain and uniform Lipschitz assumptions used elsewhere in the paper, while an adaptive schedule can be employed if local constants vary significantly. This makes the assumptions more concrete without restricting their generality. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence claims rest on external inexact proximal theory

full rationale

The paper's derivation applies standard results on inexact operator splitting (proximal point, DR, Davis-Yin, PDHG) to the case where each proximal step is replaced by an HJ-Prox Monte Carlo approximation. The proofs invoke mild assumptions on approximation error without defining the target convergence rate in terms of the approximation itself or fitting parameters to the output. HJ-Prox is treated as an external technique based on Hamilton-Jacobi PDE theory; no load-bearing self-citation chain reduces the central claim to a tautology, and the framework remains independent of the specific fitted values or ansatzes inside the present manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework depends on the accuracy of the HJ-Prox approximation and standard domain assumptions from convex optimization; no free parameters or new entities are described in the abstract.

axioms (1)

domain assumption Problem functions satisfy mild assumptions such as convexity or continuity properties needed for convergence.
Abstract states that convergence guarantees hold under these mild assumptions when HJ-Prox replaces exact proximal steps.

pith-pipeline@v0.9.0 · 5679 in / 1074 out tokens · 53013 ms · 2026-05-22T11:39:18.908587+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 3.3 (Monte Carlo Bound on HJ-Prox) and Assumption 3.4 (summability of δ_k, α_k, N_k)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat_induction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Convergence via perturbed Krasnosel'skii-Mann (Theorem 3.1 citing Combettes 2001)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fixed-Point Neural Optimal Transport without Implicit Differentiation
math.OC 2026-05 unverdicted novelty 7.0

A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.