Operator Splitting with Hamilton-Jacobi-based Proximals
Pith reviewed 2026-05-22 11:39 UTC · model grok-4.3
The pith
Replacing exact proximal operators with HJ-Prox approximations preserves convergence in operator splitting algorithms under mild assumptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a unified operator splitting framework based on HJ-Prox approximations can be deployed even when the functions involved are not proximable in closed form. It proves that the convergence guarantees of standard splitting methods carry over to this approximated setting for the proximal point algorithm, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient under mild assumptions.
What carries the argument
HJ-Prox, a derivative-free Monte Carlo technique based on Hamilton-Jacobi PDE theory that numerically approximates proximal operators.
If this is right
- Proximal point, proximal gradient, Douglas-Rachford, Davis-Yin, and primal-dual hybrid gradient algorithms all retain their convergence properties when exact proximal steps are replaced by HJ-Prox.
- Operator splitting methods become applicable to optimization problems whose component functions lack closed-form proximal operators.
- Numerical experiments confirm that HJ-Prox versions remain competitive on statistical learning tasks.
Where Pith is reading between the lines
- The same PDE-based approximation idea could be adapted to other first-order operators used in optimization.
- Efficiency gains might come from variance reduction or adaptive sampling in the Monte Carlo step.
- The framework could serve as a template for handling nonconvex or stochastic settings with analogous error controls.
Load-bearing premise
The Monte Carlo approximation of the proximal operator must be accurate enough that accumulated error does not break convergence, and the problem functions must satisfy the mild convexity or continuity properties used in the proofs.
What would settle it
A test in which the HJ-Prox sample size is reduced on a convex problem with a known exact solution until the splitting algorithm diverges would falsify the claim that convergence is preserved.
read the original abstract
Operator splitting algorithms are a cornerstone of modern first-order optimization, decomposing complex problems into simpler subproblems solved via proximal operators. However, most functions lack closed-form proximal operators, which has long restricted these methods to a narrow set of problems. Hamilton-Jacobi-based proximal operator (HJ-Prox) is a recent derivative-free Monte Carlo technique based on Hamilton-Jacobi PDE theory, that approximates proximal operators numerically. In this work, we introduce a unified framework for operator splitting via HJ-Prox, which allows for deployment of operator splitting even when functions are not proximable. We prove that replacing exact proximal steps with HJ-Prox in algorithms such as proximal point, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient preserves convergence guarantees under mild assumptions. Numerical experiments demonstrate HJ-Prox is competitive and effective on a wide variety of statistical learning tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a unified framework for replacing exact proximal operators with the Hamilton-Jacobi-based proximal operator (HJ-Prox), a derivative-free Monte Carlo approximation derived from Hamilton-Jacobi PDE theory, within standard operator splitting methods. It claims that this substitution preserves convergence for the proximal point algorithm, proximal gradient descent, Douglas-Rachford splitting, Davis-Yin splitting, and primal-dual hybrid gradient under mild assumptions on the functions and approximation quality. Numerical experiments on statistical learning tasks are included to illustrate competitiveness.
Significance. If the convergence transfers are established with explicit controls on the Monte Carlo errors, the work would meaningfully expand the applicability of operator splitting to non-proximable functions, a long-standing limitation in first-order optimization. The unified treatment across five algorithms and the derivative-free character of HJ-Prox are strengths; credit is given for linking PDE-based approximation techniques to optimization convergence theory.
major comments (2)
- [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (inexact proximal point): the proof requires the HJ-Prox error sequence {ε_k} to satisfy ∑ ε_k < ∞ almost surely (or in expectation) for convergence to carry over from the exact case, yet no sample-size schedule is derived that guarantees this summability from the variance of the Monte Carlo HJ-PDE estimator across iterations.
- [§5.1] §5.1 (Davis-Yin and PDHG extensions): the mild assumptions invoked for error tolerance are stated abstractly; it is unclear whether they remain sufficient when the per-iteration Monte Carlo variance depends on the local Lipschitz constants and sample count without an adaptive schedule.
minor comments (2)
- [Algorithm boxes and §3] The distinction between the exact proximal mapping and its HJ-Prox approximation could be denoted more consistently (e.g., via a subscript or hat) to avoid occasional ambiguity in the algorithm statements.
- [Figures 3-4] Figure 3 and 4: axis labels and legends do not indicate the number of Monte Carlo samples used per proximal step, which is relevant for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important practical aspects of implementing the error controls for the Monte Carlo approximations. We address each major comment below and indicate planned revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [§4.2, Theorem 4.1] §4.2, Theorem 4.1 (inexact proximal point): the proof requires the HJ-Prox error sequence {ε_k} to satisfy ∑ ε_k < ∞ almost surely (or in expectation) for convergence to carry over from the exact case, yet no sample-size schedule is derived that guarantees this summability from the variance of the Monte Carlo HJ-PDE estimator across iterations.
Authors: We agree that the summability condition on {ε_k} is required for the convergence result in Theorem 4.1 to hold, consistent with standard analyses of inexact proximal point methods. While the theorem is stated under this general assumption on the approximation quality, we acknowledge that an explicit sample-size schedule linking the Monte Carlo variance to the required decay rate is not derived in the current version. In the revision, we will add a remark following Theorem 4.1 that provides a sufficient schedule: under the Lipschitz assumptions on the functions, the HJ-PDE Monte Carlo estimator has variance scaling as O(1/N_k), so choosing N_k ≥ C k^{2+δ} for δ > 0 and suitable constant C ensures E[ε_k] ≤ O(1/k^{1+δ/2}), which is summable. This addition will clarify practical implementation without changing the statement or proof of the theorem. revision: yes
-
Referee: [§5.1] §5.1 (Davis-Yin and PDHG extensions): the mild assumptions invoked for error tolerance are stated abstractly; it is unclear whether they remain sufficient when the per-iteration Monte Carlo variance depends on the local Lipschitz constants and sample count without an adaptive schedule.
Authors: The mild assumptions in §5.1 are formulated abstractly in terms of the per-iteration approximation errors satisfying summability (or boundedness) conditions in expectation or almost surely; these are sufficient for the convergence transfers in Davis-Yin and PDHG regardless of how the errors arise. To address the dependence on local Lipschitz constants, we will revise the discussion in §5.1 to explicitly connect the error tolerance to the Monte Carlo variance, noting that the variance of the HJ-Prox estimator scales with the local Lipschitz constant L and that sample sizes can be chosen proportionally to L^2 to meet the error bound. We will also add a note that a sufficiently large fixed sample size suffices under the global bounded-domain and uniform Lipschitz assumptions used elsewhere in the paper, while an adaptive schedule can be employed if local constants vary significantly. This makes the assumptions more concrete without restricting their generality. revision: yes
Circularity Check
No circularity: convergence claims rest on external inexact proximal theory
full rationale
The paper's derivation applies standard results on inexact operator splitting (proximal point, DR, Davis-Yin, PDHG) to the case where each proximal step is replaced by an HJ-Prox Monte Carlo approximation. The proofs invoke mild assumptions on approximation error without defining the target convergence rate in terms of the approximation itself or fitting parameters to the output. HJ-Prox is treated as an external technique based on Hamilton-Jacobi PDE theory; no load-bearing self-citation chain reduces the central claim to a tautology, and the framework remains independent of the specific fitted values or ansatzes inside the present manuscript.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Problem functions satisfy mild assumptions such as convexity or continuity properties needed for convergence.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 3.3 (Monte Carlo Bound on HJ-Prox) and Assumption 3.4 (summability of δ_k, α_k, N_k)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat_induction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Convergence via perturbed Krasnosel'skii-Mann (Theorem 3.1 citing Combettes 2001)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Fixed-Point Neural Optimal Transport without Implicit Differentiation
A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.