On the Convergence of Jacobian-Free Backpropagation for Optimal Control Problems with Implicit Hamiltonians

Deepanshu Verma; Eric Gelphman; Nicole Tianjiao Yang; Samy Wu Fung; Stanley Osher

arXiv: 2602.00921 · v2 · submitted 2026-01-31 · math.OC · cs.LG · cs.NA · math.NA


Pith reviewed 2026-05-16 08:33 UTC · model grok-4.3


keywords: Jacobian-free backpropagation · optimal control · implicit Hamiltonians · stochastic convergence · value function approximation · minibatch training


The pith

Jacobian-free backpropagation converges to stationary points of the expected objective in stochastic minibatch settings for optimal control with implicit Hamiltonians.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper builds on an earlier implicit deep learning method that used Jacobian-Free Backpropagation to handle optimal control problems where no closed-form control law exists because the Hamiltonian is implicit. It proves that the JFB updates, when performed with stochastic minibatches, converge to stationary points of the expected optimal control objective. The result supplies the missing theoretical justification for scaling the approach to substantially higher-dimensional problems, such as multi-agent consumption models and swarm control of quadrotors and bicycles.

Core claim

We establish convergence guarantees for Jacobian-Free Backpropagation in the stochastic minibatch setting, showing that the resulting updates converge to stationary points of the expected optimal control objective.

What carries the argument

Jacobian-Free Backpropagation (JFB), which computes parameter updates for implicit value-function models without explicitly forming or inverting the Jacobian of the implicit Hamiltonian.
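To make the mechanism concrete, here is a minimal sketch on a toy scalar fixed-point problem. It is not the paper's value-function model or its Hamiltonian. The map `T`, the quadratic loss, and all function names are hypothetical and chosen only for illustration. The sketch compares the implicit-differentiation gradient, which divides by `1 - dT/du`, with the JFB direction, which drops that factor.

```python
import numpy as np

def T(u, theta, x):
    # Hypothetical contractive map in u: |dT/du| <= 0.5 * |theta| < 1 for |theta| < 2.
    return 0.5 * np.tanh(theta * u + x)

def solve_fixed_point(theta, x, tol=1e-12, max_iter=1000):
    # Plain fixed-point iteration u <- T(u); no derivatives are tracked here.
    u = 0.0
    for _ in range(max_iter):
        u_next = T(u, theta, x)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    return u

def loss(theta, x, target):
    # Toy objective (an assumption): squared distance of the fixed point to a target.
    u = solve_fixed_point(theta, x)
    return 0.5 * (u - target) ** 2

def gradients(theta, x, target):
    u = solve_fixed_point(theta, x)
    s = 1.0 - np.tanh(theta * u + x) ** 2   # derivative of tanh at the fixed point
    dT_du = 0.5 * theta * s                 # Jacobian of T with respect to u
    dT_dtheta = 0.5 * u * s                 # Jacobian of T with respect to theta
    dl_du = u - target                      # derivative of the loss with respect to u
    # Implicit differentiation: includes the inverse of (1 - dT/du).
    g_true = dl_du * dT_dtheta / (1.0 - dT_du)
    # JFB: replaces that inverse with the identity (here, with 1).
    g_jfb = dl_du * dT_dtheta
    return g_true, g_jfb

g_true, g_jfb = gradients(theta=1.0, x=0.3, target=1.0)
```

In this scalar case the two quantities differ in magnitude but share a sign, so the JFB direction is still a descent direction for the toy loss. Whether the analogous property holds in the paper's setting depends on its stated assumptions.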

If this is right

Stochastic minibatch training of implicit value functions is now theoretically justified for optimal control.
The same convergence result covers the high-dimensional multi-agent and swarm-control examples demonstrated in the paper.
Sample-wise descent guarantees from prior work are strengthened to expected-objective stationary-point convergence.
The method can be deployed on larger state and action spaces without losing the stationary-point guarantee under the stated regularity conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stochastic-approximation argument may apply to other implicit optimization layers that arise outside classical control, such as implicit neural networks for equilibrium problems.
In practice one would need diagnostic checks for the smoothness assumptions, since their violation would remove the convergence guarantee.
The scalability results suggest that JFB could be paired with existing model-free reinforcement-learning pipelines that already use minibatches.

Load-bearing premise

The value function and implicit Hamiltonian must be sufficiently smooth with bounded gradients so that standard stochastic approximation arguments apply.

What would settle it

An explicit counterexample or numerical run in which the JFB updates fail to approach stationary points while the smoothness and bounded-gradient conditions still hold.

Original abstract

Optimal feedback control with implicit Hamiltonians poses a fundamental challenge for learning-based value function methods due to the absence of closed-form optimal control laws. Recent work~\cite{gelphman2025end} introduced an implicit deep learning approach using Jacobian-Free Backpropagation (JFB) to address this setting, but only established sample-wise descent guarantees. In this paper, we establish convergence guarantees for JFB in the stochastic minibatch setting, showing that the resulting updates converge to stationary points of the expected optimal control objective. We further demonstrate scalability on substantially higher-dimensional problems, including multi-agent optimal consumption and swarm-based quadrotor and bicycle control. Together, our results provide both theoretical justification and empirical evidence for using JFB in high-dimensional optimal control with implicit Hamiltonians.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds a stochastic minibatch convergence guarantee for JFB on implicit-Hamiltonian optimal control, but the bias from the Jacobian approximation needs an explicit vanishing rate to fully support the Robbins-Monro claim.


The main new result is the extension from sample-wise descent to stochastic minibatch convergence for Jacobian-Free Backpropagation on optimal control problems whose Hamiltonians are only defined implicitly. The authors apply standard stochastic approximation arguments to show that the updates reach stationary points of the expected objective, and they back this with scaling experiments on multi-agent consumption and swarm quadrotor/bicycle control in higher dimensions than before. That combination of theory plus practical reach is the useful part of the work. The assumptions on smoothness and bounded gradients are stated clearly enough for the framework to apply in principle.

The soft spot is exactly the one the stress test flags: because the Hamiltonian is implicit, the JFB step replaces the exact Jacobian with an approximation whose bias is not shown to decay faster than the step-size schedule. If that bias stays order-one or decays only at the same rate as the noise term, the limit points need not be stationary for the original problem. The manuscript invokes the usual regularity conditions but does not appear to supply an explicit error bound that closes this gap.

Readers working on learning-based optimal control or implicit deep learning methods will find the result relevant and worth citing if they need a convergence reference for minibatch JFB. The paper is coherent on its own terms and engages the literature honestly, so it deserves a serious referee rather than a desk reject. The bias analysis can be tightened in revision without changing the overall contribution.
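The concern about a persistent bias can be illustrated numerically. The sketch below is a toy experiment on f(θ) = 0.5·θ², not the paper's problem. The function `biased_sgd`, the bias schedules, the noise level, and the seed are all assumptions made for illustration. It runs stochastic gradient steps with α_k = 1/(k+1), once with a constant bias added to the gradient and once with a bias that decays to zero.

```python
import numpy as np

def biased_sgd(bias_fn, num_steps=2000, noise_std=0.1, seed=0):
    """Run SGD on f(theta) = 0.5 * theta**2 with a biased, noisy gradient."""
    rng = np.random.default_rng(seed)
    theta = 1.0
    for k in range(num_steps):
        alpha = 1.0 / (k + 1)   # Robbins-Monro step size
        grad = theta            # exact gradient of the toy objective
        # Update direction = exact gradient + deterministic bias + zero-mean noise.
        direction = grad + bias_fn(k) + noise_std * rng.standard_normal()
        theta -= alpha * direction
    return theta

# Constant bias: the iterate settles near -0.5, where the true gradient is not zero.
theta_const = biased_sgd(lambda k: 0.5)

# Decaying bias: the iterate approaches the true stationary point theta = 0.
theta_decay = biased_sgd(lambda k: 0.5 / (k + 1))
```

With the constant bias the final iterate sits near the zero of the biased direction rather than the zero of the true gradient, which is the failure mode the letter describes. The toy example says nothing about how the JFB bias actually behaves in the manuscript.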

Referee Report

2 major / 2 minor

Summary. The manuscript claims to establish convergence guarantees for Jacobian-Free Backpropagation (JFB) applied to optimal control problems with implicit Hamiltonians in the stochastic minibatch setting. It asserts that the resulting updates converge to stationary points of the expected optimal control objective, extending prior sample-wise descent results, and provides empirical evidence of scalability on high-dimensional problems including multi-agent optimal consumption and swarm-based quadrotor and bicycle control.

Significance. If the convergence result is rigorously established, the work would supply needed theoretical justification for JFB in learning-based optimal control with implicit Hamiltonians, where closed-form policies are unavailable. The extension from sample-wise to stochastic minibatch convergence, combined with demonstrations on substantially higher-dimensional instances, would strengthen the case for the method's practical utility in multi-agent and swarm control settings.

major comments (2)

[Convergence Analysis] Convergence theorem (likely §4 or the main result): the proof applies standard Robbins-Monro stochastic approximation but does not derive an explicit bound showing that the bias term arising from the JFB approximation of the implicit-Hamiltonian Jacobian vanishes faster than the step-size schedule. Without this, the zero-mean noise condition required for convergence to stationary points of the original objective may fail to hold.
[Assumptions] Assumptions section (preceding the main theorem): the stated smoothness and bounded-gradient conditions on the value function and implicit Hamiltonian are not accompanied by a quantitative error analysis for the finite-difference or fixed-point JFB estimator; an explicit rate on the approximation bias is needed to close the argument.
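Both major comments can be read against a standard decomposition of the update direction. The notation below is hypothetical and is not taken from the manuscript: F denotes the expected objective, g_k the minibatch JFB direction at step k, b_k its bias, and ξ_k the remaining zero-mean noise.

```latex
% Hypothetical notation, introduced only to restate the referee's concern.
\begin{align*}
\theta_{k+1} &= \theta_k - \alpha_k g_k, \qquad
g_k = \nabla F(\theta_k) + b_k + \xi_k, \\
b_k &= \mathbb{E}\left[ g_k \mid \theta_k \right] - \nabla F(\theta_k), \qquad
\mathbb{E}\left[ \xi_k \mid \theta_k \right] = 0, \\
\sum_{k} \alpha_k &= \infty, \qquad \sum_{k} \alpha_k^2 < \infty .
\end{align*}
```

By construction the zero-mean property holds for ξ_k only. The comments ask for an explicit bound on b_k, since the stated smoothness and bounded-gradient conditions alone do not control it.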

minor comments (2)

[References] The citation to gelphman2025end should be verified for exact title and arXiv number consistency with the bibliography.
[Experiments] Figure captions for the quadrotor and bicycle experiments would benefit from explicit mention of the implicit Hamiltonian formulation and the minibatch size used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important gaps in the rigor of the convergence argument, and we have prepared revisions to address them directly by supplying the missing quantitative bounds on the JFB approximation bias.


Referee: [Convergence Analysis] Convergence theorem (likely §4 or the main result): the proof applies standard Robbins-Monro stochastic approximation but does not derive an explicit bound showing that the bias term arising from the JFB approximation of the implicit-Hamiltonian Jacobian vanishes faster than the step-size schedule. Without this, the zero-mean noise condition required for convergence to stationary points of the original objective may fail to hold.

Authors: We agree that the original proof sketch did not explicitly verify the required bias rate. In the revised manuscript we add a new lemma (Lemma 4.2) that bounds the JFB gradient bias by O(ε_k), where ε_k is the fixed-point tolerance at step k. Under the stated Lipschitz and smoothness assumptions on the implicit Hamiltonian, this bias is o(α_k) provided the tolerance schedule is chosen so that ε_k = o(α_k), with the step sizes satisfying the standard Robbins-Monro conditions ∑α_k=∞ and ∑α_k²<∞. The updated proof then invokes the standard stochastic-approximation convergence theorem, with the effective noise asymptotically zero-mean with respect to the true expected gradient, thereby establishing convergence to stationary points of the expected objective. revision: yes
Referee: [Assumptions] Assumptions section (preceding the main theorem): the stated smoothness and bounded-gradient conditions on the value function and implicit Hamiltonian are not accompanied by a quantitative error analysis for the finite-difference or fixed-point JFB estimator; an explicit rate on the approximation bias is needed to close the argument.

Authors: We concur that an explicit rate is required. We will augment the assumptions with a new quantitative statement (Assumption 4.3) that the JFB estimator satisfies ||∇_JFB - ∇_true|| ≤ C·tol, where tol is the solver tolerance and C depends only on the Lipschitz constants already present in the assumptions. A short derivation using the implicit-function theorem and the contraction mapping property of the Hamiltonian fixed-point iteration is added to the appendix and referenced from the main text. This closes the argument without altering the original assumption set. revision: yes
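The rate conditions named in these responses can be checked on concrete schedules. The sketch below assumes α_k = a/(k+1) and a solver tolerance ε_k = c/(k+1)^p with p = 2. These schedules, the function `schedules`, and its parameters are illustrative choices, not values from the manuscript. It computes partial sums for the Robbins-Monro conditions and the ratio ε_k/α_k, which must tend to zero for an O(ε_k) bias to be o(α_k).

```python
def schedules(num_steps, a=1.0, c=1.0, p=2.0):
    """Return step sizes alpha_k = a/(k+1) and tolerances eps_k = c/(k+1)**p."""
    alphas = [a / (k + 1) for k in range(num_steps)]
    eps = [c / (k + 1) ** p for k in range(num_steps)]
    return alphas, eps

alphas, eps = schedules(10_000)

# Ratio eps_k / alpha_k: equals 1/(k+1) for these choices, so it decays to zero.
ratios = [e / a for e, a in zip(eps, alphas)]

# Partial sums: the first grows without bound (harmonic series),
# the second stays below pi**2 / 6.
sum_alpha = sum(alphas)
sum_alpha_sq = sum(x * x for x in alphas)
```

A fixed tolerance, by contrast, gives a ratio ε_k/α_k that grows with k under the same step sizes, so a bias proportional to the tolerance would not become negligible relative to the step.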

Circularity Check

0 steps flagged

No circularity: convergence follows from external stochastic approximation theorems


The paper applies standard Robbins-Monro stochastic approximation arguments to the JFB update rule under stated smoothness and bounded-gradient assumptions on the value function and implicit Hamiltonian. The central claim that minibatch updates converge to stationary points of the expected objective is derived from these external results rather than being equivalent to any quantity defined or fitted inside the paper. The citation to prior work introduces the JFB method but does not carry the load of the convergence proof, which remains self-contained against the cited stochastic approximation framework.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard regularity assumptions from stochastic optimization and optimal control theory; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: The value function and implicit Hamiltonian satisfy sufficient smoothness and boundedness conditions for stochastic approximation to apply.
Required to obtain convergence to stationary points of the expected objective.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Fixed-Point Neural Optimal Transport without Implicit Differentiation
math.OC · 2026-05 · unverdicted · novelty 7.0

A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.

Asymptotic-preserving deterministic particle methods for collisional plasma models
math.NA · 2026-04 · unverdicted · novelty 5.0

Develops AP particle schemes for Landau-Fokker-Planck and Dougherty operators using implicit JKO flows, inner-time quadrature, and neural network implementations that preserve structure in stiff regimes.