pith. machine review for the scientific record.

arxiv: 2604.08580 · v1 · submitted 2026-03-28 · 🧮 math.OC · cs.LG

Recognition: no theorem link

Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 22:21 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG
keywords stochastic optimal control · stochastic maximum principle · adjoint matching · Hamilton-Jacobi-Bellman equation · diffusion models · optimal control theory

The pith

A general Hamiltonian adjoint matching objective shares the first variation of the stochastic optimal control objective, so its critical points satisfy Hamilton-Jacobi-Bellman stationarity conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives adjoint matching from the stochastic maximum principle for stochastic optimal control problems with control-dependent dynamics. It formulates a Hamiltonian adjoint matching objective that applies to control-dependent drift and diffusion when running costs are convex. The expected value of this objective is proven to possess the same first variation as the original stochastic control objective. Critical points of the matching objective therefore satisfy the stationarity conditions of the Hamilton-Jacobi-Bellman equation. In the special case of diffusion independent of state and control, the objective reduces to a lean loss whose critical points coincide with the true optimum under mild uniqueness assumptions on the control.
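In standard SOC notation (generic symbols, which may differ from the paper's), the setup this summary describes is:

```latex
% Controlled dynamics, cost functional, and Hamiltonian
% (notation assumed, not copied from the paper):
%   dX_t = b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t
\begin{aligned}
J(u) &= \mathbb{E}\!\left[\int_0^T f(X_t, u_t)\,dt + g(X_T)\right],\\
\mathcal{H}(x,u,p,q) &= \langle p,\, b(x,u)\rangle
  + \operatorname{tr}\!\left(q^{\top}\sigma(x,u)\right) + f(x,u).
\end{aligned}
```

The claimed result is that the expected Hamiltonian matching objective built from this data has the same first variation as J(u) for every admissible perturbation of the control, so each of its critical points is a stationary point of J and hence satisfies HJB stationarity.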

Core claim

We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton-Jacobi-Bellman (HJB) stationarity conditions. In the state- and control-independent diffusion case, we recover the lean adjoint matching loss whose critical points coincide with the optimal control under mild uniqueness assumptions. Adjoint matching is precisely a continuous-time method of successive approximations induced by the SMP.

What carries the argument

The Hamiltonian adjoint matching objective, constructed so its expectation shares the first variation of the stochastic optimal control cost functional and thereby inherits the stationarity conditions from the stochastic maximum principle.
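Concretely, the first-order adjoint pair (p_t, q_t) of the SMP solves a backward SDE. Writing the Hamiltonian as H(x, u, p, q) = ⟨p, b(x, u)⟩ + tr(qᵀσ(x, u)) + f(x, u), its standard form (signs and conventions assumed here, not taken from the paper) is:

```latex
% First-order adjoint BSDE of the stochastic maximum principle:
\begin{aligned}
dp_t &= -\nabla_x \mathcal{H}(X_t, u_t, p_t, q_t)\,dt + q_t\,dW_t,\\
p_T &= \nabla g(X_T),
\end{aligned}
\qquad
\nabla_u \mathcal{H}(X_t, u_t^{*}, p_t, q_t) = 0
\ \text{along optimal trajectories.}
```

Matching the control against the Hamiltonian stationarity condition, rather than solving the BSDE exactly, is what makes the objective implementable by simulation.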

If this is right

  • Critical points of the adjoint matching objective satisfy the HJB stationarity conditions.
  • In the case of state- and control-independent diffusion, critical points of the lean loss coincide with the optimal control under mild uniqueness assumptions on the control.
  • Adjoint matching serves as a practical continuous-time successive approximation method derived from the SMP.
  • The approach supplies an implementable pathway for SMP-based iterations in stochastic problems by avoiding intractable martingale terms.
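As a concrete illustration of the successive-approximation reading, here is a minimal toy loop in the lean setting: a 1-D SDE whose drift is the control itself and whose diffusion is constant, so the lean adjoint is simply ∇g(X_T) along each path, and each iteration re-fits the control by minimizing the Hamiltonian in u. All dynamics, costs, and constants here are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy lean adjoint matching / successive approximations:
#   dX = u(t) dt + sigma dW,  cost = E[ int (lam/2) u^2 dt + X_T^2 / 2 ].
rng = np.random.default_rng(42)
x0, sigma, lam, T, N, M = 1.0, 0.5, 2.0, 1.0, 50, 20000
dt = T / N
u = np.zeros(N)                       # piecewise-constant open-loop control

def soc_cost(u):
    """Monte Carlo estimate of the SOC objective under control u."""
    X = np.full(M, x0)
    run = 0.0
    for n in range(N):
        run += 0.5 * lam * u[n] ** 2 * dt
        X = X + u[n] * dt + sigma * np.sqrt(dt) * rng.normal(size=M)
    return run + 0.5 * np.mean(X ** 2)

J_init = soc_cost(u)
for it in range(15):
    # Forward simulation under the current control (no gradients through it).
    X = np.full(M, x0)
    for n in range(N):
        X = X + u[n] * dt + sigma * np.sqrt(dt) * rng.normal(size=M)
    a_lean = X                        # lean adjoint: grad g(X_T) = X_T, constant in t
    # Hamiltonian minimization in u: lam*u + a = 0  =>  u = -E[a]/lam.
    u = np.full(N, -np.mean(a_lean) / lam)
J_final = soc_cost(u)
print(u[0], J_init, J_final)          # u contracts toward -x0/(lam + T) = -1/3
```

The update is a contraction here (factor T/λ = 1/2), so the control converges geometrically to the analytic optimum -x0/(λ + T) up to Monte Carlo noise, and the estimated cost drops accordingly.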

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The equivalence may be tested numerically on simple linear-quadratic control problems to confirm gradient agreement up to sampling variance.
  • The method supplies a direct route for applying stochastic control techniques to reward fine-tuning of diffusion and flow models.
  • Further analysis could examine whether the first-variation property persists under mild relaxations of convexity via added regularization terms.

Load-bearing premise

The running costs of the stochastic optimal control problem are convex.

What would settle it

On a low-dimensional example with control-dependent diffusion and convex running costs, compute the gradient of the proposed adjoint matching objective and the gradient of the original SOC objective; any systematic mismatch between these gradients falsifies the claimed first-variation equivalence.
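A deterministic warm-up for that comparison (σ = 0, where the SMP reduces to Pontryagin's principle): discretize a 1-D linear-quadratic problem, form the adjoint-based gradient λu_n + p_{n+1} per grid cell, and check it against central finite differences of the same discretized cost. The problem data and discretization are illustrative only; the full test the paper invites would add control-dependent diffusion.

```python
import numpy as np

# 1-D LQ sanity check: dynamics x' = a*x + u, cost int (lam/2) u^2 dt + x_T^2/2.
a, lam, T, N, x0 = -0.5, 1.0, 1.0, 50, 1.0
dt = T / N
rng = np.random.default_rng(0)
u = rng.normal(size=N)                # arbitrary open-loop control on the grid

def cost(u):
    x, J = x0, 0.0
    for n in range(N):
        J += 0.5 * lam * u[n] ** 2 * dt
        x = x + (a * x + u[n]) * dt   # explicit Euler step
    return J + 0.5 * x ** 2           # terminal cost g(x_T) = x_T^2 / 2

# Forward pass for the terminal state, then the discrete adjoint recursion
# p_n = (1 + a*dt) p_{n+1}, p_N = grad g(x_N) -- the Euler analogue of
# dp = -a p dt, p_T = x_T.
x = x0
for n in range(N):
    x = x + (a * x + u[n]) * dt
p = np.empty(N + 1)
p[N] = x                              # gradient of x^2 / 2
for n in range(N - 1, -1, -1):
    p[n] = (1 + a * dt) * p[n + 1]
grad_smp = (lam * u + p[1:]) * dt     # nabla_u Hamiltonian per cell, times dt

# Central finite differences on the same discretized objective.
eps = 1e-6
grad_fd = np.empty(N)
for n in range(N):
    up, um = u.copy(), u.copy()
    up[n] += eps
    um[n] -= eps
    grad_fd[n] = (cost(up) - cost(um)) / (2 * eps)
print(np.max(np.abs(grad_smp - grad_fd)))
```

Because the discretized cost is quadratic in each u_n, the central difference is exact up to roundoff, so any visible mismatch would point at the adjoint recursion rather than at finite-difference error.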

read the original abstract

Reward fine-tuning of diffusion and flow models and sampling from tilted or Boltzmann distributions can both be formulated as stochastic optimal control (SOC) problems, where learning an optimal generative dynamics corresponds to optimizing a control under SDE constraints. In this work, we revisit and generalize Adjoint Matching, a recently proposed SOC-based method for learning optimal controls, and place it on a rigorous footing by deriving it from the Stochastic Maximum Principle (SMP). We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton-Jacobi-Bellman (HJB) stationarity conditions. In the important practical case of state- and control-independent diffusion, we recover the lean adjoint matching loss previously introduced in adjoint matching, which avoids second-order terms and whose critical points coincide with the optimal control under mild uniqueness assumptions. Finally, we show that adjoint matching can be precisely interpreted as a continuous-time method of successive approximations induced by the SMP, yielding a practical and implementable alternative to classical SMP-based algorithms, which are obstructed by intractable martingale terms in the stochastic setting. These results are also of independent interest to the stochastic control community, providing new implementable objectives and a viable pathway for SMP-based iterations in stochastic problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript revisits and generalizes Adjoint Matching for stochastic optimal control (SOC) problems with control-dependent drift and diffusion and convex running costs. It formulates a Hamiltonian adjoint matching objective whose expected value is shown to possess the same first variation as the original SOC objective, implying that critical points satisfy the Hamilton--Jacobi--Bellman stationarity conditions. In the special case of state- and control-independent diffusion the construction recovers the lean adjoint-matching loss without second-order terms, whose critical points coincide with the optimum under mild uniqueness assumptions on the control. The work further interprets adjoint matching as a continuous-time successive-approximation scheme induced by the Stochastic Maximum Principle (SMP), providing an implementable alternative to classical SMP iterations that are obstructed by intractable martingale terms.

Significance. If the first-variation identity holds under the stated regularity, the paper supplies a rigorous foundation for adjoint-matching methods already used in diffusion-model fine-tuning and Boltzmann sampling. It also yields a practical, martingale-free route to SMP-based optimization in stochastic settings, which is of independent interest to the stochastic control community. The explicit recovery of the lean loss and the interpretation as successive approximations are concrete strengths.

major comments (2)
  1. [§3, Theorem 3.2] First-variation identity: the proof sketch invokes the SMP adjoint BSDE and the standard first-variation formula for controlled SDEs, but the manuscript does not explicitly state the integrability or domination conditions that justify interchanging the derivative with respect to the control parameter and the expectation. A short paragraph clarifying these conditions (or citing a standard reference) would remove any residual doubt about the scope of the equivalence.
  2. [§4.2, Proposition 4.3] Uniqueness for independent diffusion: the claim that critical points coincide with the optimum relies on a mild uniqueness assumption on the optimal control. The manuscript should state this assumption explicitly (e.g., Lipschitz continuity of the Hamiltonian or strict convexity of the running cost) rather than leaving it implicit, so that readers can verify whether it is satisfied in typical diffusion-model applications.
minor comments (2)
  1. Notation: the symbol for the adjoint process is overloaded between the forward and backward equations; a brief remark distinguishing the two uses would improve readability.
  2. Figure 1: the diagram of the successive-approximation iteration would benefit from an explicit arrow indicating the update of the control law at each step.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and the constructive suggestions for minor revision. We address each major comment below and will incorporate the requested clarifications into the revised manuscript.

read point-by-point responses
  1. Referee: [§3, Theorem 3.2] First-variation identity: the proof sketch invokes the SMP adjoint BSDE and the standard first-variation formula for controlled SDEs, but the manuscript does not explicitly state the integrability or domination conditions that justify interchanging the derivative with respect to the control parameter and the expectation. A short paragraph clarifying these conditions (or citing a standard reference) would remove any residual doubt about the scope of the equivalence.

    Authors: We agree that an explicit statement of the integrability conditions would remove any ambiguity. In the revised manuscript we will insert a short paragraph immediately following the proof of Theorem 3.2. The paragraph will specify that the relevant processes are uniformly integrable (under the standing linear-growth and Lipschitz assumptions already listed in Section 2) and will cite Karatzas and Shreve (1991, Chapter 3) for the standard justification of interchanging differentiation under the expectation. revision: yes

  2. Referee: [§4.2, Proposition 4.3] Uniqueness for independent diffusion: the claim that critical points coincide with the optimum relies on a mild uniqueness assumption on the optimal control. The manuscript should state this assumption explicitly (e.g., Lipschitz continuity of the Hamiltonian or strict convexity of the running cost) rather than leaving it implicit, so that readers can verify whether it is satisfied in typical diffusion-model applications.

    Authors: We thank the referee for this observation. In the revised version we will state the uniqueness assumption explicitly in the statement of Proposition 4.3: the running cost is assumed to be strictly convex in the control variable. This condition is standard in the literature and is satisfied by the quadratic or entropy-regularized costs used in the diffusion-model examples of Section 5. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives the claimed equivalence—that the expected value of the general Hamiltonian adjoint matching objective has identical first variation to the original SOC objective—directly from the established Stochastic Maximum Principle (SMP) adjoint BSDE and the standard first-variation formula for controlled SDEs. This is a self-contained mathematical identity under the stated regularity assumptions on drift, diffusion, and convex running costs; the convexity condition is invoked only to link stationarity to optimality in the independent-diffusion case and does not enter the variation identity itself. No step reduces the target result to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose justification is internal to the present work. The derivation therefore stands on external, independently verifiable stochastic-control results rather than on any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard existence/uniqueness results for SDEs and HJB equations plus convexity of running costs; no free parameters are introduced and no new entities are postulated.

axioms (2)
  • domain assumption Existence and uniqueness of solutions to the controlled SDEs and associated HJB equations under the stated regularity conditions.
    Invoked throughout the derivations to guarantee that the first-variation equivalence and critical-point statements are well-defined.
  • domain assumption Convexity of the running costs.
    Required for the general Hamiltonian adjoint matching objective to be well-posed and for the first-variation argument to hold.

pith-pipeline@v0.9.0 · 5547 in / 1425 out tokens · 49467 ms · 2026-05-14T22:21:28.834863+00:00 · methodology

discussion (0)

