Recognition: no theorem link
Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control
Pith reviewed 2026-05-14 22:21 UTC · model grok-4.3
The pith
A general Hamiltonian adjoint matching objective shares the first variation of the stochastic optimal control objective, so its critical points satisfy Hamilton-Jacobi-Bellman stationarity conditions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton--Jacobi--Bellman (HJB) stationarity conditions. In the state- and control-independent diffusion case, we recover the lean adjoint matching loss whose critical points coincide with the optimal control under mild uniqueness assumptions. Adjoint matching is precisely a continuous-time method of successive approximations induced by the SMP.
What carries the argument
The Hamiltonian adjoint matching objective, constructed so its expectation shares the first variation of the stochastic optimal control cost functional and thereby inherits the stationarity conditions from the stochastic maximum principle.
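In standard SMP notation, the objects in play can be written out as follows; this is a reconstruction from the abstract's description, not the paper's own display, and the symbols ($b$, $\sigma$, $f$, $g$, $a_t$, $Z_t$) follow textbook convention rather than the manuscript's.

```latex
% Reconstruction in standard SMP notation; not verbatim from the paper.
\begin{aligned}
  &dX_t = b(X_t,u_t)\,dt + \sigma(X_t,u_t)\,dW_t, \qquad
   J(u) = \mathbb{E}\!\left[\int_0^T f(X_t,u_t)\,dt + g(X_T)\right],\\
  &H(x,u,a) = f(x,u) + \langle b(x,u),\,a\rangle
   \quad\text{(plus a second-order term when $\sigma$ depends on $u$),}\\
  &da_t = -\nabla_x H(X_t,u_t,a_t)\,dt + Z_t\,dW_t, \qquad a_T = \nabla g(X_T),\\
  &\delta J(u)[\delta u]
   = \mathbb{E}\!\left[\int_0^T \langle \nabla_u H(X_t,u_t,a_t),\,\delta u_t\rangle\,dt\right].
\end{aligned}
```

The claimed result is that the adjoint matching objective's expectation has this same first variation, so setting $\nabla_u H = 0$ along trajectories characterizes its critical points.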
If this is right
- Critical points of the adjoint matching objective satisfy the HJB stationarity conditions.
- In the case of state- and control-independent diffusion, critical points of the lean loss coincide with the optimal control under mild uniqueness assumptions on the control.
- Adjoint matching serves as a practical continuous-time successive approximation method derived from the SMP.
- The approach supplies an implementable pathway for SMP-based iterations in stochastic problems by avoiding intractable martingale terms.
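The successive-approximation reading in the last two points can be sketched as the classical MSA loop (a schematic in standard form, not the paper's exact algorithm; per the abstract, adjoint matching replaces the intractable backward step with an implementable matching objective):

```latex
% Schematic method of successive approximations (MSA) induced by the SMP:
%   1. Forward pass: simulate dX_t = b(X_t,u^k_t)\,dt + \sigma\,dW_t under u^k.
%   2. Backward pass: solve the adjoint equation for a^k_t with a^k_T = \nabla g(X^k_T).
%   3. Update: u^{k+1}_t \in \arg\min_u H(X^k_t, u, a^k_t).
% In the stochastic setting step 2 is a BSDE with a martingale term Z_t;
% adjoint matching sidesteps it with a least-squares objective whose
% critical points satisfy the same stationarity condition.
```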
Where Pith is reading between the lines
- The equivalence may be tested numerically on simple linear-quadratic control problems to confirm gradient agreement up to sampling variance.
- The method supplies a direct route for applying stochastic control techniques to reward fine-tuning of diffusion and flow models.
- Further analysis could examine whether the first-variation property persists under mild relaxations of convexity via added regularization terms.
Load-bearing premise
The running costs of the stochastic optimal control problem are convex.
What would settle it
On a low-dimensional example with control-dependent diffusion and convex running costs, compute the gradient of the proposed adjoint matching objective and the gradient of the original SOC objective; any systematic mismatch between these gradients falsifies the claimed first-variation equivalence.
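A minimal version of this check can be run in a few lines. The sketch below uses a 1-D linear-quadratic problem with constant (control-independent) diffusion, i.e. the lean-adjoint case rather than the full control-dependent-diffusion setting; the problem setup, names, and discretization are illustrative, not from the paper.

```python
import numpy as np

# Toy SOC problem:  dX_t = theta dt + sigma dW_t,  f(u) = u^2/2,  g(x) = x^2/2,
# with a constant control theta. Since d_x b = 0 and d_x f = 0, the lean
# adjoint is a_t = g'(X_T) = X_T along each path, and the SMP gradient is
#   dJ/dtheta = E[ int_0^T (d_u f + d_u b * a_t) dt ] = theta*T + T*E[X_T].
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 50, 20_000
dt = T / n_steps
x0, sigma, theta = 1.0, 0.5, 0.3

# Common Brownian increments so both estimators see the same noise.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

def terminal_state(th):
    # Constant drift, so the Euler-Maruyama endpoint is explicit.
    return x0 + th * T + sigma * dW.sum(axis=1)

def soc_objective(th):
    # Monte Carlo estimate of J(th) = E[ int f dt + g(X_T) ].
    xT = terminal_state(th)
    return 0.5 * th**2 * T + 0.5 * np.mean(xT**2)

# Adjoint-based (SMP) gradient estimate.
xT = terminal_state(theta)
grad_adjoint = theta * T + T * np.mean(xT)

# Central finite difference of the sampled SOC objective (same noise).
eps = 1e-4
grad_fd = (soc_objective(theta + eps) - soc_objective(theta - eps)) / (2 * eps)

print(grad_adjoint, grad_fd)  # should agree up to Monte Carlo/epsilon error
```

With common random numbers the sampled objective is exactly quadratic in `theta`, so the two estimates agree essentially to machine precision here; a systematic mismatch in the genuinely control-dependent-diffusion variant is what would falsify the claim.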
Original abstract
Reward fine-tuning of diffusion and flow models and sampling from tilted or Boltzmann distributions can both be formulated as stochastic optimal control (SOC) problems, where learning an optimal generative dynamics corresponds to optimizing a control under SDE constraints. In this work, we revisit and generalize Adjoint Matching, a recently proposed SOC-based method for learning optimal controls, and place it on a rigorous footing by deriving it from the Stochastic Maximum Principle (SMP). We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton--Jacobi--Bellman (HJB) stationarity conditions. In the important practical case of state- and control-independent diffusion, we recover the lean adjoint matching loss previously introduced in adjoint matching, which avoids second-order terms and whose critical points coincide with the optimal control under mild uniqueness assumptions. Finally, we show that adjoint matching can be precisely interpreted as a continuous-time method of successive approximations induced by the SMP, yielding a practical and implementable alternative to classical SMP-based algorithms, which are obstructed by intractable martingale terms in the stochastic setting. These results are also of independent interest to the stochastic control community, providing new implementable objectives and a viable pathway for SMP-based iterations in stochastic problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript revisits and generalizes Adjoint Matching for stochastic optimal control (SOC) problems with control-dependent drift and diffusion and convex running costs. It formulates a Hamiltonian adjoint matching objective whose expected value is shown to possess the same first variation as the original SOC objective, implying that critical points satisfy the Hamilton--Jacobi--Bellman stationarity conditions. In the special case of state- and control-independent diffusion the construction recovers the lean adjoint-matching loss without second-order terms, whose critical points coincide with the optimum under mild uniqueness assumptions on the control. The work further interprets adjoint matching as a continuous-time successive-approximation scheme induced by the Stochastic Maximum Principle (SMP), providing an implementable alternative to classical SMP iterations that are obstructed by intractable martingale terms.
Significance. If the first-variation identity holds under the stated regularity, the paper supplies a rigorous foundation for adjoint-matching methods already used in diffusion-model fine-tuning and Boltzmann sampling. It also yields a practical, martingale-free route to SMP-based optimization in stochastic settings, which is of independent interest to the stochastic control community. The explicit recovery of the lean loss and the interpretation as successive approximations are concrete strengths.
Major comments (2)
- [§3, Theorem 3.2] First-variation identity: the proof sketch invokes the SMP adjoint BSDE and the standard first-variation formula for controlled SDEs, but the manuscript does not explicitly state the integrability or domination conditions that justify interchanging the derivative with respect to the control parameter and the expectation. A short paragraph clarifying these conditions (or citing a standard reference) would remove any residual doubt about the scope of the equivalence.
- [§4.2, Proposition 4.3] Uniqueness for independent diffusion: the claim that critical points coincide with the optimum relies on a mild uniqueness assumption on the optimal control. The manuscript should state this assumption explicitly (e.g., Lipschitz continuity of the Hamiltonian or strict convexity of the running cost) rather than leaving it implicit, so that readers can verify whether it is satisfied in typical diffusion-model applications.
Minor comments (2)
- Notation: the symbol for the adjoint process is overloaded between the forward and backward equations; a brief remark distinguishing the two uses would improve readability.
- Figure 1: the diagram of the successive-approximation iteration would benefit from an explicit arrow indicating the update of the control law at each step.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and the constructive suggestions for minor revision. We address each major comment below and will incorporate the requested clarifications into the revised manuscript.
Point-by-point responses
Referee: [§3, Theorem 3.2] First-variation identity: the proof sketch invokes the SMP adjoint BSDE and the standard first-variation formula for controlled SDEs, but the manuscript does not explicitly state the integrability or domination conditions that justify interchanging the derivative with respect to the control parameter and the expectation. A short paragraph clarifying these conditions (or citing a standard reference) would remove any residual doubt about the scope of the equivalence.
Authors: We agree that an explicit statement of the integrability conditions would remove any ambiguity. In the revised manuscript we will insert a short paragraph immediately following the proof of Theorem 3.2. The paragraph will specify that the relevant processes are uniformly integrable (under the standing linear-growth and Lipschitz assumptions already listed in Section 2) and will cite Karatzas and Shreve (1991, Chapter 3) for the standard justification of interchanging differentiation under the expectation. revision: yes
Referee: [§4.2, Proposition 4.3] Uniqueness for independent diffusion: the claim that critical points coincide with the optimum relies on a mild uniqueness assumption on the optimal control. The manuscript should state this assumption explicitly (e.g., Lipschitz continuity of the Hamiltonian or strict convexity of the running cost) rather than leaving it implicit, so that readers can verify whether it is satisfied in typical diffusion-model applications.
Authors: We thank the referee for this observation. In the revised version we will state the uniqueness assumption explicitly in the statement of Proposition 4.3: the running cost is assumed to be strictly convex in the control variable. This condition is standard in the literature and is satisfied by the quadratic or entropy-regularized costs used in the diffusion-model examples of Section 5. revision: yes
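As a concrete instance of the condition the authors promise to state (an illustration in standard notation, not text from the manuscript): with a quadratic running cost and affine-in-control drift, strict convexity makes the Hamiltonian minimizer unique in closed form.

```latex
% Illustration (not from the manuscript): quadratic cost, affine drift.
\begin{aligned}
  &f(x,u) = \tfrac{1}{2}\|u\|^2, \qquad b(x,u) = \mu(x) + u,\\
  &H(x,u,a) = \tfrac{1}{2}\|u\|^2 + \langle \mu(x) + u,\, a\rangle,\\
  &\nabla_u H = u + a = 0 \;\Longrightarrow\; u^*(x,t) = -a_t
   \quad\text{(unique, by strict convexity of $f$ in $u$).}
\end{aligned}
```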
Circularity Check
No significant circularity
Full rationale
The paper derives the claimed equivalence—that the expected value of the general Hamiltonian adjoint matching objective has identical first variation to the original SOC objective—directly from the established Stochastic Maximum Principle (SMP) adjoint BSDE and the standard first-variation formula for controlled SDEs. This is a self-contained mathematical identity under the stated regularity assumptions on drift, diffusion, and convex running costs; the convexity condition is invoked only to link stationarity to optimality in the independent-diffusion case and does not enter the variation identity itself. No step reduces the target result to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose justification is internal to the present work. The derivation therefore stands on external, independently verifiable stochastic-control results rather than on any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: existence and uniqueness of solutions to the controlled SDEs and associated HJB equations under the stated regularity conditions.
- Domain assumption: convexity of the running costs.
Reference graph
Works this paper leans on
- [1] Julius Berner, Lorenz Richter, and Karen Ullrich. An optimal control perspective on diffusion-based generative modeling. arXiv preprint arXiv:2211.01364.
- [2] Denis Blessing, Julius Berner, Lorenz Richter, Carles Domingo-Enrich, Yuanqi Du, Arash Vahdat, and Gerhard Neumann. Trust region constrained measure transport in path space for stochastic optimal control and inference. arXiv preprint arXiv:2508.12511.
- [3] Weiguo Gao, Ming Li, and Qianxiao Li. Terminally constrained flow-based generative models from an optimal control perspective. arXiv preprint arXiv:2601.09474.
- [4] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11), 2005.
- [5] Rishi Leburu, Levon Nurbekyan, and Lars Ruthotto. Differentiating through stochastic differential equations: A primer. arXiv preprint arXiv:2601.08594.
- [6] Etienne Pardoux and Shige Peng. Backward stochastic differential equations and quasilinear parabolic partial differential equations. In Stochastic Partial Differential Equations and Their Applications, pages 200–217. Springer, 1991.
- [7] Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, and Katerina Fragkiadaki. Aligning text-to-image diffusion models with reward backpropagation (2023). arXiv preprint arXiv:2310.03739.
- [8] Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-A-Video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792.
- [9] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. arXiv preprint arXiv:1907.05600.
- [10] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR 2021).
- [11] Wenpin Tang and Fuzhong Zhou. Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond. arXiv preprint arXiv:2403.06279.
- [12] Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. arXiv preprint arXiv:1903.01608.
- [13] Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, and Sergey Levine. Fine-tuning of continuous-time diffusion models as entropy-regularized control. arXiv preprint arXiv:2402.15194.
- [14] Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, et al. Audiobox: Unified audio generation with natural language prompts. arXiv preprint arXiv:2312.15821.