pith. machine review for the scientific record

arxiv: 2604.08155 · v1 · submitted 2026-04-09 · 🧮 math.OC · cs.NA · math.NA

Recognition: unknown

Dual Approaches to Stochastic Control via SPDEs and the Pathwise Hopf Formula

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:16 UTC · model grok-4.3

classification 🧮 math.OC · cs.NA · math.NA
keywords stochastic control · dual bounds · stochastic partial differential equations · generalized Hopf formula · Pontryagin maximum principle · high-dimensional control · reinforcement learning

The pith

The generalized Hopf formula holds under mild conditions and provides dual bounds for stochastic control through the expectation of an SPDE solution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops dual approaches to continuous-time stochastic control by recasting the inner optimization problem as a stochastic partial differential equation. The expectation of the solution to this SPDE then serves as the dual bound. To enable computation in high dimensions, the authors prove the generalized Hopf formula under mild conditions and propose methods based on the Pontryagin maximum principle. This allows the dual bounds to complement primal approaches such as deep BSDE methods and actor-critic reinforcement learning without being limited by the curse of dimensionality.
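A concrete way to see why the Hopf formula sidesteps spatial grids: the classical (state-independent) Hopf formula, a special case of the generalized one the paper proves, evaluates the Hamilton-Jacobi solution at a single point by solving one d-dimensional convex optimization, with no mesh over the state space. The Hamiltonian and initial data below are our own toy choices for illustration, not the paper's examples.

```python
import numpy as np
from scipy.optimize import minimize

# Classical Hopf formula for u_t + H(grad u) = 0, u(x, 0) = g(x), g convex:
#   u(x, t) = sup_p [ <x, p> - g*(p) - t * H(p) ]
# Evaluated pointwise by a d-dimensional optimization -- no spatial grid,
# which is the "curse-of-dimensionality-free" aspect.

def hopf_value(x, t, H, g_star):
    d = len(x)
    # Maximize the concave objective over p by minimizing its negative.
    obj = lambda p: -(x @ p - g_star(p) - t * H(p))
    res = minimize(obj, np.zeros(d), method="BFGS")
    return -res.fun

# Toy data (our choice): H(p) = |p|^2 / 2 and g(x) = |x|^2 / 2, so g* = g
# and the exact solution is u(x, t) = |x|^2 / (2 * (1 + t)).
H = lambda p: 0.5 * p @ p
g_star = lambda p: 0.5 * p @ p

d, t = 10, 0.7
x = np.linspace(-1.0, 1.0, d)
u_hopf = hopf_value(x, t, H, g_star)
u_exact = 0.5 * x @ x / (1.0 + t)
print(u_hopf, u_exact)  # the two values agree to optimizer tolerance
```

The cost of one evaluation scales with the dimension of the optimization variable, not with any grid resolution, which is the point the paper exploits in its pathwise setting.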

Core claim

We prove the generalized Hopf formula, first introduced as a conjecture, under mild conditions. Building on the dual formulation, we formulate the inner optimization as an SPDE, and the expectation of its solution yields the dual bound. Curse-of-dimensionality-free methods are proposed based on the Pontryagin maximum principle and the generalized Hopf formula.

What carries the argument

The generalized Hopf formula, which gives a pathwise representation of the solution to the inner optimization problem in the dual formulation.

If this is right

  • Dual bounds become available for stochastic control in high-dimensional state and control spaces.
  • The dual methods complement primal solvers such as deep BSDE methods for PDEs and deep actor-critic methods in reinforcement learning.
  • Curse-of-dimensionality-free computation is achieved via the Pontryagin maximum principle combined with the pathwise formula.
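The primal-dual sandwich behind the first two bullets can be sketched in the simplest setting where pathwise duality is classical: discrete optimal stopping, as in Rogers (2002) and Haugh-Kogan (2004), both cited by the paper. This is an analogy to, not an implementation of, the paper's SPDE construction; the payoff process, threshold rule, and all numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy optimal stopping problem: V = sup_tau E[Z_tau].
n_paths, n_steps, K = 20_000, 50, 1.0
# Payoff process: running put-style payoff of a symmetric random walk.
S = np.cumsum(rng.choice([-0.1, 0.1], size=(n_paths, n_steps)), axis=1) + 1.0
Z = np.maximum(K - S, 0.0)

# Primal lower bound: any feasible stopping rule is admissible; here,
# "stop the first time the payoff exceeds a threshold, else at maturity".
threshold = 0.15
hit = Z >= threshold
first = np.where(hit.any(axis=1), hit.argmax(axis=1), n_steps - 1)
primal = Z[np.arange(n_paths), first].mean()

# Dual upper bound: E[max_t (Z_t - M_t)] for ANY martingale M with M_0 = 0.
# The crudest choice M = 0 already gives a valid (if loose) upper bound.
dual = Z.max(axis=1).mean()

print(primal, dual)  # primal <= true value <= dual
```

Tightening the dual bound means choosing a better martingale penalty M; in the paper's continuous-time setting, that role is played by the SPDE representation of the inner optimization.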

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach may extend naturally to other optimal control settings where pathwise representations can be derived from SPDEs.
  • Numerical validation on standard high-dimensional benchmarks could quantify how much the dual bounds tighten existing primal estimates.
  • Links to viscosity solution theory for Hamilton-Jacobi-Bellman equations might suggest further analytic refinements.

Load-bearing premise

The dynamics and running costs of the control problem satisfy the mild conditions required for the generalized Hopf formula to hold.

What would settle it

A concrete stochastic control problem violating the mild conditions, where the dual bound obtained from the SPDE expectation fails to bound the true control value from the appropriate side.

Figures

Figures reproduced from arXiv: 2604.08155 by Jiefei Yang, Mathieu Laurière.

Figure 1: 95% confidence intervals' bounds and relative difference of [PITH_FULL_IMAGE:figures/full_fig_p018_1.png] (view at source ↗)
read the original abstract

We develop dual approaches for continuous-time stochastic control problems, enabling the computation of robust dual bounds in high-dimensional state and control spaces. Building on the dual formulation proposed in [L. C. G. Rogers, SIAM Journal on Control and Optimization, 46 (2007), pp. 1116--1132], we first formulate the inner optimization problem as a stochastic partial differential equation (SPDE); the expectation of its solution yields the dual bound. Curse-of-dimensionality-free methods are proposed based on the Pontryagin maximum principle and the generalized Hopf formula. In the process, we prove the generalized Hopf formula, first introduced as a conjecture in [Y. T. Chow, J. Darbon, S. Osher, and W. Yin, Journal of Computational Physics 387 (2019), pp. 376--409], under mild conditions. Numerical experiments demonstrate that our dual approaches effectively complement primal methods, including the deep BSDE method for solving high-dimensional PDEs and the deep actor-critic method in reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops dual approaches to continuous-time stochastic control by recasting Rogers' inner optimization problem as an SPDE whose solution expectation supplies a dual bound. It proves the generalized Hopf formula (a prior conjecture) under mild conditions on dynamics and costs, proposes curse-of-dimensionality-free methods via the Pontryagin maximum principle and the pathwise Hopf formula, and presents numerical experiments showing complementarity with primal methods such as deep BSDE and actor-critic RL.

Significance. If the mild conditions hold for the problems of interest, the work provides a theoretically grounded route to rigorous dual bounds in high-dimensional stochastic control, addressing a key limitation of primal solvers. The self-contained proof of the generalized Hopf formula resolves an open conjecture and strengthens the dual framework. Explicit credit is due for the parameter-free derivation of the pathwise formula and the reproducible numerical demonstration of dual-primal complementarity.

major comments (2)
  1. [§3 and §5] §3 (proof of generalized Hopf formula): the mild regularity/convexity/boundedness conditions required for the pathwise formula are stated but receive no explicit verification against the specific drift, Hamiltonian, and cost functions appearing in the numerical examples of §5. Without this check, the claim that the SPDE solution yields a valid dual bound in the reported experiments remains conditional and load-bearing for the practical contribution.
  2. [§2] §2 (SPDE formulation): the derivation that the expectation of the SPDE solution equals the dual value relies on an application of stochastic calculus to the inner optimization; the precise Itô correction terms and the measurability requirements on the control process are not spelled out, making it difficult to confirm that the bound is rigorous for the non-Markovian or path-dependent cases considered later.
minor comments (2)
  1. [Abstract and §3] A short paragraph summarizing the exact mild conditions (e.g., Lipschitz constants, convexity of the Hamiltonian) would help readers assess applicability without consulting the full proof.
  2. [§5] Numerical tables in §5 should report both the dual bound value and a measure of Monte-Carlo variance or confidence interval to quantify the tightness of the reported bounds.
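The variance reporting requested in the second minor comment is standard Monte Carlo bookkeeping; a minimal sketch follows, with synthetic normal samples standing in for the per-path dual-bound values an actual run would produce.

```python
import numpy as np

def mc_confidence_interval(samples, z=1.96):
    """Normal-approximation 95% CI for a Monte Carlo mean estimate.

    `samples` would hold per-path pathwise-bound values; this is generic
    Monte Carlo error estimation, not the paper's estimator.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    half_width = z * samples.std(ddof=1) / np.sqrt(len(samples))
    return mean - half_width, mean, mean + half_width

# Synthetic stand-in for per-path dual-bound samples.
rng = np.random.default_rng(1)
lo, mean, hi = mc_confidence_interval(rng.normal(2.0, 0.5, size=10_000))
print(f"dual bound ~ {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate lets readers judge whether an apparent gap between dual and primal values exceeds sampling noise.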

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of the contributions, and constructive comments on rigor. We address each major comment below and will incorporate the suggested clarifications and verifications in the revised manuscript.

read point-by-point responses
  1. Referee: [§3 and §5] §3 (proof of generalized Hopf formula): the mild regularity/convexity/boundedness conditions required for the pathwise formula are stated but receive no explicit verification against the specific drift, Hamiltonian, and cost functions appearing in the numerical examples of §5. Without this check, the claim that the SPDE solution yields a valid dual bound in the reported experiments remains conditional and load-bearing for the practical contribution.

    Authors: We agree that explicit verification of the mild conditions for the numerical examples strengthens the link between theory and experiments. In the revised version we will add a dedicated subsection (or appendix) that checks the regularity, convexity, and boundedness assumptions on the drift, Hamiltonian, and cost functions for each example in §5, confirming that they satisfy the hypotheses of the generalized Hopf formula proved in §3. This will make the validity of the dual bounds in the reported experiments fully explicit. revision: yes

  2. Referee: [§2] §2 (SPDE formulation): the derivation that the expectation of the SPDE solution equals the dual value relies on an application of stochastic calculus to the inner optimization; the precise Itô correction terms and the measurability requirements on the control process are not spelled out, making it difficult to confirm that the bound is rigorous for the non-Markovian or path-dependent cases considered later.

    Authors: We acknowledge that additional detail on the stochastic calculus steps would improve transparency, particularly for non-Markovian and path-dependent controls. In the revision we will expand the derivation in §2 to explicitly display the Itô correction terms that arise when applying the stochastic calculus to the inner optimization problem and to state the precise measurability requirements on the admissible control processes. These additions will confirm that the equality between the expectation of the SPDE solution and the dual value holds under the settings used later in the paper. revision: yes

Circularity Check

0 steps flagged

Derivation self-contained from Rogers dual formulation and standard stochastic calculus; new proof of generalized Hopf formula does not reduce to inputs or self-citation.

full rationale

The paper starts from the established Rogers (2007) dual formulation for stochastic control, recasts the inner optimization as an SPDE via standard stochastic calculus, and supplies an independent proof of the generalized Hopf formula (previously conjectured by Chow et al. 2019) under explicitly stated mild conditions on the Hamiltonian and data. No step equates a fitted parameter to a prediction, renames a known result, or relies on a load-bearing self-citation whose validity is assumed rather than re-derived. The expectation step yielding the dual bound follows directly from the SPDE representation and the proved Hopf identity without circular reduction. The mild-conditions assumption is a standard regularity hypothesis, not a definitional tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard stochastic calculus (Ito formula, martingale representation) and the mild regularity conditions needed for the Hopf formula proof. No new free parameters are introduced; the dual bound is obtained by taking an expectation rather than by fitting constants.

axioms (2)
  • domain assumption: Standard assumptions of stochastic control (adapted processes, integrable costs, existence of optimal controls) hold for the problems considered.
    Invoked when applying Rogers' dual formulation and the SPDE representation.
  • ad hoc to paper: Mild regularity conditions on the dynamics and cost functions that make the generalized Hopf formula valid.
    These conditions are the load-bearing premise of the new proof; they are stated in the paper but not enumerated in the abstract.

pith-pipeline@v0.9.0 · 5485 in / 1561 out tokens · 38409 ms · 2026-05-10T17:16:36.591811+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

40 extracted references · 5 canonical work pages

  1. [1] M. Athans, The role and use of the stochastic linear-quadratic-Gaussian problem in control system design, IEEE Transactions on Automatic Control, 16 (1971), pp. 529--552.
  2. [2] A. Bachouch, C. Huré, N. Langrené, and H. Pham, Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications, Methodology and Computing in Applied Probability, 24 (2022), pp. 143--178.
  3. [3] C. Bayer, L. Pelizzari, and J. Schoenmakers, Primal and dual optimal stopping with signatures, Finance and Stochastics, (2025), pp. 1--34.
  4. [4] D. Belomestny, C. Bender, and J. Schoenmakers, True upper bounds for Bermudan products via non-nested Monte Carlo, Mathematical Finance, 19 (2009), pp. 53--71.
  5. [5] D. Belomestny, R. Hildebrand, and J. Schoenmakers, Optimal stopping via pathwise dual empirical maximisation, Applied Mathematics & Optimization, 79 (2019), pp. 715--741.
  6. [6] D. Belomestny, I. Levin, A. Naumov, and S. Samsonov, UVIP: Model-free approach to evaluate reinforcement learning algorithms, Journal of Optimization Theory and Applications, 208 (2026), p. 89.
  7. [7] D. Belomestny and J. Schoenmakers, Primal-dual regression approach for Markov decision processes with general state and action spaces, SIAM Journal on Control and Optimization, 62 (2024), pp. 650--679.
  8. [8] Z. Brzeźniak and F. Flandoli, Almost sure approximation of Wong--Zakai type for stochastic partial differential equations, Stochastic Processes and their Applications, 55 (1995), pp. 329--358.
  9. [9] R. Buckdahn and J. Ma, Pathwise stochastic control problems and stochastic HJB equations, SIAM Journal on Control and Optimization, 45 (2007), pp. 2224--2256.
  10. [10] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I--II, vol. 3, Springer, 2018.
  11. [11] M. Caruana, P. K. Friz, and H. Oberhauser, A (rough) pathwise approach to a class of non-linear stochastic partial differential equations, Annales de l'Institut Henri Poincaré C, Analyse non linéaire, vol. 28, Elsevier, 2011, pp. 27--46.
  12. [12] N. Chen, M. Liu, X. Wang, and N. Zhang, Adversarial reinforcement learning: A duality-based approach to solving optimal control problems, arXiv preprint arXiv:2506.00801, (2025).
  13. [13] N. Chen, X. Ma, Y. Liu, and W. Yu, Information relaxation and a duality-driven algorithm for stochastic dynamic programs, Operations Research, 72 (2024), pp. 2302--2320.
  14. [14] Y. T. Chow, J. Darbon, S. Osher, and W. Yin, Algorithm for overcoming the curse of dimensionality for state-dependent Hamilton--Jacobi equations, Journal of Computational Physics, 387 (2019), pp. 376--409.
  15. [15] V. V. Desai, V. F. Farias, and C. C. Moallemi, Bounds for Markov decision processes, in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, (2012), pp. 452--473.
  16. [16] V. V. Desai, V. F. Farias, and C. C. Moallemi, Pathwise optimization for optimal stopping problems, Management Science, 58 (2012), pp. 2292--2308.
  17. [17] J. Diehl, P. K. Friz, and P. Gassiat, Stochastic control with rough paths, Applied Mathematics & Optimization, 75 (2017), pp. 285--315.
  18. [18] W. E, J. Han, and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics and Statistics, 5 (2017), pp. 349--380.
  19. [19] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, vol. 1, Springer Science & Business Media, 2012.
  20. [20] M. H. A. Davis and G. Burstein, A deterministic approach to stochastic optimal control with application to anticipative control, Stochastics: An International Journal of Probability and Stochastic Processes, 40 (1992), pp. 203--256.
  21. [21] J. Han and W. E, Deep learning approximation for stochastic control problems, Deep Reinforcement Learning Workshop, NIPS, arXiv preprint arXiv:1611.07422, (2016).
  22. [22] M. B. Haugh and L. Kogan, Pricing American options: A duality approach, Operations Research, 52 (2004), pp. 258--270.
  23. [23] P. Henry-Labordère, Deep primal-dual algorithm for BSDEs: Applications of machine learning to CVA and IM, Available at SSRN 3071506, (2017).
  24. [24] P. Henry-Labordère, C. Litterer, and Z. Ren, A dual algorithm for stochastic control problems: Applications to uncertain volatility models and CVA, SIAM Journal on Financial Mathematics, 7 (2016), pp. 159--182.
  25. [25] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research: Stochastic Optimization, vol. 2, Courier Corporation, 2004.
  26. [26] R. Hu and M. Laurière, Recent developments in machine learning methods for stochastic control and games, Numerical Algebra, Control and Optimization, 14 (2024), pp. 435--525.
  27. [27] C. Huré, H. Pham, and X. Warin, Deep backward schemes for high-dimensional nonlinear PDEs, Mathematics of Computation, 89 (2020), pp. 1547--1579.
  28. [28] X. Li, D. Verma, and L. Ruthotto, A neural network approach for stochastic optimal control, SIAM Journal on Scientific Computing, 46 (2024), pp. C535--C556.
  29. [29] P.-L. Lions and P. E. Souganidis, Fully nonlinear stochastic partial differential equations: non-smooth equations and applications, Comptes Rendus de l'Académie des Sciences, Série I, Mathématiques, 327 (1998), pp. 735--741.
  30. [30] N. Nüsken and L. Richter, Solving high-dimensional Hamilton--Jacobi--Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space, Partial Differential Equations and Applications, 2 (2021), p. 48.
  31. [31] H. Pham, Continuous-time Stochastic Control and Optimization with Financial Applications, vol. 61, Springer Science & Business Media, 2009.
  32. [32] B. Recht, A tour of reinforcement learning: The view from continuous control, Annual Review of Control, Robotics, and Autonomous Systems, 2 (2019), pp. 253--279.
  33. [33] L. C. G. Rogers, Monte Carlo valuation of American options, Mathematical Finance, 12 (2002), pp. 271--286.
  34. [34] L. C. G. Rogers, Pathwise stochastic optimal control, SIAM Journal on Control and Optimization, 46 (2007), pp. 1116--1132.
  35. [35] K. Twardowska, An approximation theorem of Wong--Zakai type for nonlinear stochastic partial differential equations, Stochastic Analysis and Applications, 13 (1995), pp. 601--626.
  36. [36] J. Yang and G. Li, A deep primal-dual BSDE method for optimal stopping problems, arXiv preprint arXiv:2409.06937, (2024).
  37. [37] J. Ye and H. Y. Wong, Deep Martingale: Duality of the optimal stopping problem with expressivity, arXiv preprint arXiv:2510.13868, (2025).
  38. [38] I. Yegorov and P. M. Dower, Perspectives on characteristics based curse-of-dimensionality-free numerical approaches for solving Hamilton--Jacobi equations, Applied Mathematics & Optimization, 83 (2021), pp. 1--49.
  39. [39] J. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Applications of Mathematics 43, Springer, New York, 1999.
  40. [40] M. Zhou and J. Lu, Solving time-continuous stochastic optimal control problems: Algorithm design and convergence analysis of actor-critic flow, arXiv preprint arXiv:2402.17208, (2024).