
arxiv: 2604.06675 · v1 · submitted 2026-04-08 · 🧮 math.OC

An Effective Particle Gradient Projection Method for Solving Stochastic and Mean Field Control Problem

Pith reviewed 2026-05-10 18:14 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic optimal control · mean field control · projection method · randomized neural networks · high-dimensional HJB equations · mesh-free methods · stochastic maximum principle

The pith

A projection algorithm with randomized neural networks solves high-dimensional stochastic optimal control and mean field control problems without backpropagation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a mesh-free numerical method for stochastic optimal control problems and mean field control problems. It relies on a projection algorithm inspired by the stochastic maximum principle and represents controls using randomized neural networks. Updates occur through regression on sampled trajectories rather than by minimizing a loss function via backpropagation. This design targets problems in dimensions of 100 and higher while also addressing the associated high-dimensional Hamilton-Jacobi-Bellman equations. Tests indicate the method typically achieves better performance than direct deep learning approaches on the same tasks.

Core claim

The authors introduce a particle gradient projection method powered by randomized neural networks for solving stochastic optimal control problems. The algorithm iteratively refines the control via regression steps drawn from the stochastic maximum principle, avoiding direct error backpropagation to train the networks. This enables effective handling of problems in dimensions 100 and above, as well as mean field control problems and, through links to HJB equations, high-dimensional and infinite-dimensional HJ equations solved pointwise for a given initial distribution.
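
For orientation, the stochastic maximum principle and the gradient projection step that this claim refers to can be written in a standard textbook form. The notation below ($b$, $\sigma$, $f$, $g$, $H$, $p$, $q$, $\rho_k$, $P_U$) is an assumption taken from the general literature on gradient projection methods for stochastic control; this review does not show the paper's own equations, so its formulation may differ in detail.

```latex
% Controlled state and cost (minimization), assumed standard form
dX_t = b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t, \qquad X_0 = x_0,
\qquad
J(u) = \mathbb{E}\Big[\int_0^T f(X_t, u_t)\,dt + g(X_T)\Big].

% Hamiltonian and adjoint backward SDE
H(x, u, p, q) = b(x, u)\cdot p + \operatorname{tr}\big(\sigma(x, u)^{\top} q\big) + f(x, u),
\qquad
dp_t = -\partial_x H(X_t, u_t, p_t, q_t)\,dt + q_t\,dW_t, \quad p_T = \partial_x g(X_T).

% Gradient of the cost and projected descent step on a convex control set U
\nabla J(u)_t = \partial_u H(X_t, u_t, p_t, q_t),
\qquad
u^{k+1} = P_U\big(u^{k} - \rho_k\, \nabla J(u^{k})\big).

% For a feedback control u(t, x) the gradient is a conditional expectation,
% which is what a regression on sampled trajectories estimates
\nabla J(u)(t, x) = \mathbb{E}\big[\partial_u H(X_t, u_t, p_t, q_t) \,\big|\, X_t = x\big].
```

The last line is where regression enters: a conditional expectation given $X_t = x$ is the least-squares projection of the sampled quantity onto functions of $X_t$.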

What carries the argument

The particle gradient projection algorithm, which updates the control policy through regression on trajectories using randomized neural network approximations derived from the stochastic maximum principle.
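
A minimal sketch of what regression with a randomized neural network usually means: the hidden layer is drawn at random once and frozen, and only the linear output layer is fitted by least squares, so no backpropagation is involved. The function names, the `tanh` activation, the width of 50 and the sine target are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_random_features(dim_in, width, rng):
    """Draw the hidden layer once; these weights are never trained."""
    W = rng.normal(size=(dim_in, width))
    b = rng.uniform(-np.pi, np.pi, size=width)

    def phi(x):
        # x has shape (n_samples, dim_in); output has shape (n_samples, width)
        return np.tanh(x @ W + b)

    return phi

def fit_output_layer(phi, x, y, rcond=1e-10):
    """Least-squares fit of the output weights only (truncated-SVD solve)."""
    beta, *_ = np.linalg.lstsq(phi(x), y, rcond=rcond)
    return beta

rng = np.random.default_rng(0)
phi = make_random_features(dim_in=1, width=50, rng=rng)

# Training data: noise-free samples of a sine function (illustrative target)
x_train = rng.uniform(-np.pi, np.pi, size=(500, 1))
y_train = np.sin(x_train[:, 0])
beta = fit_output_layer(phi, x_train, y_train)

# Evaluate the fitted network inside the training range
x_test = np.linspace(-3.0, 3.0, 200)[:, None]
max_err = np.max(np.abs(phi(x_test) @ beta - np.sin(x_test[:, 0])))
print(f"max abs error on test grid: {max_err:.2e}")
```

Because the fit is a single linear solve, each update costs one least-squares problem rather than many gradient steps; the price is that the quality of the approximation depends on how the random hidden weights happen to cover the input region.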

Load-bearing premise

That the projection algorithm powered by randomized neural networks converges reliably and outperforms backpropagation-based methods, even though the paper provides neither a convergence proof nor a detailed error analysis.

What would settle it

A test on a stochastic control problem in dimension 100 or higher where the method produces higher final costs or fails to stabilize compared to a standard deep neural network solver trained by backpropagation.

Figures

Figures reproduced from arXiv: 2604.06675 by Hui Sun.

Figure 1: Comparison between the effectiveness of our proposed method and the benchmark deep learning …
Figure 2: Comparison between the numerical control values and the exact solution.
Figure 3: Benchmarking numerical solutions against the exact solutions over …
Figure 4: Benchmarking numerical solutions against the exact solutions over …
Figure 5: Comparison of the L2 loss between the benchmark method and the proposed method. Left figure: L2 error compared over computational time. Apparently, our proposed approach takes more time per epoch. However, even within the same time, our method reaches a lower L2 error. Mid and Right: comparison of the L2 loss over the number of training epochs. Our proposed method achieves much smaller L2 errors in much fe…
Figure 6: Comparing the predicted solution (control) against the exact solution. Left: control function at …
Figure 7: Comparison between the control learned (orange) and the exact control function for the mean variance …
Figure 8: Comparison between the control learned (orange) and the exact control function for the price impact …
Figure 9: A Sine function approximated by the proposed algorithm. Left: comparison between the exact function value and the numerical results. Right: the figure of loss decay.

Conclusion: In this paper, we design descent-based numerical schemes for solving stochastic optimal control and mean-field control problems. On six test examples, the algorithm performs well across a range of problem setups and demonstrates ove…
Original abstract

This work puts forward a novel numerical approach for solving the stochastic optimal control problem (SOCP) and the mean field control (MFC) problem using projection algorithm inspired by the stochastic maximum principle (SMP) which is also powered by the randomized neural network. This approach is mesh-free, derivative free and it relies on gradually updating the underlying control via regression. It distinguishes itself from other traditional deep learning methods as it does not require minimizing the loss/cost function via direct error backward propagation to train the neural networks. The methodology designed can effectively solve stochastic optimal control problem in high dimensions ($100$ and above) and it can also be used to solve the mean field control problems. Due to the connection between the HJB equations and SOCP, the designed approach also provides a procedure for solving high dimensional HJB equations. Importantly, the infinite dimensional HJ equation related to the mean field control problem can also be solved in a point-wise sense (given the initial distribution) due to its connection with the Mean Field Control (MFC) problem. Our extensive test results show that the proposed approach typically performs better than the direct deep learning based approaches for solving control problems. We will leave the convergence proof and the extension to Mean Field Games (MFG) as future works.
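
The link between the HJB equation and the control problem that the abstract invokes can be stated in its standard finite-dimensional form. The symbols $V$, $b$, $\sigma$, $f$, $g$ and $U$ are assumed textbook notation, not taken from the paper, and the identities hold under the usual smoothness assumptions on $V$.

```latex
% Value function of the control problem started at (t, x)
V(t, x) = \inf_{u} \mathbb{E}\Big[\int_t^T f(X_s, u_s)\,ds + g(X_T) \,\Big|\, X_t = x\Big].

% Hamilton-Jacobi-Bellman equation
\partial_t V(t, x)
+ \inf_{u \in U}\Big\{ b(x, u)\cdot \nabla_x V(t, x)
+ \tfrac12 \operatorname{tr}\big(\sigma\sigma^{\top}(x, u)\, \nabla_x^2 V(t, x)\big)
+ f(x, u) \Big\} = 0,
\qquad V(T, x) = g(x).

% Relation to the adjoint process of the stochastic maximum principle
p_t = \nabla_x V(t, X_t^{*}), \qquad q_t = \nabla_x^2 V(t, X_t^{*})\,\sigma(X_t^{*}, u_t^{*}).
```

Under this reading, an approximately optimal control gives the value $V(0, x_0) \approx J(u)$ at the chosen starting point by Monte Carlo evaluation of the cost, which is the sense in which a control solver also yields pointwise values of the HJB solution.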

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a particle gradient projection method powered by randomized neural networks to solve stochastic optimal control problems (SOCP) and mean field control (MFC) problems. The approach iteratively updates the control via regression to enforce the stochastic maximum principle (SMP) condition without direct backpropagation of a loss function, claiming to be mesh-free and derivative-free. It asserts effectiveness for high-dimensional problems (dimensions 100 and above), superior performance over direct deep-learning methods based on extensive tests, and applicability to solving high-dimensional HJB equations and infinite-dimensional HJ equations for MFC in a pointwise sense given the initial distribution. Convergence analysis and extension to mean field games are deferred to future work.
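
To make the regression-based update concrete, here is a minimal sketch on a one-dimensional linear-quadratic toy problem chosen for this illustration (it is not one of the paper's test cases): minimize E[∫ u²/2 dt + X_T²/2] subject to dX = u dt + σ dW. For this problem the adjoint is p_t = E[X_T | F_t], the Hamiltonian gradient in u is p + u, and the exact feedback is u*(t, x) = -x / (1 + T - t). All names, the step size `rho`, and the linear skip term in the features are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, M, sigma = 1.0, 20, 4000, 0.5   # horizon, time steps, particles, noise level
dt, rho, H = T / N, 0.5, 8            # time step, descent step size, hidden width
W, b = rng.normal(size=H), rng.normal(size=H)   # frozen random hidden layer

def features(x):
    """Random tanh features plus a linear skip term and a bias column."""
    return np.column_stack([np.tanh(np.outer(x, W) + b), x, np.ones_like(x)])

def regress(x, y, ridge=1e-3):
    """Ridge least squares for the output weights only."""
    Phi = features(x)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

x0 = rng.normal(size=M)                           # initial particles
dW = rng.normal(scale=np.sqrt(dt), size=(N, M))   # Brownian increments, reused each sweep
beta = np.zeros((N, H + 2))                       # output weights of the control per time step

def simulate(beta):
    """Euler-Maruyama forward pass under the current feedback control."""
    X = np.empty((N + 1, M))
    X[0] = x0
    for n in range(N):
        u = features(X[n]) @ beta[n]
        X[n + 1] = X[n] + u * dt + sigma * dW[n]
    return X

for k in range(40):
    X = simulate(beta)
    for n in range(N):
        # Adjoint p_n(x) = E[X_T | X_n = x], estimated by regression on the particles
        gamma = regress(X[n], X[N])
        # Gradient of the Hamiltonian in u is p + u; take a step against it
        beta[n] -= rho * (gamma + beta[n])

u_learned = (features(np.array([1.0])) @ beta[0]).item()
u_exact = -1.0 / (1.0 + T)   # exact feedback at t = 0, x = 1
print(f"learned u(0, 1) = {u_learned:.3f}, exact = {u_exact:.3f}")
```

The control set here is unconstrained, so the projection onto the admissible set is the identity; with a bounded control set one would clip the updated control values instead. The sketch also shows where the referee's concerns enter: every sweep depends on the accuracy of the inner regressions and on the particle and time discretization.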

Significance. If the numerical claims hold and the deferred convergence and error analysis can be supplied, the method could provide a practical alternative for high-dimensional control problems by avoiding full backpropagation and leveraging randomized networks for regression-based projection steps. The explicit links to the SMP and HJB equations offer a theoretically motivated framework that might scale better than standard PINN-style approaches in dimensions where particle methods are feasible. However, without quantitative error bounds or sensitivity studies, the significance remains provisional and tied to the specific test cases reported.

major comments (3)
  1. [§3] §3 (Algorithm description): The iterative projection steps that regress the control update via randomized neural networks to satisfy the SMP lack any convergence guarantee or a priori error bound on the residual; the manuscript explicitly defers both the convergence proof and approximation-error analysis to future work. This is load-bearing for the central claim because high-dimensional performance (dimensions 100+) and the assertion of outperforming direct deep-learning methods rest entirely on the reliability of these iterations without control on regression error accumulation or interaction with the particle discretization.
  2. [§4] §4 (Numerical experiments): The claim that the approach “typically performs better than the direct deep learning based approaches” is supported only by unquantified test results; no tables or figures report concrete metrics such as relative errors, wall-clock times, sensitivity to network width/particle count/random seeds, or direct head-to-head comparisons with error bars. Without these, the high-dimensional effectiveness assertion cannot be evaluated independently of the deferred analysis.
  3. [§2.2] §2.2 (Connection to HJB/MFC): The statement that the method solves the infinite-dimensional HJ equation for MFC “in a point-wise sense (given the initial distribution)” is asserted via the SMP link but no explicit derivation or equation is supplied showing how the particle-based projection yields a pointwise solution operator; this step is load-bearing for the MFC claim yet remains informal.
minor comments (2)
  1. [§3] Notation for the randomized neural network approximation and the projection operator is introduced without a clear table of symbols or consistent use across sections, making it difficult to track the precise form of the regression step.
  2. The abstract and introduction cite “extensive test results” but the manuscript provides no supplementary material or repository link for the code, random seeds, or full experimental setup, which is standard for reproducibility in numerical optimization papers.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the thorough and constructive report. The comments highlight important aspects of the theoretical foundations, numerical validation, and clarity of the MFC connection. We address each major comment below and indicate the planned revisions.

Point-by-point responses
  1. Referee: §3 (Algorithm description): The iterative projection steps that regress the control update via randomized neural networks to satisfy the SMP lack any convergence guarantee or a priori error bound on the residual; the manuscript explicitly defers both the convergence proof and approximation-error analysis to future work. This is load-bearing for the central claim because high-dimensional performance (dimensions 100+) and the assertion of outperforming direct deep-learning methods rest entirely on the reliability of these iterations without control on regression error accumulation or interaction with the particle discretization.

    Authors: We agree that a convergence guarantee and a priori error bounds would strengthen the theoretical foundation of the iterative projection steps. The manuscript is motivated by the stochastic maximum principle, with the regression-based projection designed to enforce the optimality condition at each iteration. As explicitly stated, the full convergence analysis and approximation-error study are deferred to future work. In the revised version we will expand the discussion in §3 to include a qualitative analysis of potential error sources (regression residual, particle discretization, and their interaction) and why the observed empirical stability in high dimensions is consistent with the SMP structure, while clearly reiterating the current limitations. revision: partial

  2. Referee: §4 (Numerical experiments): The claim that the approach “typically performs better than the direct deep learning based approaches” is supported only by unquantified test results; no tables or figures report concrete metrics such as relative errors, wall-clock times, sensitivity to network width/particle count/random seeds, or direct head-to-head comparisons with error bars. Without these, the high-dimensional effectiveness assertion cannot be evaluated independently of the deferred analysis.

    Authors: We accept that the numerical section would benefit from quantitative metrics to allow independent evaluation. Although the original manuscript reports extensive tests in dimensions of 100 and above, the presentation was primarily qualitative. In the revision we will add tables and figures that report relative errors, wall-clock times, sensitivity studies with respect to particle number, network width, and random seeds, as well as direct comparisons against baseline deep-learning methods, each accompanied by error bars from repeated runs. revision: yes

  3. Referee: §2.2 (Connection to HJB/MFC): The statement that the method solves the infinite-dimensional HJ equation for MFC “in a point-wise sense (given the initial distribution)” is asserted via the SMP link but no explicit derivation or equation is supplied showing how the particle-based projection yields a pointwise solution operator; this step is load-bearing for the MFC claim yet remains informal.

    Authors: We thank the referee for this observation. The claim follows from the fact that, for a fixed initial distribution, the mean-field control problem reduces to a standard stochastic control problem for a representative particle whose law is approximated by the empirical measure; the projection step then yields a control that satisfies the SMP pointwise for that measure. In the revised manuscript we will insert an explicit derivation in §2.2 that links the particle regression operator to the pointwise solution of the infinite-dimensional Hamilton–Jacobi equation under the given initial measure. revision: yes
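
The particle approximation mentioned in the last response, where the law of the representative particle is replaced by an empirical measure, can be sketched as follows. The drift `a * (m - x)`, the parameter values, and the function name `simulate_mean_field` are hypothetical choices for illustration; the review does not state the dynamics used in the paper.

```python
import numpy as np

def simulate_mean_field(control, x0, T=1.0, n_steps=50, a=1.0, sigma=0.3, seed=0):
    """Euler-Maruyama for dX = [a (m_t - X) + u(t, X, m_t)] dt + sigma dW,
    with the mean m_t of the law replaced by the empirical particle mean."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.array(x0, dtype=float)
    for n in range(n_steps):
        m = X.mean()                                  # empirical stand-in for E[X_t]
        drift = a * (m - X) + control(n * dt, X, m)
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=X.shape)
    return X

def zero_control(t, x, m):
    """Uncontrolled case: particles are only pulled toward their common mean."""
    return np.zeros_like(x)

x_init = np.random.default_rng(1).normal(loc=1.0, scale=1.0, size=5000)
x_final = simulate_mean_field(zero_control, x_init)
print(x_init.mean(), x_final.mean(), x_init.var(), x_final.var())
```

With the initial particles fixed, every quantity that depends on the law (here only the mean) is computed from the same cloud, which is why a solution obtained this way is tied to the given initial distribution.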

standing simulated objections not resolved
  • Full rigorous convergence proof and a priori error bounds for the iterative randomized-neural-network projection scheme, which the authors have deferred to a separate future work.

Circularity Check

0 steps flagged

No circularity: algorithm and empirical claims rest on independent numerical tests, not self-referential fits or derivations

Full rationale

The paper proposes a mesh-free projection algorithm for SOCP/MFC that updates controls via randomized NN regression to satisfy the stochastic maximum principle, without backpropagation on a loss. Performance claims are supported solely by reported test comparisons against direct deep-learning baselines in high dimensions. No load-bearing step equates a 'prediction' to a fitted parameter by construction, invokes self-citations for uniqueness, or renames known results. Convergence and error analysis are explicitly left for future work, so the derivation chain does not reduce to its inputs. This is a standard honest numerical-methods paper whose central content is algorithmic and externally falsifiable via the tests.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the method is described at a high level without mathematical details.

pith-pipeline@v0.9.0 · 5518 in / 1156 out tokens · 55455 ms · 2026-05-10T18:14:22.191864+00:00 · methodology

