An Effective Particle Gradient Projection Method for Solving Stochastic and Mean Field Control Problem
Pith reviewed 2026-05-10 18:14 UTC · model grok-4.3
The pith
A projection algorithm with randomized neural networks solves high-dimensional stochastic optimal control and mean field control problems without backpropagation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a particle gradient projection method powered by randomized neural networks for solving stochastic optimal control problems. The algorithm iteratively refines the control via regression steps derived from the stochastic maximum principle, so the networks are never trained by direct error backpropagation. The authors report that this handles problems in dimensions 100 and above as well as mean field control problems. Through the link between control problems and HJB equations, the same procedure also addresses high-dimensional HJB equations and the infinite-dimensional HJ equation, the latter solved pointwise for a given initial distribution.
What carries the argument
The particle gradient projection algorithm, which updates the control policy through regression on trajectories using randomized neural network approximations derived from the stochastic maximum principle.
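To make the "randomized neural network" ingredient concrete, here is a minimal sketch of regression with a frozen random hidden layer, where only the linear output weights are fitted by least squares. It is not the authors' implementation: the function name `fit_random_feature_regressor`, the tanh activation, the width, and the sampling scales are all assumptions chosen for illustration.

```python
import numpy as np

def fit_random_feature_regressor(x, y, width=200, scale=1.5, seed=0):
    """Fit y ~ tanh(x @ W + b) @ beta, with W and b drawn once and never trained.

    x has shape (n_samples, dim); y has shape (n_samples,).
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, scale, size=(x.shape[1], width))  # frozen hidden weights
    b = rng.uniform(-1.0, 1.0, size=width)                # frozen hidden biases
    features = np.tanh(x @ W + b)
    # Only the output layer is fitted, by (truncated-SVD) least squares.
    # No gradient of a loss is propagated through the hidden layer.
    beta, *_ = np.linalg.lstsq(features, y, rcond=1e-8)

    def predict(x_new):
        return np.tanh(x_new @ W + b) @ beta

    return predict
```

Because the hidden layer is fixed, each regression is a linear problem that can be solved in one shot, which is the sense in which such a scheme avoids backpropagation.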
Load-bearing premise
The projection algorithm powered by randomized neural networks converges reliably and outperforms backpropagation-based methods, even though the paper provides no convergence proof or detailed error analysis.
What would settle it
A test on a stochastic control problem in dimension 100 or higher where the method produces higher final costs or fails to stabilize compared to a standard deep neural network solver trained by backpropagation.
Original abstract
This work puts forward a novel numerical approach for solving the stochastic optimal control problem (SOCP) and the mean field control (MFC) problem using a projection algorithm that is inspired by the stochastic maximum principle (SMP) and powered by a randomized neural network. The approach is mesh-free and derivative-free, and it relies on gradually updating the underlying control via regression. It distinguishes itself from traditional deep learning methods in that it does not require minimizing the loss/cost function via direct error backpropagation to train the neural networks. The methodology can effectively solve stochastic optimal control problems in high dimensions ($100$ and above), and it can also be used to solve mean field control problems. Due to the connection between HJB equations and the SOCP, the approach also provides a procedure for solving high-dimensional HJB equations. Importantly, the infinite-dimensional HJ equation related to the mean field control problem can also be solved in a point-wise sense (given the initial distribution) due to its connection with the MFC problem. Our extensive test results show that the proposed approach typically performs better than the direct deep learning based approaches for solving control problems. We leave the convergence proof and the extension to mean field games (MFG) as future work.
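For orientation, the textbook setting behind an SMP-based gradient projection scheme can be written as follows. This is the standard formulation for a minimization problem, not necessarily the paper's notation: the symbols $b$, $\sigma$, $f$, $g$, $H$, $(p, q)$, the step size $\rho$, and the projection $P_U$ onto the admissible control set are assumptions introduced here.

```latex
\begin{aligned}
dX_t &= b(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t, \qquad X_0 = x_0,\\
J(u) &= \mathbb{E}\Big[\int_0^T f(t, X_t, u_t)\,dt + g(X_T)\Big],\\
H(t,x,u,p,q) &= b(t,x,u)\cdot p + \operatorname{tr}\big(\sigma(t,x,u)^{\top} q\big) + f(t,x,u),\\
dp_t &= -\partial_x H(t, X_t, u_t, p_t, q_t)\,dt + q_t\,dW_t, \qquad p_T = \partial_x g(X_T),\\
u^{k+1}_t &= P_U\Big(u^k_t - \rho\,\partial_u H\big(t, X^k_t, u^k_t, p^k_t, q^k_t\big)\Big),\\
-\partial_t V &= \inf_{u \in U}\Big\{ b\cdot\partial_x V + \tfrac12 \operatorname{tr}\big(\sigma\sigma^{\top}\partial_{xx} V\big) + f \Big\}, \qquad V(T,x) = g(x).
\end{aligned}
```

In this convention $\partial_u H$ evaluated along $(X^k, u^k, p^k, q^k)$ is the gradient of $J$ at $u^k$, and when $V$ is smooth the adjoint satisfies $p_t = \partial_x V(t, X_t)$ along the optimal trajectory, which is the link between the SMP and the HJB equation that the abstract relies on. In a particle implementation the adjoint pair is a conditional expectation, and that is the quantity a regression step would estimate.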
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a particle gradient projection method powered by randomized neural networks to solve stochastic optimal control problems (SOCP) and mean field control (MFC) problems. The approach iteratively updates the control via regression to enforce the stochastic maximum principle (SMP) condition without direct backpropagation of a loss function, claiming to be mesh-free and derivative-free. It asserts effectiveness for high-dimensional problems (dimensions 100 and above), superior performance over direct deep-learning methods based on extensive tests, and applicability to solving high-dimensional HJB equations and infinite-dimensional HJ equations for MFC in a pointwise sense given the initial distribution. Convergence analysis and extension to mean field games are deferred to future work.
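The iteration described in the summary can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not the paper's algorithm: it uses a scalar problem chosen here (dX = u dt + sigma dW, cost E[∫ u²/2 dt + X_T²/2]), linear features [1, x] in place of a randomized network, and an unconstrained control set, so the projection onto the admissible set is the identity. For this problem H_u = u + p with p_t = E[X_T | X_t], and the optimal feedback is u*(t, x) = -x / (1 + T - t).

```python
import numpy as np

def particle_gradient_projection(x0=1.0, sigma=0.5, T=1.0, n_steps=50,
                                 n_particles=10000, n_iters=30, rho=0.5, seed=0):
    """Toy regression-based gradient iteration (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_particles))  # fixed noise
    # Feedback control u_n(x) = theta[n, 0] + theta[n, 1] * x, initialized at zero.
    theta = np.zeros((n_steps, 2))

    def simulate(th):
        X = np.empty((n_steps + 1, n_particles))
        X[0] = x0
        for n in range(n_steps):
            u = th[n, 0] + th[n, 1] * X[n]
            X[n + 1] = X[n] + u * dt + sigma * dW[n]  # Euler step
        return X

    for _ in range(n_iters):
        X = simulate(theta)
        new_theta = np.empty_like(theta)
        for n in range(n_steps):
            feats = np.column_stack([np.ones(n_particles), X[n]])
            # Adjoint p_n = E[X_T | X_n], estimated by regressing X_T on features.
            p_coef, *_ = np.linalg.lstsq(feats, X[-1], rcond=None)
            # Gradient H_u = u + p, expressed in feature coefficients.
            new_theta[n] = theta[n] - rho * (theta[n] + p_coef)
        theta = new_theta

    X = simulate(theta)
    u = theta[:, :1] + theta[:, 1:] * X[:-1]
    cost = float(np.mean(0.5 * np.sum(u ** 2, axis=0) * dt + 0.5 * X[-1] ** 2))
    return theta, cost
```

The estimated cost can be compared with the closed-form value x0² / (2(1 + T)) + (sigma² / 2) ln(1 + T) for this toy problem, which is the kind of reference check the referee asks for in higher dimensions.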
Significance. If the numerical claims hold and the deferred convergence and error analysis can be supplied, the method could provide a practical alternative for high-dimensional control problems by avoiding full backpropagation and leveraging randomized networks for regression-based projection steps. The explicit links to the SMP and HJB equations offer a theoretically motivated framework that might scale better than standard PINN-style approaches in dimensions where particle methods are feasible. However, without quantitative error bounds or sensitivity studies, the significance remains provisional and tied to the specific test cases reported.
major comments (3)
- [§3] §3 (Algorithm description): The iterative projection steps that regress the control update via randomized neural networks to satisfy the SMP lack any convergence guarantee or a priori error bound on the residual; the manuscript explicitly defers both the convergence proof and approximation-error analysis to future work. This is load-bearing for the central claim because high-dimensional performance (dimensions 100+) and the assertion of outperforming direct deep-learning methods rest entirely on the reliability of these iterations without control on regression error accumulation or interaction with the particle discretization.
- [§4] §4 (Numerical experiments): The claim that the approach “typically performs better than the direct deep learning based approaches” is supported only by unquantified test results; no tables or figures report concrete metrics such as relative errors, wall-clock times, sensitivity to network width/particle count/random seeds, or direct head-to-head comparisons with error bars. Without these, the high-dimensional effectiveness assertion cannot be evaluated independently of the deferred analysis.
- [§2.2] §2.2 (Connection to HJB/MFC): The statement that the method solves the infinite-dimensional HJ equation for MFC “in a point-wise sense (given the initial distribution)” is asserted via the SMP link but no explicit derivation or equation is supplied showing how the particle-based projection yields a pointwise solution operator; this step is load-bearing for the MFC claim yet remains informal.
minor comments (2)
- [§3] Notation for the randomized neural network approximation and the projection operator is introduced without a clear table of symbols or consistent use across sections, making it difficult to track the precise form of the regression step.
- The abstract and introduction cite “extensive test results” but the manuscript provides no supplementary material or repository link for the code, random seeds, or full experimental setup, which is standard for reproducibility in numerical optimization papers.
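The quantitative reporting requested in the §4 comment could be as simple as the following sketch, which turns the final costs of repeated runs (one per random seed) into a mean relative error with a standard error. The helper name `summarize_runs` and the choice of standard error of the mean as the error bar are assumptions, not something the manuscript specifies.

```python
import numpy as np

def summarize_runs(costs, reference_cost):
    """Return (mean relative error, standard error of the mean) over repeated runs.

    costs: final costs from independent runs, e.g. one per random seed.
    reference_cost: a trusted value, e.g. an analytic optimum on a benchmark.
    """
    costs = np.asarray(costs, dtype=float)
    rel_err = np.abs(costs - reference_cost) / abs(reference_cost)
    mean = float(rel_err.mean())
    # Sample standard deviation (ddof=1) divided by sqrt(number of runs).
    sem = float(rel_err.std(ddof=1) / np.sqrt(rel_err.size))
    return mean, sem
```

Reporting the pair for both the proposed method and each baseline, at matched particle counts and network widths, would make the head-to-head comparison independently checkable.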
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive report. The comments highlight important aspects of the theoretical foundations, numerical validation, and clarity of the MFC connection. We address each major comment below and indicate the planned revisions.
Point-by-point responses
- Referee: §3 (Algorithm description): The iterative projection steps that regress the control update via randomized neural networks to satisfy the SMP lack any convergence guarantee or a priori error bound on the residual; the manuscript explicitly defers both the convergence proof and approximation-error analysis to future work. This is load-bearing for the central claim because high-dimensional performance (dimensions 100+) and the assertion of outperforming direct deep-learning methods rest entirely on the reliability of these iterations without control on regression error accumulation or interaction with the particle discretization.
Authors: We agree that a convergence guarantee and a priori error bounds would strengthen the theoretical foundation of the iterative projection steps. The manuscript is motivated by the stochastic maximum principle, with the regression-based projection designed to enforce the optimality condition at each iteration. As explicitly stated, the full convergence analysis and approximation-error study are deferred to future work. In the revised version we will expand the discussion in §3 to include a qualitative analysis of potential error sources (regression residual, particle discretization, and their interaction) and why the observed empirical stability in high dimensions is consistent with the SMP structure, while clearly reiterating the current limitations. revision: partial
- Referee: §4 (Numerical experiments): The claim that the approach “typically performs better than the direct deep learning based approaches” is supported only by unquantified test results; no tables or figures report concrete metrics such as relative errors, wall-clock times, sensitivity to network width/particle count/random seeds, or direct head-to-head comparisons with error bars. Without these, the high-dimensional effectiveness assertion cannot be evaluated independently of the deferred analysis.
Authors: We accept that the numerical section would benefit from quantitative metrics to allow independent evaluation. Although the original manuscript reports extensive tests in dimensions of 100 and above, the presentation was primarily qualitative. In the revision we will add tables and figures that report relative errors, wall-clock times, sensitivity studies with respect to particle number, network width, and random seeds, as well as direct comparisons against baseline deep-learning methods, each accompanied by error bars from repeated runs. revision: yes
- Referee: §2.2 (Connection to HJB/MFC): The statement that the method solves the infinite-dimensional HJ equation for MFC “in a point-wise sense (given the initial distribution)” is asserted via the SMP link but no explicit derivation or equation is supplied showing how the particle-based projection yields a pointwise solution operator; this step is load-bearing for the MFC claim yet remains informal.
Authors: We thank the referee for this observation. The claim follows from the fact that, for a fixed initial distribution, the mean-field control problem reduces to a standard stochastic control problem for a representative particle whose law is approximated by the empirical measure; the projection step then yields a control that satisfies the SMP pointwise for that measure. In the revised manuscript we will insert an explicit derivation in §2.2 that links the particle regression operator to the pointwise solution of the infinite-dimensional Hamilton–Jacobi equation under the given initial measure. revision: yes
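The reduction the authors describe, where the law of a representative particle is replaced by an empirical measure, can be sketched as a particle simulation of a McKean-Vlasov SDE. The model below is a hypothetical example chosen here, not taken from the paper: the drift kappa * (E[X_t] - X_t) + u, the standard normal initial law, and the function name `simulate_mean_field_particles` are all assumptions.

```python
import numpy as np

def simulate_mean_field_particles(control, kappa=1.0, sigma=0.3, T=1.0,
                                  n_steps=100, n_particles=5000, seed=0):
    """Euler scheme for dX = (kappa * (E[X] - X) + u) dt + sigma dW (toy model).

    control(t, x, m) returns the control for particle states x given the
    empirical mean m; the signature is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = rng.normal(0.0, 1.0, size=n_particles)  # samples from the initial law
    for n in range(n_steps):
        m = X.mean()  # the empirical mean stands in for E[X_t]
        u = control(n * dt, X, m)
        noise = rng.normal(0.0, np.sqrt(dt), size=n_particles)
        X = X + (kappa * (m - X) + u) * dt + sigma * noise
    return X
```

With the initial samples fixed, everything downstream is an ordinary particle system, which is why a solution obtained this way is tied to that one initial distribution rather than valid for all measures at once.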
- Not resolved by the planned revisions: a full rigorous convergence proof and a priori error bounds for the iterative randomized-neural-network projection scheme, which the authors have deferred to separate future work.
Circularity Check
No circularity: algorithm and empirical claims rest on independent numerical tests, not self-referential fits or derivations
Full rationale
The paper proposes a mesh-free projection algorithm for SOCP/MFC that updates controls via randomized NN regression to satisfy the stochastic maximum principle, without backpropagation on a loss. Performance claims are supported solely by reported test comparisons against direct deep-learning baselines in high dimensions. No load-bearing step equates a 'prediction' to a fitted parameter by construction, invokes self-citations for uniqueness, or renames known results. Convergence and error analysis are explicitly left for future work, so the derivation chain does not reduce to its inputs. This is a standard honest numerical-methods paper whose central content is algorithmic and externally falsifiable via the tests.