Deep Policy Iteration for High-Dimensional Mean-Field Games with Regenerative Reformulation

Hui Zhang; Shuixin Fang; Shupeng Wang; Tao Zhou; Zhen Wu

arxiv: 2604.26782 · v2 · pith:7PBBOKTZnew · submitted 2026-04-29 · 🧮 math.NA · cs.NA

Pith reviewed 2026-05-19 17:10 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords mean-field games · policy iteration · deep learning · high-dimensional problems · regenerative reformulation · particle systems · Euler-Maruyama discretization · numerical methods

The pith

By reformulating mean-field games into regenerative problems with deterministic cycles, deep policy iteration becomes efficient and scalable in dimensions up to 10,000.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep policy iteration algorithm for high-dimensional finite-horizon mean-field games by introducing a regenerative reformulation with deterministic cycles. This structure permits policy evaluation, policy improvement, and estimation of the population measure to occur sequentially cycle by cycle rather than over the full horizon. The population is approximated with particles that are advanced using one-step random mappings derived from Euler-Maruyama discretization, which transports mini-batches forward without repeated full simulations. Adversarial training handles evaluation while averaged optimization does improvement. Readers should care because standard approaches to mean-field games break down in high dimensions due to the need to solve large coupled systems or simulate long trajectories repeatedly.

Core claim

The authors claim that the mean-field game can be recast as a regenerative problem with deterministic cycles. Within this setup, the population measure is tracked by a particle system whose states are updated from one cycle to the next by a single random mapping coming from the Euler-Maruyama scheme applied to the controlled dynamics. Policy evaluation and improvement are then defined through the relations that hold between consecutive cycles, with the former solved via adversarial training and the latter via averaged optimization. The resulting procedure sidesteps the coupled Hamilton-Jacobi-Bellman and Fokker-Planck equations, avoids simulating entire trajectories at every iteration, …

What carries the argument

Regenerative reformulation with deterministic cycles, which decomposes the game so that updates to the population measure and policy steps can be performed using one-step particle mappings between cycles.
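To make the one-step particle mapping concrete, here is a minimal, dependency-free sketch of a single Euler-Maruyama step applied to a mini-batch. It is an illustration under stated assumptions, not the paper's Algorithm 1: the names `em_step` and `mean_reverting_drift` are hypothetical, the diffusion coefficient is a constant scalar, and the population measure enters the drift only through the batch mean.

```python
import math
import random


def em_step(particles, drift, sigma, dt, rng):
    """Transport a mini-batch of particles by one Euler-Maruyama step.

    particles: list of d-dimensional states (lists of floats)
    drift:     hypothetical callable drift(x, mean) -> list of d floats
    sigma:     constant scalar diffusion coefficient (an assumption)
    dt:        time-step size
    rng:       random.Random instance supplying the Gaussian increments
    """
    n, d = len(particles), len(particles[0])
    # The empirical batch mean stands in for the population measure (assumption).
    mean = [sum(x[k] for x in particles) / n for k in range(d)]
    scale = sigma * math.sqrt(dt)
    new_particles = []
    for x in particles:
        b = drift(x, mean)
        # x_next = x + b * dt + sigma * sqrt(dt) * xi, with xi ~ N(0, I_d)
        new_particles.append(
            [x[k] + b[k] * dt + scale * rng.gauss(0.0, 1.0) for k in range(d)]
        )
    return new_particles


def mean_reverting_drift(x, mean):
    # Toy drift pulling each particle toward the batch mean (illustrative only).
    return [m - xk for xk, m in zip(x, mean)]


rng = random.Random(0)
batch = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
next_batch = em_step(batch, mean_reverting_drift, sigma=0.5, dt=0.01, rng=rng)
```

Each call moves the whole batch forward by one step, so the cost per update is independent of the horizon length. How the paper couples this step to the control and to the cycle structure is not visible from the abstract.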

If this is right

The method avoids direct solution of the coupled Hamilton-Jacobi-Bellman and Fokker-Planck system.
It avoids the full simulation of trajectories to estimate the population measure at each iteration.
It avoids the explicit computation of conditional expectations in policy evaluation.
It avoids pointwise optimization in policy improvement.
Numerical experiments show effective performance in dimensions up to 10,000.
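
Two of the items above, avoiding explicit conditional expectations and avoiding pointwise optimization, can be illustrated with generic formulas. The sketch below is hedged: the residual $\Delta_\theta$, the test network $\rho_\eta$, the policy network $u_\phi$, and the function $H$ are placeholder symbols, and the abstract does not confirm that the paper's losses take exactly this form. The first line is a standard fact about conditional expectations for an integrable residual; the second and third show how a min-max loss and an averaged objective are typically built from it.

```latex
% A conditional-moment condition is equivalent to orthogonality against test functions
\[
\mathbb{E}\big[\Delta_\theta \,\big|\, X\big] = 0 \ \text{a.s.}
\quad\Longleftrightarrow\quad
\mathbb{E}\big[\rho(X)\,\Delta_\theta\big] = 0
\ \ \text{for all bounded measurable } \rho .
\]
% Adversarial surrogate: the test function is parameterized and trained to maximize the violation
\[
\min_{\theta}\ \max_{\eta}\
\Big|\mathbb{E}\big[\rho_\eta(X)\,\Delta_\theta\big]\Big|^{2} .
\]
% Averaged optimization: one objective over a policy class replaces a separate problem at every x
\[
u^{*}(x) \in \arg\min_{u} H(x,u)\ \ \text{for every } x
\qquad\leadsto\qquad
\min_{\phi}\ \mathbb{E}\big[H\big(X, u_\phi(X)\big)\big] .
\]
```

Both expectations can be estimated from mini-batches of particles, which is what makes such formulations attractive in high dimensions.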

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This cycle-based particle update could reduce computational cost in other stochastic control problems involving large populations.
Extending the regenerative structure to infinite-horizon settings might require defining appropriate cycle lengths based on ergodicity assumptions.
The use of mini-batch particle transport suggests potential for parallelization on modern hardware.

Load-bearing premise

The mean-field game must admit a reformulation as a regenerative problem with deterministic cycles so that all subproblems can be solved accurately using cycle-by-cycle particle approximations from the Euler-Maruyama discretization.

What would settle it

Observing that the approximated population measures diverge from the true distribution or that the learned policies fail to satisfy the mean-field equilibrium condition as dimension increases beyond 1,000 would falsify the scalability of the method.
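
A check of this kind can be prototyped on a toy case where the true law is known in closed form. The sketch below is an assumption-laden illustration, not one of the paper's experiments: it uses an uncontrolled Ornstein-Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$ started at a deterministic point, for which every coordinate is Gaussian with mean $x_0 e^{-\theta T}$ and variance $\sigma^2 (1 - e^{-2\theta T}) / (2\theta)$. The function name `ou_moment_errors` is hypothetical.

```python
import math
import random


def ou_moment_errors(d, n_particles, n_steps, horizon, theta, sigma, x0, seed=0):
    """Worst per-coordinate error of the particle mean and variance at time `horizon`.

    Particles follow Euler-Maruyama steps of dX = -theta * X dt + sigma dW,
    all started from the same scalar x0 in every coordinate.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    scale = sigma * math.sqrt(dt)
    particles = [[x0] * d for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [
            [xk - theta * xk * dt + scale * rng.gauss(0.0, 1.0) for xk in x]
            for x in particles
        ]
    # Closed-form moments of the Ornstein-Uhlenbeck law at the final time.
    exact_mean = x0 * math.exp(-theta * horizon)
    exact_var = sigma ** 2 / (2.0 * theta) * (1.0 - math.exp(-2.0 * theta * horizon))
    mean_err = 0.0
    var_err = 0.0
    for k in range(d):
        column = [x[k] for x in particles]
        m = sum(column) / n_particles
        v = sum((c - m) ** 2 for c in column) / (n_particles - 1)
        mean_err = max(mean_err, abs(m - exact_mean))
        var_err = max(var_err, abs(v - exact_var))
    return mean_err, var_err


mean_err, var_err = ou_moment_errors(
    d=4, n_particles=2000, n_steps=50, horizon=1.0, theta=1.0, sigma=1.0, x0=1.0
)
```

Repeating the call for increasing `d` at fixed `n_particles` shows whether the moment errors grow with dimension. Matching two moments is only a necessary condition for matching the full distribution, so a passing check does not establish equilibrium accuracy.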

Figures

Figures reproduced from arXiv: 2604.26782 by Hui Zhang, Shuixin Fang, Shupeng Wang, Tao Zhou, Zhen Wu.

**Figure 1.** Numerical results of Algorithm 1 for LQ-1, -2, and -3 in section 4.1 with …

**Figure 2.** Numerical results of Algorithm 1 for LQ-1, -2, and -3 in section 4.1 with …

**Figure 3.** Numerical results of Algorithm 1 for LQ-1 in section 4.1 with …

**Figure 4.** Results of Algorithm 1 for the MFG in section 4.2. (Upper left) Loss versus …

**Figure 5.** Results of Algorithm 1 for the MFG in section 4.3. (Upper left) Loss versus …

Original abstract

This paper develops a deep policy iteration method for high-dimensional finite-horizon mean-field games (MFG). We reformulate the game as a regenerative problem with deterministic cycles, which allows policy evaluation (PE), policy improvement (PI), and population measure estimation to be carried out cycle by cycle. Within this formulation, we approximate the population measure by a particle system and update it using a one-step random mapping induced by the Euler-Maruyama discretization of the state dynamics. This update transports a mini-batch of particles from one cycle to the next, avoiding sequential trajectory simulation over the entire time horizon at each iteration. The PE and PI subproblems are formulated through the relation between consecutive cycles, with adversarial training used for evaluation and averaged optimization used for improvement. The resulting method is efficient and scalable in high dimensions, as it avoids the direct solution of the coupled Hamilton-Jacobi-Bellman and Fokker-Planck system, the full simulation of trajectories to estimate the population measure, the explicit computation of conditional expectations in policy evaluation, and pointwise optimization in policy improvement. Numerical experiments demonstrate that the proposed method effectively handles dimensions up to 10,000.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Regenerative cycles and one-step particle updates give a scalable path for high-dim MFGs, but the exactness of the reformulation is the key thing to verify.

This paper's main contribution is a regenerative reformulation that turns finite-horizon mean-field games into a sequence of deterministic cycles. Policy evaluation, improvement, and measure updates then happen cycle by cycle instead of over the whole horizon. The novelty lies in using one-step random mappings from the Euler-Maruyama scheme to transport particles between cycles. This replaces full trajectory simulations for estimating the population measure. They pair it with adversarial training to handle the evaluation subproblem and averaged optimization for the improvement step. These choices let the method skip the usual coupled PDE system and pointwise optimizations. The experiments reportedly scale to dimensions of 10,000, which would be a step forward if the accuracy holds.

What works here is the focus on practical computation. By breaking the problem this way, they reduce the cost of repeated full simulations and explicit conditional expectations. For applications needing quick approximations in high dimensions, this could be a useful template.

The potential issue is whether the reformulation stays faithful to the original game. The stress-test raises a fair point about possible bias from choosing a fixed cycle length or from accumulated discretization errors in the one-step updates. If the drift or diffusion depends on the state in complicated ways, or if the horizon is long, the cycle relations might only approximate the true equilibrium rather than match it exactly. The abstract does not include error bounds or a full derivation, so the strength of the scalability claim depends on how well the full paper addresses this.

This kind of work is for people developing numerical tools for mean-field games and related control problems. A reader who wants ideas for handling very high-dimensional cases would find the algorithmic structure worth examining and perhaps adapting. I recommend sending it to peer review. The idea is concrete enough to benefit from expert feedback on the reformulation details and additional validation experiments.

Referee Report

3 major / 2 minor

Summary. This manuscript develops a deep policy iteration algorithm for high-dimensional finite-horizon mean-field games. The central contribution is a regenerative reformulation of the MFG as a problem with deterministic cycles, which permits cycle-by-cycle policy evaluation (via adversarial training), policy improvement (via averaged optimization), and population-measure estimation (via a particle system updated by one-step Euler-Maruyama random mappings). The method is asserted to avoid direct solution of the coupled HJB-FP system, full-trajectory simulation, explicit conditional expectations, and pointwise optimization, with numerical results reported for state dimensions up to 10,000.

Significance. If the regenerative reformulation is rigorously equivalent to the original finite-horizon MFG and the particle and neural approximations converge at controllable rates, the approach would constitute a meaningful advance for scalable numerical solution of high-dimensional MFGs. The explicit avoidance of several standard computational bottlenecks and the reported ability to reach d=10,000 are concrete strengths that, if substantiated, could influence subsequent work on mean-field control and games.

major comments (3)

[§2] §2 (Regenerative reformulation): The manuscript introduces the deterministic-cycle reformulation and states that PE/PI are formulated 'through the relation between consecutive cycles,' yet provides neither a derivation establishing exact equivalence to the original finite-horizon MFG nor an error bound quantifying the bias introduced by a fixed cycle length. Because the central scalability claim rests on solving the true mean-field Nash equilibrium rather than an altered problem, this equivalence must be proved or the approximation error controlled.
[§3.2] §3.2 (One-step particle update): The population measure is transported by a single Euler-Maruyama step per cycle. No global error analysis or stability estimate is given for the accumulated local truncation error over many cycles, especially when the drift or diffusion coefficients are state-dependent. This directly affects the reliability of the measure approximation that underpins both the policy-evaluation and policy-improvement steps.
[§4] §4 (Numerical experiments): Results are presented for dimensions up to 10,000, but the experiments section supplies neither quantitative error metrics against known low-dimensional solutions nor comparisons with existing MFG solvers. Without such validation, the claim that the method 'effectively handles' these dimensions remains difficult to assess.
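
For orientation on the second comment, the classical global error estimates for the Euler-Maruyama scheme are recalled below. This is textbook material for a standard SDE with globally Lipschitz coefficients of linear growth, not a result from the manuscript; the notation ($\hat X_n$ for the numerical solution, $C_s$ and $C_w$ for the constants, $g$ for a smooth test function) is introduced here for illustration. Whether comparable bounds hold once the drift depends on a learned policy and on a particle approximation of the population measure is exactly what the referee asks the authors to establish.

```latex
% Euler-Maruyama scheme with step \Delta t = T/N
\[
\hat X_{n+1} = \hat X_n + b(\hat X_n)\,\Delta t + \sigma(\hat X_n)\,\Delta W_n,
\qquad \Delta W_n \sim \mathcal{N}\big(0,\ \Delta t\, I_d\big).
\]
% Strong (pathwise) convergence: order 1/2 in general
\[
\Big(\mathbb{E}\,\big|X_T - \hat X_N\big|^{2}\Big)^{1/2} \le C_s\,\Delta t^{1/2},
\]
% Weak convergence: order 1 for sufficiently smooth b, \sigma, g
\[
\Big|\mathbb{E}\,g(X_T) - \mathbb{E}\,g(\hat X_N)\Big| \le C_w\,\Delta t .
\]
```

The constants $C_s$ and $C_w$ depend on the horizon $T$ and on the Lipschitz constants of the coefficients, typically growing exponentially in $T$ through a Gronwall argument, which is why the referee singles out long horizons and state-dependent coefficients.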

minor comments (2)

The notation for cycle length and the precise definition of the 'one-step random mapping' should be introduced once and used consistently; occasional redefinition in later sections reduces readability.
Figure captions for the particle-transport diagrams would benefit from explicit mention of the mini-batch size and the Euler-Maruyama step size employed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We will address each of the major comments in detail below and make the necessary revisions to the manuscript.

Referee: [§2] §2 (Regenerative reformulation): The manuscript introduces the deterministic-cycle reformulation and states that PE/PI are formulated 'through the relation between consecutive cycles,' yet provides neither a derivation establishing exact equivalence to the original finite-horizon MFG nor an error bound quantifying the bias introduced by a fixed cycle length. Because the central scalability claim rests on solving the true mean-field Nash equilibrium rather than an altered problem, this equivalence must be proved or the approximation error controlled.

Authors: We agree with the referee that establishing the equivalence rigorously is crucial. In the revised manuscript, we will expand §2 to include a complete derivation of the regenerative reformulation, demonstrating its exact equivalence to the original finite-horizon MFG under the deterministic cycle structure. We will also derive an error bound for the approximation error induced by a fixed cycle length, showing that this bias can be made arbitrarily small by appropriate selection of the cycle length relative to the time horizon. This will confirm that the method targets the true mean-field Nash equilibrium.
revision: yes
Referee: [§3.2] §3.2 (One-step particle update): The population measure is transported by a single Euler-Maruyama step per cycle. No global error analysis or stability estimate is given for the accumulated local truncation error over many cycles, especially when the drift or diffusion coefficients are state-dependent. This directly affects the reliability of the measure approximation that underpins both the policy-evaluation and policy-improvement steps.

Authors: The referee is correct that a global error analysis is currently missing. We will revise §3.2 to incorporate a detailed stability estimate and global error bound for the accumulated truncation errors over the cycles. Drawing on numerical analysis for SDEs, we will bound the error in the particle system approximation of the population measure, taking into account state-dependent coefficients. This addition will provide the necessary guarantees for the accuracy of the measure estimates used in the PE and PI procedures.
revision: yes
Referee: [§4] §4 (Numerical experiments): Results are presented for dimensions up to 10,000, but the experiments section supplies neither quantitative error metrics against known low-dimensional solutions nor comparisons with existing MFG solvers. Without such validation, the claim that the method 'effectively handles' these dimensions remains difficult to assess.

Authors: We appreciate this observation and will enhance the numerical experiments section. In the revision, we will add quantitative error metrics, including comparisons to analytical or high-accuracy reference solutions in low-dimensional settings (such as d ≤ 5). We will also provide benchmark comparisons against other state-of-the-art MFG solvers, including neural network-based methods and traditional discretization approaches, to highlight the scalability and performance advantages of our method in high dimensions up to 10,000.
revision: yes
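
As an illustration of the kind of quantitative metric discussed in the last response, the sketch below computes a relative L2 error between an approximate function and a reference on a set of sample points. It is a generic helper under stated assumptions: the names `relative_l2_error`, `reference_value`, and `approx_value` are hypothetical, and the quadratic reference is only a stand-in motivated by the fact that linear-quadratic problems typically have quadratic value functions.

```python
import math


def relative_l2_error(approx, reference, sample_points):
    """Relative L2 error of `approx` against `reference` over `sample_points`."""
    num = sum((approx(x) - reference(x)) ** 2 for x in sample_points)
    den = sum(reference(x) ** 2 for x in sample_points)
    if den == 0.0:
        raise ValueError("reference vanishes on all sample points")
    return math.sqrt(num / den)


def reference_value(x):
    # Stand-in analytical solution: squared Euclidean norm (illustrative only).
    return sum(v * v for v in x)


def approx_value(x):
    # Stand-in learned solution with a uniform 1% overestimate (illustrative only).
    return 1.01 * reference_value(x)


points = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
error = relative_l2_error(approx_value, reference_value, points)
```

In practice the sample points would be drawn from the particle system, so that the error is measured where the population actually concentrates.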

Circularity Check

0 steps flagged

No significant circularity; algorithmic reformulation is self-contained

The paper proposes a deep policy iteration algorithm for finite-horizon MFGs by introducing a regenerative reformulation with deterministic cycles, particle approximations, and one-step Euler-Maruyama updates for population measure transport. Policy evaluation uses adversarial training and policy improvement uses averaged optimization, both formulated via consecutive-cycle relations. No equations or steps are presented that reduce the claimed scalability or equilibrium approximation to fitted parameters, self-definitions, or load-bearing self-citations by construction. The derivation chain consists of standard discretization and approximation techniques applied to the reformulated problem, with numerical validation in dimensions up to 10,000 serving as external check. This qualifies as an independent algorithmic construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the ledger captures the core modeling assumption; standard numerical tools like Euler-Maruyama are not counted as invented here.

axioms (1)

domain assumption: The mean-field game admits a regenerative reformulation with deterministic cycles that preserves the original dynamics for cycle-by-cycle policy evaluation and improvement.
This premise enables the avoidance of full-horizon simulation and is invoked to justify the particle transport and subproblem formulations.

pith-pipeline@v0.9.0 · 5740 in / 1367 out tokens · 53706 ms · 2026-05-19T17:10:15.247911+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We reformulate the game as a regenerative problem with deterministic cycles, which allows policy evaluation (PE), policy improvement (PI), and population measure estimation to be carried out cycle by cycle... update it using a one-step random mapping induced by the Euler-Maruyama discretization"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Numerical experiments demonstrate that the proposed method effectively handles dimensions up to 10,000"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages

[1]

Mean field games for modeling crowd motion

Yves Achdou and Jean-Michel Lasry. Mean field games for modeling crowd motion. In Contributions to partial differential equations and applications, volume 47 ofComput. Methods Appl. Sci., pages 17–42. Springer, Cham, 2019

work page 2019
[2]

Extensions of the deep Galerkin method.Appl

Ali Al-Aradi, Adolfo Correia, Gabriel Jardim, Danilo de Freitas Naiff, and Yuri Saporito. Extensions of the deep Galerkin method.Appl. Math. Comput., 430:Paper No. 127287, 18, 2022

work page 2022
[3]

A maximum principle for SDEs of mean-field type.Appl

Daniel Andersson and Boualem Djehiche. A maximum principle for SDEs of mean-field type.Appl. Math. Optim., 63(3):341–356, 2011

work page 2011
[4]

SpringerBriefs in Mathematics

Alain Bensoussan, Jens Frehse, and Phillip Yam.Mean field games and mean field type control theory. SpringerBriefs in Mathematics. Springer, New York, 2013

work page 2013
[5]

Mean field control and mean field game models with several populations.Minimax Theory Appl., 3(2):173–209, 2018

Alain Bensoussan, Tao Huang, and Mathieu Lauri` ere. Mean field control and mean field game models with several populations.Minimax Theory Appl., 3(2):173–209, 2018. 23

work page 2018
[6]

and Zhou, T

Wei Cai, Shuixin Fang, Wenzhong Zhang, and Tao Zhou. Martingale deep learning for very high dimensional quasi-linear partial differential equations and stochastic optimal controls.arXiv preprint arXiv:2408.14395, 2024

work page arXiv 2024
[7]

SOC-MartNet: A martingale neural network for the hamilton-jacobi-bellman equation without explicit inf u∈U Hin stochastic optimal controls.SIAM J

Wei Cai, Shuixin Fang, and Tao Zhou. SOC-MartNet: A martingale neural network for the hamilton-jacobi-bellman equation without explicit inf u∈U Hin stochastic optimal controls.SIAM J. Sci. Comput., 47(4):C795–C819, 2025

work page 2025
[8]

Deep random difference method for high- dimensional quasilinear parabolic partial differential equations.J

Wei Cai, Shuixin Fang, and Tao Zhou. Deep random difference method for high- dimensional quasilinear parabolic partial differential equations.J. Comput. Phys., page 114767, 2026

work page 2026
[9]

DeepMartNet: a Martingale-based deep neural network learning method for Dirichlet BVPs and eigenvalue problems of elliptic PDEs inR d.SIAM J

Wei Cai, Andrew He, and Daniel Margolis. DeepMartNet: a Martingale-based deep neural network learning method for Dirichlet BVPs and eigenvalue problems of elliptic PDEs inR d.SIAM J. Sci. Comput., 48(1):C25–C50, 2026

work page 2026
[10]

Cardaliaguet, J.-M

P. Cardaliaguet, J.-M. Lasry, P.-L. Lions, and A. Porretta. Long time average of mean field games with a nonlocal coupling.SIAM J. Control Optim., 51(5):3558–3591, 2013

work page 2013
[11]

Notes on mean field games

Pierre Cardaliaguet. Notes on mean field games. Technical report, Technical report Technical report, 2010

work page 2010
[12]

I, volume 83 ofProbability Theory and Stochastic Modelling

Ren´ e Carmona and Fran¸ cois Delarue.Probabilistic theory of mean field games with applications. I, volume 83 ofProbability Theory and Stochastic Modelling. Springer, Cham, 2018. Mean field FBSDEs, control, and games

work page 2018
[13]

Mean field games and systemic risk.Commun

Ren´ e Carmona, Jean-Pierre Fouque, and Li-Hsien Sun. Mean field games and systemic risk.Commun. Math. Sci., 13(4):911–933, 2015

work page 2015
[14]

A probabilistic weak formulation of mean field games and applications.Ann

Ren´ e Carmona and Daniel Lacker. A probabilistic weak formulation of mean field games and applications.Ann. Appl. Probab., 25(3):1189–1231, 2015

work page 2015
[15]

Discrete time mean-field stochastic linear- quadratic optimal control problems.Automatica J

Robert Elliott, Xun Li, and Yuan-Hua Ni. Discrete time mean-field stochastic linear- quadratic optimal control problems.Automatica J. IFAC, 49(11):3222–3233, 2013

work page 2013
[16]

Failure-informed adaptive sampling for PINNs

Zhiwei Gao, Liang Yan, and Tao Zhou. Failure-informed adaptive sampling for PINNs. SIAM J. Sci. Comput., 45(4):A1971–A1994, 2023

work page 2023
[17]

Large deviations for a mean field model of systemic risk.SIAM J

Josselin Garnier, George Papanicolaou, and Tzu-Wei Yang. Large deviations for a mean field model of systemic risk.SIAM J. Financial Math., 4(1):151–184, 2013

work page 2013
[18]

Approximation error analysis of some deep backward schemes for nonlinear PDEs.SIAM J

Maximilien Germain, Huyˆ en Pham, and Xavier Warin. Approximation error analysis of some deep backward schemes for nonlinear PDEs.SIAM J. Sci. Comput., 44(1):A28– A56, 2022. 24

work page 2022
[19]

Solving high-dimensional partial differen- tial equations using deep learning.Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018

Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differen- tial equations using deep learning.Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018

work page 2018
[20]

Learning physics-informed neural networks without stacked back- propagation

Di He, Shanda Li, Wenlei Shi, Xiaotian Gao, Jia Zhang, Jiang Bian, Liwei Wang, and Tie-Yan Liu. Learning physics-informed neural networks without stacked back- propagation. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statis- tics, volume 206 ofProceed...

work page 2023
[21]

Hutchinson trace estimation for high-dimensional and high-order physics-informed neural networks

Zheyuan Hu, Zekun Shi, George Em Karniadakis, and Kenji Kawaguchi. Hutchinson trace estimation for high-dimensional and high-order physics-informed neural networks. Comput. Methods Appl. Mech. Engrg., 424:Paper No. 116883, 17, 2024

work page 2024
[22]

Tackling the curse of dimensionality with physics-informed neural networks.Neural Networks, 176:106369, 2024

Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, and Kenji Kawaguchi. Tackling the curse of dimensionality with physics-informed neural networks.Neural Networks, 176:106369, 2024

work page 2024
[23]

Karniadakis, and Kenji Kawaguchi

Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George E. Karniadakis, and Kenji Kawaguchi. Bias-Variance Trade-Off in Physics-Informed Neural Networks with Ran- domized Smoothing for High-Dimensional PDEs.SIAM J. Sci. Comput., 47(4):C846– C872, 2025

work page 2025
[24]

Large-population LQG games involving a major player: the Nash cer- tainty equivalence principle.SIAM J

Minyi Huang. Large-population LQG games involving a major player: the Nash cer- tainty equivalence principle.SIAM J. Control Optim., 48(5):3318–3353, 2009/10

work page 2009
[25]

Caines, and Roland P

Minyi Huang, Peter E. Caines, and Roland P. Malham´ e. Social optima in mean field LQG control: centralized and decentralized strategies.IEEE Trans. Automat. Control, 57(7):1736–1751, 2012

work page 2012
[26]

Deep backward schemes for high- dimensional nonlinear PDEs.Math

Cˆ ome Hur´ e, Huyˆ en Pham, and Xavier Warin. Deep backward schemes for high- dimensional nonlinear PDEs.Math. Comp., 89(324):1547–1579, 2020

work page 2020
[27]

Policy evaluation and temporal-difference learning in con- tinuous time and space: A martingale approach.Journal of Machine Learning Research, 23(154):1–55, 2022

Yanwei Jia and Xun Yu Zhou. Policy evaluation and temporal-difference learning in con- tinuous time and space: A martingale approach.Journal of Machine Learning Research, 23(154):1–55, 2022

work page 2022
[28]

Policy gradient and actor-critic learning in continu- ous time and space: Theory and algorithms.Journal of Machine Learning Research, 23(275):1–50, 2022

Yanwei Jia and Xun Yu Zhou. Policy gradient and actor-critic learning in continu- ous time and space: Theory and algorithms.Journal of Machine Learning Research, 23(275):1–50, 2022

work page 2022
[29]

Springer Cham, third edition, 2020

Achim Klenke.Probability Theory. Springer Cham, third edition, 2020. 25

work page 2020
[30]

Kloeden and Eckhard Platen.Numerical solution of stochastic differential equations, volume 23 ofApplications of Mathematics (New York)

Peter E. Kloeden and Eckhard Platen.Numerical solution of stochastic differential equations, volume 23 ofApplications of Mathematics (New York). Springer-Verlag, Berlin, 1992

work page 1992
[31]

Efficiency of the price formation process in presence of high frequency participants: a mean field game analysis.Math

Aim´ e Lachapelle, Jean-Michel Lasry, Charles-Albert Lehalle, and Pierre-Louis Lions. Efficiency of the price formation process in presence of high frequency participants: a mean field game analysis.Math. Financ. Econ., 10(3):223–262, 2016

work page 2016
[32]

Computation of mean field equilibria in economics.Math

Aime Lachapelle, Julien Salomon, and Gabriel Turinici. Computation of mean field equilibria in economics.Math. Models Methods Appl. Sci., 20(4):567–588, 2010

work page 2010
[33]

On a mean field game approach mod- eling congestion and aversion in pedestrian crowds.Transportation research part B: methodological, 45(10):1572–1589, 2011

Aim´ e Lachapelle and Marie-Therese Wolfram. On a mean field game approach mod- eling congestion and aversion in pedestrian crowds.Transportation research part B: methodological, 45(10):1572–1589, 2011

work page 2011
[34]

A neural network approach for stochastic optimal control.SIAM J

Xingjian Li, Deepanshu Verma, and Lars Ruthotto. A neural network approach for stochastic optimal control.SIAM J. Sci. Comput., 46(5):C535–C556, 2024

work page 2024
[35]

Multi-scale deep neural network (MscaleDNN) for solving Poisson-Boltzmann equation in complex domains.Commun

Ziqi Liu, Wei Cai, and Zhi-Qin John Xu. Multi-scale deep neural network (MscaleDNN) for solving Poisson-Boltzmann equation in complex domains.Commun. Comput. Phys., 28(5):1970–2001, 2020

work page 1970
[36]

On bellman equations for continuous-time policy eval- uation i: discretization and approximation, 2024

Wenlong Mou and Yuhua Zhu. On bellman equations for continuous-time policy eval- uation i: discretization and approximation, 2024

work page 2024
[37]

Springer-Verlag, Berlin, 2009

Huyˆ en Pham.Continuous-time stochastic control and optimization with financial ap- plications, volume 61 ofStochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2009

work page 2009
[38]

Raissi, P

M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.J. Comput. Phys., 378:686–707, 2019

work page 2019
[39]

Deep neural networks motivated by partial differential equations.J

Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equations.J. Math. Imaging Vision, 62(3):352–364, 2020

work page 2020
[40]

Osher, Wuchen Li, Levon Nurbekyan, and Samy Wu Fung

Lars Ruthotto, Stanley J. Osher, Wuchen Li, Levon Nurbekyan, and Samy Wu Fung. A machine learning framework for solving high-dimensional mean field game and mean field control problems.Proc. Natl. Acad. Sci. USA, 117(17):9183–9193, 2020

work page 2020
[41]

Stochastic taylor derivative estimator: Efficient amortization for arbitrary differential operators

Zekun Shi, Zheyuan Hu, Min Lin, and Kenji Kawaguchi. Stochastic taylor derivative estimator: Efficient amortization for arbitrary differential operators. InThe Thirty- eighth Annual Conference on Neural Information Processing Systems, 2024

work page 2024
[42]

DGM: a deep learning algorithm for solving partial differential equations.J

Justin Sirignano and Konstantinos Spiliopoulos. DGM: a deep learning algorithm for solving partial differential equations.J. Comput. Phys., 375:1339–1364, 2018. 26

work page 2018
[43]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement learning. An introduction. Adapt. Comput. Mach. Learn. Cambridge, MA: MIT Press, 2nd expanded and updated edition edition, 2018

work page 2018
[44]

Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations.Journal of Compu- tational Physics, 476:111868, 2023

Kejun Tang, Xiaoliang Wan, and Chao Yang. Das-pinns: A deep adaptive sampling method for solving high-dimensional partial differential equations.Journal of Compu- tational Physics, 476:111868, 2023

work page 2023
[45]

Adaptive importance sampling for deep Ritz.Commun

Xiaoliang Wan, Tao Zhou, and Yuancheng Zhou. Adaptive importance sampling for deep Ritz.Commun. Appl. Math. Comput., 7(3):929–953, 2025

work page 2025
[46]

A deep shotgun method for solving high-dimensional parabolic partial differential equations.J

Wenjun Xu and Wenzhong Zhang. A deep shotgun method for solving high-dimensional parabolic partial differential equations.J. Sci. Comput., 104(2):69, 2025

work page 2025
[47]

Linear-quadratic optimal control problems for mean-field stochastic differential equations.SIAM J

Jiongmin Yong. Linear-quadratic optimal control problems for mean-field stochastic differential equations.SIAM J. Control Optim., 51(4):2809–2838, 2013

work page 2013
[48]

Springer-Verlag, New York, 1999

Jiongmin Yong and Xun Yu Zhou.Stochastic controls, volume 43 ofApplications of Mathematics (New York). Springer-Verlag, New York, 1999. Hamiltonian systems and HJB equations

[49]

Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys., 411:109409, 14, 2020.

[50]

Wenzhong Zhang and Wei Cai. FBSDE based neural network algorithms for high-dimensional quasilinear parabolic PDEs. J. Comput. Phys., 470:Paper No. 111557, 14, 2022.

[51]

Mo Zhou, Jiequn Han, and Jianfeng Lu. Actor-critic method for high dimensional static Hamilton-Jacobi-Bellman partial differential equations based on neural networks. SIAM J. Sci. Comput., 43(6):A4043–A4066, 2021.

[52]

Mo Zhou and Jianfeng Lu. Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow. Preprint, arXiv:2402.17208 [math.OC], 2024.

[53]

Mo Zhou and Jianfeng Lu. A policy gradient framework for stochastic optimal control problems with global convergence guarantee. SIAM J. Control Optim., 63(4):2605–2631, 2025.

[54]

Yuhua Zhu, Yuming Zhang, and Haoyu Zhang. Optimal-PhiBE: A PDE-based Model-free framework for Continuous-time Reinforcement Learning. Preprint, arXiv:2506.05208 [math.OC], 2025.

