Deep neural networks with ReLU, leaky ReLU, and softplus activation provably overcome the curse of dimensionality for Kolmogorov partial differential equations with Lipschitz nonlinearities in the L^p-sense
Pith reviewed 2026-05-24 06:50 UTC · model grok-4.3
The pith
Solutions to high-dimensional semilinear heat PDEs with Lipschitz nonlinearities can be approximated in the L^p sense by deep neural networks with ReLU, leaky ReLU or softplus activations without the curse of dimensionality, provided the initial value functions admit such approximations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that for every T > 0 the solutions u_d : [0,T] × R^d → R of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated at time T in the L^p sense, p ∈ (0,∞), by DNNs with ReLU, leaky ReLU or softplus activations without the curse of dimensionality whenever the initial functions x ↦ u_d(0,x) admit such approximations without the curse of dimensionality.
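Read formally, the claim has roughly the following shape. This is a sketch under assumed notation, not the paper's theorem: the explicit form of the PDE, the initial functions $g_d$, the constant $c$, the networks $\Phi_{d,\varepsilon}$, the parameter count $\mathcal{P}$, the realization map $\mathcal{R}$, and the probability measures $\nu_d$ are illustrative choices that the summary above does not fix.

```latex
% Assumed model problem: semilinear heat PDE with Lipschitz nonlinearity f.
\[
  \frac{\partial u_d}{\partial t}(t,x)
    = (\Delta_x u_d)(t,x) + f\bigl(u_d(t,x)\bigr),
  \qquad u_d(0,x) = g_d(x),
  \qquad (t,x) \in [0,T] \times \mathbb{R}^d .
\]
% Approximation at time T in L^p without the curse of dimensionality:
% the parameter count is bounded polynomially in d and 1/epsilon.
\[
  \exists\, c > 0 \;\; \forall\, d \in \mathbb{N},\ \varepsilon \in (0,1] \;\;
  \exists\, \Phi_{d,\varepsilon} \colon \quad
  \mathcal{P}(\Phi_{d,\varepsilon}) \le c\, d^{c}\, \varepsilon^{-c}
  \quad \text{and} \quad
  \Bigl( \int_{\mathbb{R}^d}
    \bigl| u_d(T,x) - (\mathcal{R}(\Phi_{d,\varepsilon}))(x) \bigr|^{p}
  \, \nu_d(dx) \Bigr)^{1/p} \le \varepsilon .
\]
```

The hypothesis on the initial data has the same form, with $g_d$ in place of $u_d(T,\cdot)$.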
What carries the argument
Transfer of non-curse-of-dimensionality approximability from initial data to terminal-time solutions via the PDE evolution operator, for ReLU, leaky ReLU and softplus activations.
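For reference, the three activation functions named here have the standard definitions sketched below in NumPy. The leaky ReLU slope `alpha=0.01` is an assumed default, not a value taken from the paper, and the grid `xs` exists only to illustrate that softplus stays within ln 2 of ReLU.

```python
import numpy as np

def relu(x):
    """ReLU: max(x, 0)."""
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x >= 0, alpha * x otherwise (alpha is an assumed default)."""
    return np.where(x >= 0.0, x, alpha * x)

def softplus(x):
    """Softplus: ln(1 + exp(x)), evaluated stably via logaddexp."""
    return np.logaddexp(0.0, x)

# Softplus is a smooth upper bound on ReLU; the gap is largest at x = 0,
# where it equals ln 2.
xs = np.linspace(-20.0, 20.0, 4001)
gap = softplus(xs) - relu(xs)
max_gap = gap.max()
```

This uniform closeness is a general fact about the two functions; whether and how the paper's proof uses it is not stated in this summary.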
Load-bearing premise
The initial value functions themselves can be approximated without the curse of dimensionality by networks with the given activations.
What would settle it
Finding a sequence of initial functions that the networks approximate without the curse of dimensionality (COD), but whose evolved solutions at time T require a number of parameters growing exponentially in d to reach the same accuracy in L^p.
Original abstract
Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computational operations they require to achieve a certain approximation accuracy $\varepsilon\in(0,\infty)$ grows at most polynomially in the PDE dimension $d\in\mathbb N$ and the reciprocal of $\varepsilon$. While there is thus far no mathematical result that proves that one of such methods is indeed capable of overcoming the COD, there are now a number of rigorous results in the literature that show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the COD in the sense that the number of parameters used to describe the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb N$ and the reciprocal of the approximation accuracy $\varepsilon>0$. Roughly speaking, in the literature it has been proved for every $T>0$ that solutions $u_d\colon [0,T]\times\mathbb R^d\to \mathbb R$, $d\in\mathbb N$, of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated by DNNs with ReLU activation at the terminal time in the $L^2$-sense without the COD provided that the initial value functions $\mathbb R^d\ni x\mapsto u_d(0,x)\in\mathbb R$, $d\in\mathbb N$, can be approximated by ReLU DNNs without the COD. It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$ and by allowing the activation function to be more general covering the ReLU, the leaky ReLU, and the softplus activation functions as special cases.
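To make the abstract's notion of the curse of dimensionality concrete, the sketch below contrasts the node count of a uniform tensor grid, which grows exponentially in d, with a budget of the polynomial form c·d^c·ε^(-c). Both functions and the constant `c = 2` are hypothetical illustrations and do not come from the paper.

```python
def grid_points(d, eps):
    """Nodes in a uniform tensor grid on [0,1]^d with mesh width about eps."""
    per_axis = int(round(1.0 / eps)) + 1
    return per_axis ** d

def polynomial_budget(d, eps, c=2):
    """A budget of the form c * d^c * eps^(-c); c = 2 is an arbitrary choice."""
    return c * d ** c * eps ** (-c)

eps = 0.1
for d in (1, 10, 100):
    # The grid count explodes with d while the polynomial budget stays moderate.
    print(d, grid_points(d, eps), polynomial_budget(d, eps))
```

A method overcomes the curse of dimensionality, in the sense used above, when its cost can be bounded by an expression of the second kind.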
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proves that, for any fixed T>0 and p in (0,infty), solutions u_d of semilinear heat equations with Lipschitz nonlinearities on [0,T] x R^d can be approximated at time T in the L^p norm by DNNs using ReLU, leaky ReLU or softplus activations, with the number of parameters growing at most polynomially in d and 1/epsilon, provided the initial data functions admit such DNN approximations without the curse of dimensionality. This extends earlier results that were restricted to the L^2 norm and the ReLU activation.
Significance. If the proofs hold, the result supplies a clean technical extension of known approximation-theoretic guarantees for DNNs applied to high-dimensional Kolmogorov PDEs. Broadening the admissible activations and the range of p strengthens the theoretical case that DNNs can overcome the curse of dimensionality for this class of PDEs, conditional on the initial-data hypothesis that is already standard in the literature.
minor comments (2)
- The dependence of the constants on p and on the Lipschitz constant of the nonlinearity should be stated explicitly in the main theorem statement to make the polynomial-in-d claim fully transparent.
- Notation for the DNN parameter count N(d,epsilon) is used before it is formally defined; a forward reference or early definition would improve readability.
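The parameter count that the second comment refers to can be illustrated as follows. This is a minimal sketch assuming the common convention of counting every weight and bias of a fully connected network; the helper `count_parameters` and the example architecture are hypothetical, and the paper's own definition of N(d,epsilon) may differ.

```python
def count_parameters(layer_dims):
    """Number of weights and biases of a fully connected network.

    layer_dims = (l_0, l_1, ..., l_L): input dimension, hidden widths,
    output dimension. Layer k has l_k * l_{k-1} weights and l_k biases.
    """
    return sum(
        n_out * n_in + n_out
        for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:])
    )

# Hypothetical architecture for input dimension d = 10:
# two hidden layers of width 2d and a scalar output.
d = 10
n_params = count_parameters((d, 2 * d, 2 * d, 1))
print(n_params)  # 220 + 420 + 21 = 661
```

With widths proportional to d and a fixed depth, this count grows quadratically in d, which is the kind of polynomial growth the main theorem is about.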
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive summary, and recommendation to accept the manuscript.
Circularity Check
Conditional result on external initial-data assumption; minor self-citations not load-bearing
full rationale
The paper establishes a conditional statement that terminal-time L^p approximability by DNNs (ReLU/leaky ReLU/softplus) follows from the assumption that initial-value functions can be approximated without the COD. This generalizes prior L^2/ReLU results via technical extension of error-propagation arguments rather than any internal reduction of the target quantity to a fitted parameter or self-defined input. The weakest assumption is explicitly external and not derived within the paper. Self-citations to earlier works appear but are not load-bearing for the new generalization, satisfying the criteria for at most minor circularity.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: ReLU, leaky ReLU and softplus satisfy the approximation-theoretic properties used in the prior L^2 result
- domain assumption: Solutions of the semilinear heat PDE with Lipschitz nonlinearity admit the regularity needed for the approximation argument