XNet-Enhanced Deep BSDE Method and Numerical Analysis

Xiaotao Zheng; Xingye Yue; Xin Li; Zhihong Xia

arXiv: 2502.06238 · v2 · submitted 2025-02-10 · cs.CE

Pith reviewed 2026-05-23 04:15 UTC · model grok-4.3

keywords Deep BSDE · non-Lipschitz · Allen-Cahn · Hamilton-Jacobi-Bellman · convergence · XNet · high-dimensional PDE · semilinear parabolic PDE

The pith

Deep BSDE methods converge for non-Lipschitz generators in Allen-Cahn and HJB equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the Deep Backward Stochastic Differential Equation (BSDE) method converges even when the generator is not globally Lipschitz, as occurs in Allen-Cahn equations with cubic terms and Hamilton-Jacobi-Bellman (HJB) equations with quadratic gradient growth. The proof combines a bounded double-well lemma with an analysis of a truncated BSDE. The paper also introduces XNet, a shallow architecture with O(L) parameters that preserves strong approximation while reducing optimization and computational cost. Experiments on 100-dimensional PDEs support the theory and show efficiency gains over standard feedforward networks. Readers should care because many dynamical systems in science produce semilinear PDEs that lie outside the Lipschitz setting, so the result makes the method usable for a wider set of high-dimensional models.

Core claim

We establish the convergence theory for non-Lipschitz generators covering Allen-Cahn equations with cubic nonlinearity and HJB equations with quadratic gradient growth based on a bounded double-well lemma and a truncated-BSDE analysis. Computationally, we instantiate the framework with XNet, a shallow architecture with O(L) parameters that preserves strong approximation while substantially reducing optimization and computational cost.
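The time discretization that the Deep BSDE method trains against can be sketched as below. This is a minimal illustration, not the paper's implementation: the forward process `X = x0 + sqrt(2) * W`, the sign convention `dY = -f(Y) dt + Z . dW`, and the names `terminal_mismatch`, `z_fn`, and `y0` are all assumptions. In the actual method `y0` and `z_fn` would be trainable (for example, neural networks); here they are plain inputs so the rollout itself can be inspected.

```python
import math
import random


def allen_cahn_generator(y):
    """Cubic generator f(y) = y - y**3, which is not globally Lipschitz."""
    return y - y ** 3


def terminal_mismatch(y0, z_fn, f, g, x0, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E[(Y_N - g(X_N))**2] for an Euler rollout.

    Assumed dynamics (illustrative normalization):
        X_{n+1} = X_n + sqrt(2) * dW_n
        Y_{n+1} = Y_n - f(Y_n) * dt + Z_n . dW_n
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = list(x0)
        y = y0
        for n in range(n_steps):
            # Brownian increments, one per spatial dimension
            dw = [rng.gauss(0.0, sqrt_dt) for _ in x]
            z = z_fn(n * dt, x)
            y = y - f(y) * dt + sum(zi * dwi for zi, dwi in zip(z, dw))
            x = [xi + math.sqrt(2.0) * dwi for xi, dwi in zip(x, dw)]
        total += (y - g(x)) ** 2
    return total / n_paths
```

A quick sanity check under these assumptions: with `f = 0` and `g(x) = sum(x)`, the exact solution is `u(t, x) = sum(x)`, so passing `y0 = sum(x0)` and a constant `z_fn` returning `sqrt(2)` in every component should give a mismatch at the level of floating-point round-off.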

What carries the argument

XNet shallow architecture with O(L) parameters, supported by bounded double-well lemma and truncated-BSDE analysis for non-Lipschitz convergence

If this is right

The method converges for Allen-Cahn equations with cubic nonlinearity.
The method converges for HJB equations with quadratic gradient growth.
XNet achieves strong approximation with far fewer parameters than standard networks.
Numerical tests on 100-dimensional problems confirm both the convergence rates and the cost savings.
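To make the parameter-count point concrete, here is a hedged sketch of a one-hidden-layer network with a Cauchy-type activation. The activation form `(lam1 * x + lam2) / (x**2 + d**2)`, the choice to share `lam1`, `lam2`, `d` across hidden units, and the class name `ShallowCauchyNet` are assumptions made for illustration; the text above does not specify XNet's layer structure or what L denotes.

```python
import math
import random


def cauchy_activation(x, lam1, lam2, d):
    """Assumed Cauchy-type activation: (lam1 * x + lam2) / (x**2 + d**2)."""
    return (lam1 * x + lam2) / (x * x + d * d)


class ShallowCauchyNet:
    """One hidden layer, scalar output; a hypothetical stand-in for XNet."""

    def __init__(self, in_dim, width, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                        for _ in range(width)]
        self.biases = [0.0] * width
        self.out_weights = [rng.gauss(0.0, 1.0 / math.sqrt(width))
                            for _ in range(width)]
        self.out_bias = 0.0
        # Activation parameters shared by all hidden units (an assumption).
        self.lam1, self.lam2, self.d = 1.0, 0.0, 1.0

    def __call__(self, x):
        hidden = []
        for w_row, b in zip(self.weights, self.biases):
            pre = sum(w * xi for w, xi in zip(w_row, x)) + b
            hidden.append(cauchy_activation(pre, self.lam1, self.lam2, self.d))
        return sum(a * h for a, h in zip(self.out_weights, hidden)) + self.out_bias

    def num_parameters(self):
        in_dim = len(self.weights[0])
        width = len(self.weights)
        # hidden weights and biases + output weights and bias + 3 activation scalars
        return width * (in_dim + 1) + (width + 1) + 3
```

With a single hidden layer the count grows linearly in the width, whereas a fully connected network with several hidden layers of the same width picks up an extra width-squared term per additional layer. Whether this matches the paper's accounting is not something the summary above settles.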

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same truncation technique could apply to other PDEs whose nonlinearities grow superlinearly, provided the solutions still admit a priori bounds.
Efficiency improvements from XNet may make it feasible to solve time-dependent problems in real time for applications in physics and engineering.
Further work could examine whether the approach scales to dimensions beyond 100 without loss of accuracy.

Load-bearing premise

The bounded double-well lemma holds and the truncated-BSDE analysis extends to the non-Lipschitz generators considered.
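The idea of restoring Lipschitz continuity by truncation can be illustrated generically. The sketch below clips the argument before applying the generator; the clipping rules, the radius `R`, and all function names are assumptions for illustration, and the paper's own truncation may be constructed differently.

```python
import math


def allen_cahn(y):
    """Cubic generator f(y) = y - y**3."""
    return y - y ** 3


def truncated_allen_cahn(y, R):
    """Apply the cubic generator to y clipped to [-R, R]."""
    y_clipped = max(-R, min(R, y))
    return allen_cahn(y_clipped)


def lipschitz_bound_allen_cahn(R):
    """sup over |y| <= R of |f'(y)|, where f'(y) = 1 - 3 * y**2."""
    return max(1.0, abs(1.0 - 3.0 * R * R))


def hjb_quadratic(z):
    """Quadratic-growth generator f(z) = -|z|**2."""
    return -sum(zi * zi for zi in z)


def truncated_hjb_quadratic(z, R):
    """Project z onto the Euclidean ball of radius R, then apply -|z|**2."""
    norm = math.sqrt(sum(zi * zi for zi in z))
    scale = 1.0 if norm <= R else R / norm
    return hjb_quadratic([scale * zi for zi in z])
```

Both truncated generators are globally Lipschitz (with constant `lipschitz_bound_allen_cahn(R)` and `2 * R` respectively) and agree with the original generator whenever the argument stays inside the truncation region. The convergence question is then whether the true solution stays inside that region, which is what an a priori bound would have to supply.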

What would settle it

Running the Deep BSDE solver on an Allen-Cahn equation and observing that the approximation error fails to decrease as the number of time steps or network width increases would disprove the convergence claim.
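A check of this kind usually reduces to estimating an observed convergence order from errors measured at several step counts. The helper below is a small sketch of that bookkeeping; the function name `empirical_orders` is hypothetical, and it assumes the error behaves like `C * N**(-p)` for `N` time steps.

```python
import math


def empirical_orders(step_counts, errors):
    """Observed order p between consecutive runs, assuming error ~ C * N**(-p)."""
    orders = []
    for i in range(len(step_counts) - 1):
        n0, n1 = step_counts[i], step_counts[i + 1]
        e0, e1 = errors[i], errors[i + 1]
        orders.append(math.log(e0 / e1) / math.log(n1 / n0))
    return orders
```

Feeding it a synthetic sequence with a known exponent (not data from the paper) recovers that exponent; orders that drift toward zero or turn negative as the step count grows would be the kind of evidence described above.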

Figures

Figures reproduced from arXiv: 2502.06238 by Xiaotao Zheng, Xingye Yue, Xin Li, Zhihong Xia.

**Figure 2.** Comparison of Two Network Architectures for Solving the Allen-Cahn Equation under …

**Figure 3.** Comparison of Two Network Architectures for Solving the Allen-Cahn Equation under …

**Figure 4.** Comparison of Two Network Architectures for Solving the PricingDiffrate Equation under …

**Figure 5.** Results of solving the Allen-Cahn Equation using the Deep BSDE method by XNet under …

**Figure 6.** Comparison of Two Network Architectures for Solving the PricingDiffrate under 10- …

**Figure 7.** Solving the PricingDiffrate Equation by XNet under various settings with 20-step-time …

Original abstract

Semilinear parabolic partial differential equations (PDEs) are fundamental to modeling complex dynamical systems across scientific domains. The Deep Backward Stochastic Differential Equation (BSDE) method is a promising approach for high-dimensional PDEs; however, existing convergence results apply only to globally Lipschitz generators, excluding important cases such as Allen-Cahn and Hamilton-Jacobi-Bellman (HJB) equations. This paper presents both a theoretical and a computational advance for Deep BSDE methods. Theoretically, we establish the convergence theory for non-Lipschitz generators, covering Allen-Cahn equations with cubic nonlinearity and HJB equations with quadratic gradient growth, based on a bounded double-well lemma and a truncated-BSDE analysis within the Bouchard-Touzi-Zhang theory. Computationally, we instantiate the framework with XNet, a shallow architecture with $\mathcal O(L)$ parameters that preserves strong approximation while substantially reducing optimization and computational cost. Numerical experiments on 100-dimensional PDEs corroborate the predicted convergence behavior and demonstrate significant efficiency gains over standard feedforward implementations.
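For orientation, the standard link between a semilinear parabolic PDE and a BSDE, which the Deep BSDE method discretizes, can be written as follows. The normalization (a pure Laplacian, so that $X$ is a scaled Brownian motion), the sign conventions, and the constant $\lambda$ are assumptions borrowed from common Deep BSDE benchmark formulations; the abstract itself does not fix them.

```latex
\begin{aligned}
&\partial_t u(t,x) + \Delta u(t,x) + f\bigl(u(t,x),\, \sqrt{2}\,\nabla u(t,x)\bigr) = 0,
  \qquad u(T,x) = g(x), \\
&X_t = x_0 + \sqrt{2}\, W_t, \qquad Y_t = u(t, X_t), \qquad Z_t = \sqrt{2}\,\nabla u(t, X_t), \\
&Y_t = g(X_T) + \int_t^T f(Y_s, Z_s)\, ds - \int_t^T Z_s \cdot dW_s, \\
&\text{Allen-Cahn: } f(y,z) = y - y^3,
  \qquad \text{HJB: } f(y,z) = -\tfrac{\lambda}{2}\,|z|^2 .
\end{aligned}
```

Neither generator is globally Lipschitz: $|\partial_y (y - y^3)| = |1 - 3y^2|$ is unbounded in $y$, and $|\nabla_z(\tfrac{\lambda}{2}|z|^2)| = \lambda |z|$ is unbounded in $z$. This is the gap that the abstract says earlier convergence results leave open.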

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Claims new convergence for non-Lipschitz Deep BSDE via double-well lemma and truncation, plus a cheap XNet; truncation step for quadratic HJB needs close inspection.

The paper's main contribution is a claimed extension of Deep BSDE convergence to generators that are not globally Lipschitz, specifically Allen-Cahn with cubic nonlinearity and HJB with quadratic gradient growth. It does this inside the Bouchard-Touzi-Zhang framework by adding a bounded double-well lemma and a truncated-BSDE argument. It also introduces XNet, a shallow network with O(L) parameters. The numerical section tests both on 100-dimensional problems and reports efficiency gains over standard feedforward nets, with convergence behavior that matches the theory. Those experiments are the clearest practical takeaway.

The theory is the part that matters most for the field. If the double-well lemma supplies uniform bounds for the cubic case and the truncation bias can be controlled independently for the quadratic case, then the result would let the method reach two important families of PDEs that currently sit outside the Lipschitz theory. The XNet design is simple enough that it could be adopted quickly if the convergence carries over.

The soft spot sits in the HJB truncation step. Restoring Lipschitz continuity by truncation is standard, but passing to the limit requires an a priori estimate on the difference between the truncated and original BSDEs that is uniform in the truncation level. The abstract does not spell out how that estimate is obtained when the generator has quadratic growth; the concern about this step is therefore on point and needs to be checked in the full proof. For Allen-Cahn the double-well lemma may close the gap, but that does not automatically transfer to the HJB case.

The paper cites the right prior work and does not appear to hide circularity in the abstract. The experiments are consistent with the stated rates, which is better than many purely theoretical claims.

This work is aimed at people who already use or study Deep BSDE methods for high-dimensional semilinear PDEs in scientific computing. A reader who cares about extending the method to Allen-Cahn or HJB problems would find the new cases and the architecture worth looking at. It is important enough, and the central claim is stated clearly enough, that it should go to peer review rather than a desk reject; the referees can verify whether the truncation argument actually closes for the quadratic case.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to extend the convergence theory of the Deep BSDE method to semilinear parabolic PDEs with non-Lipschitz generators (Allen-Cahn with cubic nonlinearity and HJB with quadratic gradient growth) by combining a bounded double-well lemma with truncated-BSDE analysis inside the Bouchard-Touzi-Zhang framework; it further introduces the shallow XNet architecture (O(L) parameters) that preserves strong approximation while lowering optimization cost, and reports numerical experiments on 100-dimensional instances that corroborate the predicted rates and efficiency gains.

Significance. If the convergence statements close, the work would meaningfully enlarge the class of PDEs amenable to Deep BSDE solvers, directly covering models that arise in phase transitions and stochastic control. The XNet construction and the high-dimensional numerical corroboration would constitute concrete practical contributions.

major comments (1)

[truncated-BSDE analysis within Bouchard-Touzi-Zhang theory] Truncated-BSDE analysis for HJB equations with quadratic gradient growth: the passage to the limit as the truncation level tends to infinity requires an a-priori bound on the difference between the truncated and original BSDEs that is uniform in the truncation parameter. The abstract does not indicate whether this bound is derived independently of the Lipschitz condition being relaxed or whether it relies on solution moments that have not yet been established; this step is load-bearing for the claimed extension of Bouchard-Touzi-Zhang theory.
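One generic way to phrase the uniformity requirement is sketched below. The projection $\pi_R$, the threshold $R_0$, and the bound $M$ are illustrative symbols introduced here; the paper's own truncation and estimates may take a different form.

```latex
\begin{aligned}
&\pi_R(z) :=
  \begin{cases}
    z, & |z| \le R, \\
    R\, z / |z|, & |z| > R,
  \end{cases}
  \qquad f_R(y,z) := f\bigl(y, \pi_R(z)\bigr), \\
&Y^R_t = g(X_T) + \int_t^T f_R(Y^R_s, Z^R_s)\, ds - \int_t^T Z^R_s \cdot dW_s, \\
&\text{if } \sup_{R \ge R_0} \, \operatorname*{ess\,sup}_{(t,\omega)} |Z^R_t| \le M < \infty,
  \text{ then } f_R(Y^R_t, Z^R_t) = f(Y^R_t, Z^R_t) \text{ for all } R \ge \max\{R_0, M\}.
\end{aligned}
```

Under such a bound, $(Y^R, Z^R)$ already solves the untruncated BSDE once $R \ge \max\{R_0, M\}$, so the truncation bias disappears rather than merely shrinking. The referee's question is whether a bound of this type is proved without presupposing the Lipschitz condition that the truncation is meant to replace.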

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying this load-bearing step in the convergence argument. We address the comment below.

Referee: [truncated-BSDE analysis within Bouchard-Touzi-Zhang theory] Truncated-BSDE analysis for HJB equations with quadratic gradient growth: the passage to the limit as the truncation level tends to infinity requires an a-priori bound on the difference between the truncated and original BSDEs that is uniform in the truncation parameter. The abstract does not indicate whether this bound is derived independently of the Lipschitz condition being relaxed or whether it relies on solution moments that have not yet been established; this step is load-bearing for the claimed extension of Bouchard-Touzi-Zhang theory.

Authors: We agree that this uniformity is essential. In the manuscript (Section 3.2 and the proof of Theorem 3.5), the a-priori bound on the difference between truncated and original BSDEs is obtained from the bounded double-well lemma (Lemma 2.3) together with the moment estimates of the truncated processes; these estimates are derived before the limit is taken and hold uniformly in the truncation level by exploiting the quadratic growth structure and the specific form of the truncation, without invoking global Lipschitz continuity. The abstract condenses the overall strategy but does not spell out the independence of the bound. We will add a short clarifying sentence in the introduction (and, if space permits, the abstract) to make this explicit.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; extension of external BTZ theory via independent lemmas

full rationale

The paper's central claim is an extension of the Bouchard-Touzi-Zhang convergence theory to non-Lipschitz generators, achieved through a new bounded double-well lemma (for Allen-Cahn) and truncated-BSDE analysis (for HJB). No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivation chain relies on external BTZ results plus the paper's own analytical additions. This is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms, or invented entities can be extracted or audited.

pith-pipeline@v0.9.0 · 5718 in / 972 out tokens · 48690 ms · 2026-05-23T04:15:57.801196+00:00 · methodology
