arxiv: 2502.06238 · v2 · submitted 2025-02-10 · 💻 cs.CE

XNet-Enhanced Deep BSDE Method and Numerical Analysis

Pith reviewed 2026-05-23 04:15 UTC · model grok-4.3

classification 💻 cs.CE
keywords Deep BSDE · non-Lipschitz · Allen-Cahn · Hamilton-Jacobi-Bellman · convergence · XNet · high-dimensional PDE · semilinear parabolic PDE

The pith

Deep BSDE methods converge for non-Lipschitz generators in Allen-Cahn and HJB equations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the Deep Backward Stochastic Differential Equation (Deep BSDE) method converges even when the generator is not globally Lipschitz, as happens in Allen-Cahn equations with cubic nonlinearity and in Hamilton-Jacobi-Bellman (HJB) equations with quadratic gradient growth. The proof combines a bounded double-well lemma with a truncated-BSDE analysis. The paper also introduces XNet, a shallow network with O(L) parameters that keeps strong approximation power while cutting optimization and run-time cost. Experiments on 100-dimensional PDEs are consistent with the predicted convergence behavior and show efficiency gains over standard feedforward networks. This matters because many semilinear PDEs arising in science fall outside the Lipschitz setting, so the result makes the method applicable to a wider class of high-dimensional models.

Core claim

We establish the convergence theory for non-Lipschitz generators covering Allen-Cahn equations with cubic nonlinearity and HJB equations with quadratic gradient growth based on a bounded double-well lemma and a truncated-BSDE analysis. Computationally, we instantiate the framework with XNet, a shallow architecture with O(L) parameters that preserves strong approximation while substantially reducing optimization and computational cost.
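
For orientation, the standard link between a semilinear parabolic PDE and a BSDE can be written out as below. This is textbook background, not the paper's own notation: the symbols $\mu$, $\sigma$, $f$, $g$ and the constant $c$ are assumptions taken from the usual Deep BSDE benchmark setup, and the paper's normalization may differ.

```latex
% Semilinear parabolic PDE with terminal condition g
\[
\partial_t u + \tfrac12 \operatorname{Tr}\!\big(\sigma\sigma^{\top}\nabla^2 u\big)
  + \mu\cdot\nabla u + f\big(t,x,u,\sigma^{\top}\nabla u\big) = 0,
\qquad u(T,x) = g(x).
\]

% Forward diffusion X and backward pair (Y, Z),
% with Y_t = u(t, X_t) and Z_t = \sigma^{\top}\nabla u(t, X_t)
\[
X_t = x_0 + \int_0^t \mu\,ds + \int_0^t \sigma\,dW_s,
\qquad
Y_t = g(X_T) + \int_t^T f(s,X_s,Y_s,Z_s)\,ds - \int_t^T Z_s\cdot dW_s.
\]

% The two generators named in the claim (assumed benchmark forms, c > 0)
\[
f_{\mathrm{AC}}(y) = y - y^{3},
\qquad
f_{\mathrm{HJB}}(z) = -c\,|z|^{2}.
\]

% Neither is globally Lipschitz: the derivatives are unbounded
\[
f_{\mathrm{AC}}'(y) = 1 - 3y^{2},
\qquad
\nabla f_{\mathrm{HJB}}(z) = -2c\,z.
\]
```

The last line is the point of the claim: both derivatives grow without bound, so convergence results that assume a global Lipschitz constant for $f$ do not apply directly.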

What carries the argument

XNet shallow architecture with O(L) parameters, supported by bounded double-well lemma and truncated-BSDE analysis for non-Lipschitz convergence
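
To make "shallow architecture" concrete, here is a minimal NumPy sketch of a one-hidden-layer network with a Cauchy-type activation. It rests on assumptions: the activation form `lam1*x/(x^2+d^2) + lam2/(x^2+d^2)` is taken from the earlier XNet preprint (arXiv:2409.19221) as best understood, and the class name `ShallowCauchyNet`, its layer layout, and its initialization are hypothetical. How this parameter count relates to the paper's O(L) is not stated in the material above.

```python
import numpy as np

def cauchy_activation(x, lam1, lam2, d):
    """Assumed Cauchy activation: (lam1 * x + lam2) / (x^2 + d^2), with d != 0."""
    return (lam1 * x + lam2) / (x * x + d * d)

class ShallowCauchyNet:
    """Hypothetical one-hidden-layer net: x -> phi(x W1 + b1) W2 + b2."""

    def __init__(self, in_dim, width, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, width))
        self.b1 = np.zeros(width)
        self.W2 = rng.normal(scale=1.0 / np.sqrt(width), size=(width, out_dim))
        self.b2 = np.zeros(out_dim)
        # Activation parameters (trainable in a real setup; fixed here).
        self.lam1, self.lam2, self.d = 1.0, 0.0, 1.0

    def __call__(self, x):
        hidden = cauchy_activation(x @ self.W1 + self.b1, self.lam1, self.lam2, self.d)
        return hidden @ self.W2 + self.b2

    def num_params(self):
        # Weights and biases of both layers plus the three activation scalars.
        return self.W1.size + self.b1.size + self.W2.size + self.b2.size + 3

net = ShallowCauchyNet(in_dim=100, width=16, out_dim=100)
z_values = net(np.zeros((4, 100)))  # e.g. a batch of gradient estimates Z
```

With a single hidden layer the parameter count grows linearly in the hidden width, which is the kind of saving a shallow design aims for relative to a multi-layer feedforward stack.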

If this is right

  • The method converges for Allen-Cahn equations with cubic nonlinearity.
  • The method converges for HJB equations with quadratic gradient growth.
  • XNet achieves strong approximation with far fewer parameters than standard networks.
  • Numerical tests on 100-dimensional problems confirm both the convergence rates and the cost savings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same truncation technique could apply to other PDEs whose nonlinearities grow faster than linear but stay bounded in certain ways.
  • Efficiency improvements from XNet may make it feasible to solve time-dependent problems in real time for applications in physics and engineering.
  • Further work could examine whether the approach scales to dimensions beyond 100 without loss of accuracy.

Load-bearing premise

The bounded double-well lemma holds and the truncated-BSDE analysis extends to the non-Lipschitz generators considered.

What would settle it

Running the Deep BSDE solver on an Allen-Cahn equation and observing that the approximation error fails to decrease as the number of time steps or network width increases would disprove the convergence claim.
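
The check described above can be sketched as a log-log fit of error against step size. The helper name `empirical_order` is hypothetical, and the error values below are synthetic placeholders chosen to decay like N^(-1/2); they are not results from the paper.

```python
import numpy as np

def empirical_order(num_steps, errors):
    """Least-squares slope of log(error) against log(h), where h = 1/N."""
    h = 1.0 / np.asarray(num_steps, dtype=float)
    e = np.asarray(errors, dtype=float)
    slope, _intercept = np.polyfit(np.log(h), np.log(e), 1)
    return slope

# Synthetic placeholder data: error = 0.3 * N^(-1/2). NOT measured results.
steps = [10, 20, 40, 80]
synthetic_errors = [0.3 * n ** -0.5 for n in steps]
order = empirical_order(steps, synthetic_errors)

# A stalled solver would show up as a slope near zero.
stalled_order = empirical_order(steps, [0.1, 0.1, 0.1, 0.1])
```

A slope clearly above zero means the error shrinks as the time grid is refined. A slope near zero on real Allen-Cahn runs would be the kind of evidence that counts against the convergence claim, provided optimization error has been ruled out as the cause.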

Figures

Figures reproduced from arXiv: 2502.06238 by Xiaotao Zheng, Xingye Yue, Xin Li, Zhihong Xia.

Figure 1: The neural network architecture for Deep BSDE method. The network consists of multiple …
Figure 2: Comparison of Two Network Architectures for Solving the Allen-Cahn Equation under …
Figure 3: Comparison of Two Network Architectures for Solving the Allen-Cahn Equation under …
Figure 4: Comparison of Two Network Architectures for Solving the PricingDiffrate Equation under …
Figure 5: Results of solving the Allen-Cahn Equation using the Deep BSDE method by XNet under …
Figure 6: Comparison of Two Network Architectures for Solving the PricingDiffrate under 10- …
Figure 7: Solving the PricingDiffrate Equation by XNet under various settings with 20-step-time …
Original abstract

Semilinear parabolic partial differential equations (PDEs) are fundamental to modeling complex dynamical systems across scientific domains. The Deep Backward Stochastic Differential Equation (BSDE) method is a promising approach for high-dimensional PDEs; however, existing convergence results apply only to globally Lipschitz generators, excluding important cases such as Allen-Cahn and Hamilton-Jacobi-Bellman (HJB) equations. This paper presents both a theoretical and a computational advance for Deep BSDE methods. Theoretically, we establish the convergence theory for non-Lipschitz generators, covering Allen-Cahn equations with cubic nonlinearity and HJB equations with quadratic gradient growth, based on a bounded double-well lemma and a truncated-BSDE analysis within the Bouchard-Touzi-Zhang theory. Computationally, we instantiate the framework with XNet, a shallow architecture with $\mathcal{O}(L)$ parameters that preserves strong approximation while substantially reducing optimization and computational cost. Numerical experiments on 100-dimensional PDEs corroborate the predicted convergence behavior and demonstrate significant efficiency gains over standard feedforward implementations.
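
As a rough illustration of what the Deep BSDE method optimizes, the sketch below runs the forward Euler recursion for an Allen-Cahn-type problem and evaluates the terminal mismatch loss. It makes several assumptions: the PDE normalization (diffusion coefficient √2), the terminal condition, and the defaults `dim=100`, `n_steps=20`, `T=0.3` follow the commonly used Deep BSDE benchmark rather than anything confirmed about this paper. The gradient model `z_fn` is a placeholder where a trained network would go.

```python
import numpy as np

def allen_cahn_generator(y):
    # Cubic, non-Lipschitz generator f(y) = y - y^3 (assumed benchmark form).
    return y - y ** 3

def terminal_condition(x):
    # Assumed benchmark terminal condition g(x) = 1 / (2 + 0.4 |x|^2).
    return 1.0 / (2.0 + 0.4 * np.sum(x * x, axis=1))

def deep_bsde_loss(y0, z_fn, dim=100, n_steps=20, T=0.3, batch=256, seed=0):
    """Monte Carlo estimate of E|Y_N - g(X_N)|^2 for a given Y_0 and Z model."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sigma = np.sqrt(2.0)                 # matches u_t + Laplacian(u) + f = 0
    x = np.zeros((batch, dim))           # all paths start at the origin
    y = np.full(batch, float(y0))
    for n in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=(batch, dim))
        z = z_fn(n * dt, x)              # shape (batch, dim)
        # Euler step of dY = -f(Y) dt + Z . dW
        y = y - allen_cahn_generator(y) * dt + np.sum(z * dw, axis=1)
        x = x + sigma * dw
    return float(np.mean((y - terminal_condition(x)) ** 2))

# Placeholder Z model (identically zero); a real run would train a network here.
loss = deep_bsde_loss(y0=0.3, z_fn=lambda t, x: np.zeros_like(x))
```

In the actual method, `y0` and the parameters behind `z_fn` are trained to drive this loss toward zero; the convergence theory concerns how well the minimizer approximates the true solution as the time grid is refined.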

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to extend the convergence theory of the Deep BSDE method to semilinear parabolic PDEs with non-Lipschitz generators (Allen-Cahn with cubic nonlinearity and HJB with quadratic gradient growth) by combining a bounded double-well lemma with truncated-BSDE analysis inside the Bouchard-Touzi-Zhang framework; it further introduces the shallow XNet architecture (O(L) parameters) that preserves strong approximation while lowering optimization cost, and reports numerical experiments on 100-dimensional instances that corroborate the predicted rates and efficiency gains.

Significance. If the convergence statements close, the work would meaningfully enlarge the class of PDEs amenable to Deep BSDE solvers, directly covering models that arise in phase transitions and stochastic control. The XNet construction and the high-dimensional numerical corroboration would constitute concrete practical contributions.

major comments (1)
  1. [truncated-BSDE analysis within Bouchard-Touzi-Zhang theory] Truncated-BSDE analysis for HJB equations with quadratic gradient growth: the passage to the limit as the truncation level tends to infinity requires an a-priori bound on the difference between the truncated and original BSDEs that is uniform in the truncation parameter. The abstract does not indicate whether this bound is derived independently of the Lipschitz condition being relaxed or whether it relies on solution moments that have not yet been established; this step is load-bearing for the claimed extension of Bouchard-Touzi-Zhang theory.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for identifying this load-bearing step in the convergence argument. We address the comment below.

Point-by-point responses
  1. Referee: [truncated-BSDE analysis within Bouchard-Touzi-Zhang theory] Truncated-BSDE analysis for HJB equations with quadratic gradient growth: the passage to the limit as the truncation level tends to infinity requires an a-priori bound on the difference between the truncated and original BSDEs that is uniform in the truncation parameter. The abstract does not indicate whether this bound is derived independently of the Lipschitz condition being relaxed or whether it relies on solution moments that have not yet been established; this step is load-bearing for the claimed extension of Bouchard-Touzi-Zhang theory.

    Authors: We agree that this uniformity is essential. In the manuscript (Section 3.2 and the proof of Theorem 3.5), the a-priori bound on the difference between truncated and original BSDEs is obtained from the bounded double-well lemma (Lemma 2.3) together with the moment estimates of the truncated processes; these estimates are derived before the limit is taken and hold uniformly in the truncation level by exploiting the quadratic growth structure and the specific form of the truncation, without invoking global Lipschitz continuity. The abstract condenses the overall strategy but does not spell out the independence of the bound. We will add a short clarifying sentence in the introduction (and, if space permits, the abstract) to make this explicit.

    revision: partial
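
To illustrate what a truncation level does to a quadratic generator, the sketch below projects the gradient variable onto a ball of radius M before squaring. This is one standard truncation, assumed here for illustration; the paper's specific truncation may differ, and the function names are hypothetical. For fixed M the truncated generator is Lipschitz with constant at most 2M, which the code checks numerically on random pairs.

```python
import numpy as np

def project_ball(z, radius):
    """Project each row of z onto the Euclidean ball of the given radius."""
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-300))
    return z * scale

def hjb_generator(z):
    # Quadratic-growth generator f(z) = -|z|^2 (assumed normalization).
    return -np.sum(z * z, axis=-1)

def truncated_hjb_generator(z, radius):
    # f_M(z) = f(projection of z onto the ball of radius M).
    return hjb_generator(project_ball(z, radius))

rng = np.random.default_rng(0)
M = 3.0
z1 = rng.normal(size=(1000, 10))
z2 = rng.normal(size=(1000, 10))
diff = np.abs(truncated_hjb_generator(z1, M) - truncated_hjb_generator(z2, M))
max_ratio = np.max(diff / np.linalg.norm(z1 - z2, axis=1))  # empirical Lipschitz ratio
```

The bound 2M lets Lipschitz-based theory apply at each fixed truncation level, but it grows with M. That is why the referee's question about an estimate that is uniform in the truncation parameter is a separate analytical step, and one this sketch does not address.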

Circularity Check

0 steps flagged

No significant circularity; extension of external BTZ theory via independent lemmas

The paper's central claim is an extension of the Bouchard-Touzi-Zhang convergence theory to non-Lipschitz generators, achieved through a new bounded double-well lemma (for Allen-Cahn) and truncated-BSDE analysis (for HJB). No steps reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the derivation chain relies on external BTZ results plus the paper's own analytical additions. This is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no specific free parameters, axioms, or invented entities can be extracted or audited.
