An Optimisation Framework for the Well-Conditioned Training of Physics-Informed Neural Networks
Pith reviewed 2026-07-03 16:51 UTC · model grok-4.3
The pith
DSGNAR optimization reaches relative errors of 3e-16 for physics-informed neural networks on PDEs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DSGNAR couples a doubly-sketched Gauss-Newton model with a novel strategy that carefully controls both regularisation and step length. Across nonlinear, chaotic, multi-scale, high-dimensional, and Navier-Stokes problems the framework attains relative L2 errors as low as 3 times 10 to the minus 16 in double precision, improves contemporary results by five orders of magnitude on Burgers equation and eight orders on a high-dimensional Poisson problem, and remains markedly faster. In single precision, solutions at the limit of round-off error are obtained very quickly, for example Burgers equation to relative L2 of 4.75 times 10 to the minus 7 in under ten seconds. The framework is robust to arc
What carries the argument
Doubly-Sketched Gauss-Newton with Adaptive Ratio (DSGNAR), which approximates the Hessian via sketching and adaptively tunes regularization and step length to stabilize the ill-conditioned PINN loss landscape.
If this is right
- PINN solutions to standard nonlinear PDEs can reach limits of double-precision round-off.
- High-dimensional Poisson problems become solvable to eight orders better accuracy than prior PINN methods.
- Single-precision training can still deliver near-round-off results on Burgers-type equations in seconds.
- The same framework works across varied network architectures without retuning.
- Training time remains lower than competing first-order or other second-order PINN optimizers while achieving the accuracy gains.
Where Pith is reading between the lines
- The sketching-plus-adaptive-ratio pattern may transfer to other ill-conditioned scientific machine-learning tasks beyond PINNs.
- If the method scales to larger 3-D time-dependent Navier-Stokes cases, it could support real-time surrogate modeling in engineering workflows.
- The observed robustness to arithmetic precision suggests the framework could be useful on hardware with limited floating-point support.
Load-bearing premise
The assumption that the doubly-sketched Gauss-Newton model combined with the adaptive ratio strategy will reliably control the severe ill-conditioning of the PINN loss landscape across the tested problem classes without introducing new instabilities or requiring problem-specific tuning.
What would settle it
A run on the canonical Burgers equation following the paper's stated procedure and hyperparameters that yields relative L2 error larger than 10 to the minus 10 in double precision.
Figures
read the original abstract
Physics-informed neural networks (PINNs) have emerged as a promising route to solve partial differential equations, yet they have struggled to reach the precision of classical solvers. The obstacle is increasingly understood to be one of optimisation, owing to the severely ill-conditioned loss landscape. We present $\textbf{DSGNAR}$: Doubly-Sketched Gauss-Newton with Adaptive Ratio, a scalable second-order optimisation framework that confronts this ill-conditioning and, in doing so, obtains unprecedented accuracy and speed. $\textbf{DSGNAR}$ couples a doubly-sketched Gauss-Newton model with a novel strategy that carefully controls both regularisation and step length. Across a suite of problems spanning nonlinear, chaotic, multi-scale, high-dimensional, and Navier-Stokes, the framework greatly improves on the state of the art: able to attain relative $\ell_2$ errors as low as $3\times10^{-16}$ in double precision, improve contemporary results by five orders of magnitude on the canonical Burgers' equation, and as much as eight orders on a high-dimensional Poisson problem, while remaining markedly faster. We further show that, in single precision, solutions at the limit of round-off error can be obtained very quickly: Burgers' equation to $\ell_2^{\text{rel}} = 4.75 \times 10^{-7}$ in under ten seconds. The framework is also robust to the choice of architecture, arithmetic precision, and initial hyperparameters. The code is available at https://www.github.com/wephy/physics-informed-neural-networks
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DSGNAR, a doubly-sketched Gauss-Newton optimization framework augmented with an adaptive ratio strategy for regularization and step-length control. It targets the severely ill-conditioned loss landscapes of PINNs and reports relative ℓ₂ errors down to 3×10^{-16} in double precision, five-order-of-magnitude gains on Burgers' equation, eight-order gains on a high-dimensional Poisson problem, round-off-limited accuracy in single precision within seconds, and robustness across architectures, precisions, and initial hyperparameters. Code is released.
Significance. If the empirical claims are supported by the necessary ablations, error-bar statistics, and evidence that the adaptive rule generalizes without implicit problem-dependent tuning, the work would constitute a notable advance in PINN optimization, potentially rendering PINNs competitive with classical solvers on accuracy while retaining their flexibility. The open-source code strengthens reproducibility.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the reported accuracy numbers (e.g., 3×10^{-16}, five- and eight-order improvements) are presented without error-bar analysis, ablation isolating the doubly-sketched Gauss-Newton component from the adaptive-ratio component, or verification that the same hyper-parameters were not re-used in the performance metric. These omissions are load-bearing for the central claim that the combined framework reliably controls ill-conditioning.
- [§3.2] §3.2 (Adaptive Ratio Strategy): the adaptation rule is asserted to control regularization and step length robustly across nonlinear/chaotic/multi-scale/high-dimensional/Navier-Stokes problems without new instabilities or problem-specific tuning, yet the manuscript supplies neither a convergence analysis of the ratio thresholds nor failure-mode ablations. This directly underpins the robustness claim.
minor comments (2)
- [Table 2] Table 2: the single-precision Burgers' timing result would be clearer if wall-clock time were reported alongside iteration count.
- [§2] Notation in §2: the distinction between the two sketching matrices should be made explicit at first use to avoid ambiguity in later equations.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point-by-point below, agreeing where revisions are needed to strengthen the empirical support for our claims.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): the reported accuracy numbers (e.g., 3×10^{-16}, five- and eight-order improvements) are presented without error-bar analysis, ablation isolating the doubly-sketched Gauss-Newton component from the adaptive-ratio component, or verification that the same hyper-parameters were not re-used in the performance metric. These omissions are load-bearing for the central claim that the combined framework reliably controls ill-conditioning.
Authors: We agree that the central claims would be strengthened by error-bar statistics, component-wise ablations, and explicit clarification on hyperparameter usage. The current manuscript reports point estimates from individual runs without statistical error bars and does not include dedicated ablations that isolate the doubly-sketched Gauss-Newton step from the adaptive-ratio mechanism. Hyperparameters were chosen once on a small validation set and then held fixed for all reported comparisons; they were not re-tuned to the final performance metric. In revision we will add (i) error bars computed over multiple random initializations, (ii) ablation tables that disable each component in turn, and (iii) a short subsection confirming the fixed-hyperparameter protocol. These additions directly address the load-bearing concerns. revision: yes
-
Referee: [§3.2] §3.2 (Adaptive Ratio Strategy): the adaptation rule is asserted to control regularization and step length robustly across nonlinear/chaotic/multi-scale/high-dimensional/Navier-Stokes problems without new instabilities or problem-specific tuning, yet the manuscript supplies neither a convergence analysis of the ratio thresholds nor failure-mode ablations. This directly underpins the robustness claim.
Authors: The adaptive-ratio rule is presented as an empirical heuristic whose thresholds were observed to work across the tested suite. The manuscript contains no theoretical convergence analysis of the ratio thresholds, nor systematic failure-mode ablations that deliberately stress the rule outside the reported problem classes. We will add failure-mode experiments (e.g., extreme initializations, deliberately poor ratio thresholds, and additional Navier-Stokes variants) to the revised §3.2 and §4. A formal convergence analysis, however, lies outside the empirical scope of the present work. revision: partial
- A theoretical convergence analysis of the adaptive-ratio thresholds; this would require substantial new mathematical derivations beyond the empirical focus of the manuscript.
Circularity Check
No circularity: framework presented as independent optimisation method
full rationale
The abstract and description present DSGNAR as a general second-order framework (doubly-sketched Gauss-Newton plus adaptive ratio) whose performance claims are outcomes on external benchmark PDE problems. No equations, derivations, or claims reduce a reported accuracy or prediction to a fitted parameter or self-citation by construction. The method is described as robust across architectures and precisions without invoking prior self-authored uniqueness theorems or ansatzes that would make the central result tautological. This is the common case of a self-contained proposal evaluated against independent benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors
[Ail+09] Nir Ailon and Bernard Chazelle. “The Fast Johnson–Lindenstrauss Transform and Approximate Nearest Neighbors” . In: SIAM Journal on Computing 39.1 (June 2009), pp. 302–322. doi: 10.1 137/060673096. 27 [Ana+24] Sokratis J Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, and George Em Karni- adakis. “Residual-Based Attention in Physics-In...
-
[2]
Minimization of Functions Having Lipschitz Continuous First Partial Deriva- tives
issn: 3005-1436. doi: 10.1007/S44379-026- 00071-1. [Arm66] Larry Armijo. “Minimization of Functions Having Lipschitz Continuous First Partial Deriva- tives” . In:Pacific Journal of Mathematics 16.1 (1966), pp. 1–3. doi: 10.2140/pjm.1966.16.1. [Al-98] M Al-Baali. “Numerical Experience with a Class of Self-Scaling Quasi-Newton Algorithms” . In: Journal of O...
-
[3]
PMLR, June 2025, pp
Proceedings of Machine Learning Research. PMLR, June 2025, pp. 4005–4019. https://openreview.net/forum?id=bKsZomnmqn. [Bra+18] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Yash Katariya, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman- Milne, and Qiao Zhang. JAX: Composable Transformations o...
2025
-
[4]
Dedalus: A flexible framework for numerical simulations with spectral methods
http://github.com/jax-ml/jax. [Bur+20] Keaton J. Burns, Geoffrey M. Vasil, Jeffrey Oishi, Daniel Lecoanet, and Benjamin P. Brown. “Dedalus: A flexible framework for numerical simulations with spectral methods” . In: Physical Review Research 2.2 (Apr. 2020), p. 023068. doi: 10.1103/PhysRevResearch.2.023068. [Cao+25] Fujun Cao, Xiaobin Guo, Xinzheng Dong, a...
-
[5]
Exponential Time Differencing for Stiff Systems
doi: 10.1137/1.9780898719857. [Cox+02] S. M. Cox and P. C. Matthews. “Exponential Time Differencing for Stiff Systems” . In: Journal of Computational Physics 176.2 (2002), pp. 430–455. doi: 10.1006/jcph.2002.6995. [Dai+26] Chen-Yang Dai, Che-Chia Chang, Te-Sheng Lin, Ming-Chih Lai, and Chieh-Hsin Lai. TINNs: Time-Induced Neural Networks for Solving Time-D...
-
[6]
TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs
doi: 10.48550/arXiv.2601.20361. [Dan+24] Felix Dangel, Johannes Müller, and Marius Zeinhofer. “Kronecker-Factored Approximate Cur- vature for Physics-Informed Neural Networks” . In: Advances in Neural Information Processing Systems. Vol
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.20361
-
[7]
Neural-Network-Based Approximations for Solving Partial Differential Equations
2024, pp. 34582–34636. doi: 10.48550/arXiv.2405.15603. 28 [Dis+94] M W M G Dissanayake and N Phan-Thien. “Neural-Network-Based Approximations for Solving Partial Differential Equations” . In: Communications in Numerical Methods in Engineering 10.3 (1994), pp. 195–201. doi: 10.1002/cnm.1640100303. [Don+21] Suchuan Dong and Naxian Ni. “A method for represen...
-
[8]
Monotone Piecewise Cubic Interpolation
isbn: 978-1903398005. [Fri+80] F N Fritsch and R E Carlson. “Monotone Piecewise Cubic Interpolation” . In: SIAM Journal on Numerical Analysis 17.2 (1980), pp. 238–246. issn: 0036-1429. doi: 10.1137/0717021. [Guz+25] Andrés Guzmán-Cordero, Felix Dangel, Gil Goldshlager, and Marius Zeinhofer. “Improving Energy Natural Gradient Descent through Woodbury, Mome...
-
[9]
2025, pp. 113870–113900. doi: 10.48550/arXiv.2505.12149. [Hai+93] Ernst Hairer, Syvert P. Nørsett, and Gerhard Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems . 2nd ed. Vol
-
[10]
doi: 10.1007/978-3-540-78862-1 . [Hal+11] N Halko, P G Martinsson, and J A Tropp. “Finding Structure with Randomness: Probabilis- tic Algorithms for Constructing Approximate Matrix Decompositions” . In: SIAM Review 53.2 (2011), pp. 217–288. doi: 10.1137/090771806. [Jni+26] Anas Jnini, Flavio Vella, and Marius Zeinhofer. “Gauss-Newton Natural Gradient Desc...
-
[11]
Random Feature Maps for Dot Product Kernels
PMLR, 2012, pp. 583–591. doi: 10.48550/arXiv.1201.6530. [Kar+21] George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. “Physics-informed machine learning” . In: Nature Reviews Physics 2021 3:6 3.6 (June 2021), pp. 422–440. issn: 2522-5820. doi: 10.1038/s42254-021-00314-5 . [Kas+06] Aly Khan Kassam and Lloyd N Tref...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1201.6530 2012
-
[12]
Adam: A Method for Stochastic Optimization
doi: 10.48550/arXiv.1412.6980. [Kiy+25] Elham Kiyani, Khemraj Shukla, Jorge F Urbán, Jérôme Darbon, and George Em Karniadakis. “Optimizing the Optimizer for Physics-Informed Neural Networks and Kolmogorov-Arnold Net- works” . In: Computer Methods in Applied Mechanics and Engineering 446 (2025), p. 118308. doi: 10.1016/j.cma.2025.118308. [Kri+21] Aditi Kri...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1412.6980 2025
-
[13]
Artificial Neural Networks for Solving Ordinary and Partial Differential Equations
2021, pp. 26548–26560. doi: 10.48550/arX iv.2109.01050. [Lag+98] I E Lagaris, A Likas, and D I Fotiadis. “Artificial Neural Networks for Solving Ordinary and Partial Differential Equations” . In: IEEE Transactions on Neural Networks 9.5 (1998), pp. 987–
-
[14]
doi: 10.1109/72.712178. 29 [LeC+98] Yann LeCun, Leon Bottou, Genevieve B Orr, and Klaus -Robert Müller. “Efficient BackProp” . In: (1998), pp. 9–50. doi: 10.1007/3-540-49430-8_2 . [Lev44] Kenneth Levenberg. “A Method for the Solution of Certain Non-Linear Problems in Least Squares” . In:Quarterly of Applied Mathematics 2 (1944), pp. 164–168. doi: 10.1090/qam/1066
-
[15]
Revisiting PINNs: Generative Adversarial Physics-Informed Neural Networks and Point-Weighting Method
[Li+22] Wensheng Li, Chao Zhang, Chuncheng Wang, Hanting Guan, and Dacheng Tao. Revisiting PINNs: Generative Adversarial Physics-Informed Neural Networks and Point-Weighting Method. arXiv:2205.08754
-
[16]
Revisiting PINNs: Generative Adversarial Physics-Informed Neural Networks and Point-Weighting Method
doi: 10.48550/arXiv.2205.08754. [Mar63] Donald W Marquardt. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters” . In: Journal of the Society for Industrial and Applied Mathematics 11.2 (1963), pp. 431–441. doi: 10.1137/0111030. [Mar+18] James Martens, Jimmy Ba, and Matt Johnson. “Kronecker-Factored Curvature Approximations for Recurrent Ne...
-
[17]
Randomized Numerical Linear Algebra: Founda- tions and Algorithms
PMLR, 2015, pp. 2408–2417. doi: 10.48550/arXiv.1503.05671. [Mar+20] Per-Gunnar Martinsson and Joel A Tropp. “Randomized Numerical Linear Algebra: Founda- tions and Algorithms” . In: Acta Numerica 29 (2020), pp. 403–572. doi: 10.1017/S0962492920 000021. [Mei+19] Michela Meister, Tamas Sarlos, and David Woodruff. “Tight Dimensionality Reduction for Sketchin...
-
[18]
doi: 10.5555/3454287.3455137. [Mos+23] Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer. “Finite basis physics-informed neural networks (FBPINNs): a scalable domain decomposition approach for solving differential equa- tions” . In: Advances in Computational Mathematics 49.4 (2023), p
-
[19]
Achieving High Accuracy with PINNs via Energy Natural Gradient Descent
issn: 1572-9044. doi: 10.1007/s10444-023-10065-9 . [Mül+23] Johannes Müller and Marius Zeinhofer. “Achieving High Accuracy with PINNs via Energy Natural Gradient Descent” . In: Proceedings of the 40th International Conference on Machine Learning. Vol
-
[20]
OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings
Proceedings of Machine Learning Research. PMLR, June 2023, pp. 25471– 25485. doi: 10.5555/3618408.3619465. [Nel+12] Jelani Nelson and Huy L. Nguyễn. “OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings” . In: IEEE Annual Symposium on Foundations of Computer Science (2012), pp. 117–126. issn: 02725428. doi: 10.1109/FOCS.2013.2...
-
[21]
Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence
doi: 10.1007/978-0-387-40065-5 . [Pil+17] Mert Pilanci and Martin J. Wainwright. “Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence” . In: SIAM Journal on Optimization 27.1 (Feb. 2017), pp. 205–245. issn: 10526234. doi: 10.1137/15M1021106. [Rai+19] M Raissi, P Perdikaris, and G E Karniadakis. “Physics-Informed Neur...
-
[22]
2020, pp. 7462–7473. doi: 10.48550/arXiv .2006.09661. [Tan+20] Matthew Tancik, P Srinivasan, B Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, R Ramamoorthi, J Barron, and Ren Ng. “Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains” . In: Neural Information Processing Systems (2020). doi: 10.48550/ar...
work page internal anchor Pith review doi:10.48550/arxiv 2020
-
[23]
When and why PINNs fail to train: A neu- ral tangent kernel perspective
doi: 10.48550/arXiv.2502.00604. [Wan+22] Sifan Wang, Xinling Yu, and Paris Perdikaris. “When and why PINNs fail to train: A neu- ral tangent kernel perspective” . In: Journal of Computational Physics 449 (June 2022). issn: 10902716. doi: 10.1016/j.jcp.2021.110768. [Wu+23] Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. “A comprehensive study of ...
-
[24]
These results prioritise accuracy, whilst computed with remarkable speed, achieving ℓrel 2 = 1.25 × 10−11 in 334.6 seconds
All other aspects of architecture and implementation are kept the same. These results prioritise accuracy, whilst computed with remarkable speed, achieving ℓrel 2 = 1.25 × 10−11 in 334.6 seconds. 41 Solution 7 | Wave Single precision 0.0 0.5 1.0 x 0.0 0.2 0.4 0.6 0.8 1.0 t PINN solution 0.0 0.5 1.0 x 0.0 0.2 0.4 0.6 0.8 1.0 t PINN absolute error 0 50 Wall...
2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.