Physics-Informed Neural Networks: A Didactic Derivation of the Complete Training Cycle
Pith reviewed 2026-05-10 03:33 UTC · model grok-4.3
The pith
A step-by-step manual derivation of the full training cycle for physics-informed neural networks is given using explicit calculations on a 22-parameter network.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Performing every stage of the training cycle by hand on a 1-3-3-1 network with 22 trainable parameters applied to a first-order initial-value problem shows that the composite physics-informed loss can be driven to a minimum that yields an approximate solution whose relative $L^2$ error against the known analytic solution is $4.290 \times 10^{-4}$. The same calculations produce general recursive sensitivity-propagation relations that compute the required gradients for networks of any depth and connect directly to the automatic-differentiation engines used in practice.
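The figure of 22 trainable parameters follows directly from the 1-3-3-1 layer widths. A minimal sketch of the count, assuming a fully connected network with one bias per non-input unit (the helper name `count_parameters` is illustrative, not taken from the paper):

```python
def count_parameters(widths):
    """Count weights and biases of a fully connected network.

    widths lists the layer sizes from input to output, e.g. [1, 3, 3, 1].
    """
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out  # weight matrix of shape (n_out, n_in)
        total += n_out         # one bias per unit in the receiving layer
    return total


print(count_parameters([1, 3, 3, 1]))  # 6 + 12 + 4 = 22
```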
What carries the argument
The sensitivity-propagation relations, which recursively apply the chain and product rules to propagate partial derivatives of the network output and its temporal derivative backward through each layer to obtain gradients with respect to all weights and biases.
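One way such relations can be written is sketched below. The notation is an assumption made for illustration and may differ from the paper's: $\sigma$ is the hidden-layer activation, the output layer is taken to be linear, $\dot{a}^{(l)}$ is the temporal derivative of the layer-$l$ activation, and $\delta^{(l)}$, $\dot{\delta}^{(l)}$ are the sensitivities of a scalar $J(u, \dot{u})$ with respect to $z^{(l)}$ and $\dot{z}^{(l)}$. The $\sigma''$ term is where the product rule enters.

```latex
% Assumed notation: layers l = 1..L, input a^{(0)} = t, \dot{a}^{(0)} = 1,
% output u = z^{(L)}, \dot{u} = \dot{z}^{(L)}; recursions for \delta run over l = L-1, ..., 1.
\begin{align}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, &
\dot{z}^{(l)} &= W^{(l)} \dot{a}^{(l-1)}, \\
a^{(l)} &= \sigma\bigl(z^{(l)}\bigr), &
\dot{a}^{(l)} &= \sigma'\bigl(z^{(l)}\bigr) \odot \dot{z}^{(l)}, \\
\delta^{(L)} &= \frac{\partial J}{\partial u}, &
\dot{\delta}^{(L)} &= \frac{\partial J}{\partial \dot{u}}, \\
\dot{\delta}^{(l)} &= \sigma'\bigl(z^{(l)}\bigr) \odot \bigl(W^{(l+1)}\bigr)^{\top} \dot{\delta}^{(l+1)}, \\
\delta^{(l)} &= \sigma'\bigl(z^{(l)}\bigr) \odot \bigl(W^{(l+1)}\bigr)^{\top} \delta^{(l+1)}
  + \sigma''\bigl(z^{(l)}\bigr) \odot \dot{z}^{(l)} \odot \bigl(W^{(l+1)}\bigr)^{\top} \dot{\delta}^{(l+1)}, \\
\frac{\partial J}{\partial W^{(l)}} &= \delta^{(l)} \bigl(a^{(l-1)}\bigr)^{\top}
  + \dot{\delta}^{(l)} \bigl(\dot{a}^{(l-1)}\bigr)^{\top}, &
\frac{\partial J}{\partial b^{(l)}} &= \delta^{(l)}.
\end{align}
```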
Load-bearing premise
The specific hand calculations and product-rule handling performed for this 22-parameter network and ODE contain no algebraic or numerical errors and extend without modification to deeper networks or more complex physics constraints.
What would settle it
Direct numerical comparison, at every training step, between the hand-derived gradients for the 22 parameters and the gradients returned by automatic differentiation on the same network and loss; any discrepancy would show an error in the manual derivation.
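A check of this kind can be sketched in pure Python. The sketch below is not the paper's notebook: it assumes tanh hidden activations and a linear output, uses a hypothetical test problem $u' = -u$, $u(0) = 1$ with a single collocation point, and substitutes central finite differences for automatic differentiation as the independent reference. All function names are illustrative.

```python
import math
import random


def init_params(widths, seed=0):
    """Random weights and biases; each layer is (W, b), W[i][j] links unit j to unit i."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-1, 1) for _ in range(n_out)]
        params.append((W, b))
    return params


def forward(params, t):
    """Propagate the activation a and its time derivative d = da/dt, caching layer inputs."""
    a, d, cache = [t], [1.0], []
    for l, (W, b) in enumerate(params):
        z = [sum(w * x for w, x in zip(row, a)) + bi for row, bi in zip(W, b)]
        zd = [sum(w * x for w, x in zip(row, d)) for row in W]
        cache.append((a, d, zd))
        if l < len(params) - 1:              # hidden layer: tanh (assumed activation)
            a = [math.tanh(v) for v in z]
            d = [(1 - ai * ai) * v for ai, v in zip(a, zd)]
        else:                                # output layer: linear
            a, d = z, zd
    return a[0], d[0], cache


def backward(params, cache, g_u, g_ud):
    """Gradient of a scalar J(u, du/dt), given g_u = dJ/du and g_ud = dJ/d(du/dt)."""
    grads = [None] * len(params)
    gz, gzd = [g_u], [g_ud]                  # sensitivities w.r.t. z and dz/dt at the output
    for l in reversed(range(len(params))):
        W, _ = params[l]
        a_prev, d_prev, _ = cache[l]
        gW = [[gz[i] * a_prev[j] + gzd[i] * d_prev[j] for j in range(len(a_prev))]
              for i in range(len(W))]
        grads[l] = (gW, list(gz))
        if l == 0:
            break
        ga = [sum(W[i][j] * gz[i] for i in range(len(W))) for j in range(len(a_prev))]
        gd = [sum(W[i][j] * gzd[i] for i in range(len(W))) for j in range(len(a_prev))]
        zd_prev = cache[l - 1][2]
        s = [1 - x * x for x in a_prev]      # tanh'(z) written in terms of a = tanh(z)
        # Product rule: d = tanh'(z) * zd depends on z through tanh'(z), with tanh''(z) = -2*a*s.
        gz = [ga[j] * s[j] + gd[j] * zd_prev[j] * (-2 * a_prev[j] * s[j]) for j in range(len(s))]
        gzd = [gd[j] * s[j] for j in range(len(s))]
    return grads


def loss_and_grad(params, t_c=0.5):
    """Hypothetical loss for u' = -u, u(0) = 1: squared residual at t_c plus squared IC error."""
    u, ud, cache_c = forward(params, t_c)
    u0, _, cache_0 = forward(params, 0.0)
    r, e = ud + u, u0 - 1.0
    g_c = backward(params, cache_c, 2 * r, 2 * r)   # dL/du = dL/d(du/dt) = 2r at t_c
    g_0 = backward(params, cache_0, 2 * e, 0.0)     # only u(0) enters the IC term
    grads = [([[x + y for x, y in zip(rc, r0)] for rc, r0 in zip(Wc, W0)],
              [x + y for x, y in zip(bc, b0)])
             for (Wc, bc), (W0, b0) in zip(g_c, g_0)]
    return r * r + e * e, grads


def max_gradient_error(widths, h=1e-6):
    """Largest gap between the hand-coded gradient and central finite differences."""
    params = init_params(widths)
    _, grads = loss_and_grad(params)
    worst = 0.0
    for (W, b), (gW, gb) in zip(params, grads):
        # Treat the bias vector as one more row; rows are shared, so edits reach params.
        for row, grow in zip(W + [b], gW + [gb]):
            for j in range(len(row)):
                old = row[j]
                row[j] = old + h
                lp, _ = loss_and_grad(params)
                row[j] = old - h
                lm, _ = loss_and_grad(params)
                row[j] = old
                worst = max(worst, abs((lp - lm) / (2 * h) - grow[j]))
    return worst


print(max_gradient_error([1, 3, 3, 1]))
```

Because `backward` is written for any list of layer widths, the same call with a deeper or wider list, for example `max_gradient_error([1, 5, 5, 1])`, exercises the recursion beyond the 22-parameter case.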
Original abstract
This paper is a step-by-step, self-contained guide to the complete training cycle of a Physics-Informed Neural Network (PINN) -- a topic that existing tutorials and guides typically delegate to automatic differentiation libraries without exposing the underlying algebra. Using a first-order initial value problem with a known analytical solution as a running example, we walk through every stage of the process: forward propagation of both the network output and its temporal derivative, evaluation of a composite loss function built from the ODE residual and the initial condition, backpropagation of gradients -- with particular attention to the product rule that arises in hidden layers -- and a gradient descent parameter update. Every calculation is presented with explicit, verifiable numerical values using a 1-3-3-1 multilayer perceptron with two hidden layers and 22 trainable parameters. From these concrete examples, we derive general recursive formulas -- expressed as sensitivity propagation relations -- that extend the gradient computation to networks of arbitrary depth, and we connect these formulas to the automatic differentiation engines used in practice. The trained network is then validated against the exact solution, achieving a relative $L^2$ error of $4.290 \times 10^{-4}$ using only the physics-informed loss, without any data from the true solution. A companion Jupyter/PyTorch notebook reproduces every manual calculation and the full training pipeline, providing mutual validation between hand-derived and machine-computed gradients.
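For orientation, the composite loss and update described in the abstract can be written generically as follows. This is a sketch under assumptions: the IVP is taken as $u'(t) = f(t, u)$, $u(t_0) = u_0$, the two loss terms are summed without weights, and the symbols $r_\theta$, $t_i$, $N$, and $\eta$ (residual, collocation points, their number, learning rate) are introduced here for illustration; the paper's specific ODE and weighting are not stated in this summary.

```latex
\begin{align}
r_\theta(t) &= \frac{d u_\theta}{dt}(t) - f\bigl(t, u_\theta(t)\bigr), \\
\mathcal{L}(\theta) &= \frac{1}{N} \sum_{i=1}^{N} r_\theta(t_i)^2
  + \bigl(u_\theta(t_0) - u_0\bigr)^2, \\
\theta &\leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta).
\end{align}
```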
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper provides a self-contained, step-by-step derivation of the complete PINN training cycle (forward propagation of network output and derivative, composite loss from ODE residual plus initial condition, backpropagation with explicit product-rule handling, and gradient-descent update) for a concrete 1-3-3-1 MLP with 22 parameters on a first-order IVP with known analytic solution. Explicit numerical values are given for every step; general recursive sensitivity-propagation formulas are then derived from the concrete case to extend the gradient computation to arbitrary depth; the trained network is validated with a relative $L^2$ error of $4.290 \times 10^{-4}$; and a companion PyTorch notebook reproduces all hand calculations.
Significance. If the derivations hold, the work supplies a transparent educational resource that makes the algebraic mechanics of PINN training explicit rather than delegating them to automatic differentiation. The combination of hand-derived numerical values, direct verification against code, and the low error achieved on a linear ODE where network capacity is sufficient constitutes a reproducible, falsifiable demonstration that strengthens the didactic value.
major comments (1)
- [Derivation of general recursive formulas] The claim that the recursive sensitivity-propagation formulas extend the gradient computation to networks of arbitrary depth (derived in the section following the 1-3-3-1 example) rests on pattern extraction from the specific 22-parameter case without an accompanying inductive argument or numerical confirmation on a deeper architecture; this leaves open the possibility of algebraic or indexing errors when the formulas are applied beyond the toy network.
minor comments (2)
- [Validation against analytic solution] In the validation paragraph, the relative L² error is reported to four significant digits; adding the corresponding absolute L² error or pointwise maximum error would allow readers to assess the result against the scale of the analytic solution.
- The abstract states that every manual calculation is reproduced in the notebook; a brief table in the main text mapping each hand-computed quantity (e.g., specific partial derivatives) to the corresponding notebook cell would improve cross-verification.
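The additional error measures requested in the first minor comment are cheap to compute alongside the relative $L^2$ error. A minimal sketch, assuming discrete norms over a shared evaluation grid (the paper's exact norm definition is not given here, and `error_metrics` is an illustrative name):

```python
import math


def error_metrics(approx, exact):
    """Return (relative L2, absolute L2, max pointwise) errors on a shared grid."""
    diffs = [a - e for a, e in zip(approx, exact)]
    abs_l2 = math.sqrt(sum(d * d for d in diffs))
    norm_exact = math.sqrt(sum(e * e for e in exact))
    rel_l2 = abs_l2 / norm_exact          # undefined if the exact solution is identically zero
    max_err = max(abs(d) for d in diffs)
    return rel_l2, abs_l2, max_err


# Illustrative values only, not results from the paper.
rel_l2, abs_l2, max_err = error_metrics([3.0, 4.5], [3.0, 4.0])
print(rel_l2, abs_l2, max_err)
```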
Simulated Author's Rebuttal
We thank the referee for the positive summary and recommendation of minor revision. We address the single major comment below.
- Referee: [Derivation of general recursive formulas] The claim that the recursive sensitivity-propagation formulas extend the gradient computation to networks of arbitrary depth (derived in the section following the 1-3-3-1 example) rests on pattern extraction from the specific 22-parameter case without an accompanying inductive argument or numerical confirmation on a deeper architecture; this leaves open the possibility of algebraic or indexing errors when the formulas are applied beyond the toy network.
Authors: We agree that the general recursive sensitivity-propagation formulas are obtained by pattern generalization from the explicit 1-3-3-1 calculations rather than a formal inductive proof. In the revised manuscript we will insert a concise inductive argument establishing that the forward and backward sensitivity relations hold for arbitrary depth, starting from the base case already computed. We will also extend the companion notebook to include a numerical verification on a deeper architecture (e.g., 1-5-5-1), confirming that the same formulas produce correct gradients and comparable accuracy. These additions directly close the gap identified while preserving the paper's didactic focus on explicit algebra.
Revision: yes
Circularity Check
Derivation is self-contained from chain rule and explicit calculations
The paper walks through the full PINN training cycle for a concrete 1-3-3-1 network using only the chain rule, product rule, and direct substitution into the network definition and composite loss. General recursive sensitivity formulas are obtained by pattern generalization from the finite explicit case, not by fitting or self-reference. Validation uses an independent analytic solution for the linear ODE, with all manual steps cross-checked by a companion PyTorch notebook. No self-citations, fitted inputs renamed as predictions, or ansatzes appear in the load-bearing steps. The reported L2 error is a post-training verification metric, not an input to the derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: The chain rule and product rule hold for the composite functions formed by the network layers and the loss.
- domain assumption: The network output and its derivative can be computed exactly via the chosen activation and the explicit forward-pass formulas.