Gradient Scaling Effects in Adaptive Spectral PINNs for Stiff Nonlinear ODEs

Isabela M. Yepes; Pavlos Protopapas

arxiv: 2605.04502 · v1 · submitted 2026-05-06 · 💻 cs.LG

Pith reviewed 2026-05-08 16:39 UTC · model grok-4.3

classification 💻 cs.LG

keywords: PINNs, stiff ODEs, spectral methods, gradient scaling, initial condition embedding, adaptive Fourier features, physics-informed neural networks, optimization conditioning

The pith

The choice of initial-condition gating function induces time-dependent gradient scaling that alters optimization dominance in adaptive spectral PINNs for stiff ODEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how initial-condition embeddings affect training reliability in Physics-Informed Neural Networks on stiff nonlinear dynamical systems. It shows that the gating function used to enforce initial conditions produces explicit time-dependent scaling of gradients during backpropagation. This scaling interacts with spectral parameterizations, creating stiffness-dependent shifts in which gating strategy performs better. Experiments on a nonlinear stiff spring-pendulum ODE reveal that exponential gating often yields lower error at moderate stiffness while linear gating becomes preferable at higher stiffness, with the pattern consistent across error metrics and statistical tests. The results establish that initial-condition embeddings actively shape optimization conditioning rather than serving as neutral design choices.

Core claim

In adaptive spectral PINNs, the IC gating function induces explicit time-dependent gradient scaling which interacts with spectral representations during training and produces stiffness-dependent changes in relative dominance for exponential versus linear gates on a nonlinear stiff spring-pendulum ODE.

What carries the argument

The IC gating function (exponential or linear), which applies time-dependent modulation to enforce initial conditions and thereby generates explicit scaling factors on the gradients that interact with the adaptive Fourier spectral trunk.
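
A worked sketch of how a gate can end up multiplying the gradients. This summary does not spell out the paper's construction, so everything here is an illustrative assumption: a hard-constraint trial solution $\hat u = u_0 + g(t)\,N(t;\theta)$ with trunk output $N$, a first-order system $\dot u = f(u,t)$, and the gate forms $g_{\exp}(t) = 1 - e^{-t}$ and $g_{\mathrm{lin}}(t) = t$.

```latex
\begin{align}
\hat u(t;\theta) &= u_0 + g(t)\,N(t;\theta), \qquad g(0) = 0, \\
\frac{\partial \hat u}{\partial \theta} &= g(t)\,\frac{\partial N}{\partial \theta}, \\
r(t;\theta) &= \frac{d\hat u}{dt} - f(\hat u, t)
  = g'(t)\,N + g(t)\,\dot N - f(\hat u, t), \\
\frac{\partial r}{\partial \theta} &= g'(t)\,\frac{\partial N}{\partial \theta}
  + g(t)\,\frac{\partial \dot N}{\partial \theta}
  - g(t)\,\frac{\partial f}{\partial u}\,\frac{\partial N}{\partial \theta}.
\end{align}
```

Under these assumed forms, the exponential gate has $g' = e^{-t}$, so the first term fades with $t$ while the $g$-weighted terms saturate at 1. The linear gate has $g' = 1$ and $g$-weighted terms that grow linearly in $t$. If the Jacobian $\partial f/\partial u$ grows with $k$, as one would expect for a stiff spring term, the last term is where stiffness would enter the gradient.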

If this is right

At moderate stiffness (k=20) exponential gating often produces lower relative L2 and maximum pointwise error but with higher seed-to-seed variability.
At higher stiffness (k=60) linear gating becomes preferable, with additional reversals appearing at still larger k values.
The trends hold for both fixed and adaptive spectral trunks and are supported by paired Wilcoxon signed-rank tests with Holm correction.
IC embeddings are therefore an active factor that materially shapes optimization conditioning in stiff regimes.
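
The Holm correction mentioned above can be sketched in a few lines. This is a minimal standard-library illustration of the step-down procedure, not the authors' analysis code. The raw p-values and their labels are hypothetical placeholders; in practice each one would come from a paired test over seeds, for example `scipy.stats.wilcoxon`.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value is scaled by (m - rank), capped at 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values (placeholders, not results from the paper).
raw = {
    "k=20, ReL2E": 0.010,
    "k=20, MaxAE": 0.040,
    "k=60, ReL2E": 0.030,
    "k=60, MaxAE": 0.005,
}
adjusted = holm_adjust(list(raw.values()))
for label, p_adj in zip(raw, adjusted):
    print(f"{label}: Holm-adjusted p = {p_adj:.3f}")
```

A comparison would be called significant at level 0.05 when its adjusted value falls below 0.05. Holm controls the family-wise error rate without the independence assumptions some other corrections need, which suits a small family of related tests like this one.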

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the scaling mechanism is primary, similar stiffness-dependent reversals may appear when other IC enforcement techniques are used in non-spectral PINNs.
Monitoring gradient norms over training time could allow dynamic selection of the gating function without exhaustive search.
The interaction points to a possible design principle where gating is adapted to an online estimate of local stiffness.
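
One way such monitoring could be prototyped is to log gradient norms per time bin for each gate. The sketch below rests on assumptions throughout: a toy gated Fourier model $\hat u = u_0 + g(t)\sum_j a_j \sin(w_j t)$, the gate forms `gate_exp` and `gate_lin`, and the frequencies, horizon, and bin count are all hypothetical and not taken from the paper. It tracks only the output gradient with respect to the amplitudes $a_j$, not the full residual-loss gradient.

```python
import math


def gate_exp(t):
    # Assumed exponential gate: zero at t = 0, saturating at 1.
    return 1.0 - math.exp(-t)


def gate_lin(t):
    # Assumed linear gate: zero at t = 0, growing without bound.
    return t


def output_grad_norm(gate, t, freqs):
    # For u_hat = u0 + gate(t) * sum_j a_j * sin(w_j * t),
    # d u_hat / d a_j = gate(t) * sin(w_j * t); return the L2 norm over j.
    return abs(gate(t)) * math.sqrt(sum(math.sin(w * t) ** 2 for w in freqs))


def binned_grad_norms(gate, t_grid, freqs, n_bins, t_max):
    # Average the per-point gradient norm inside equal-width time bins.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t in t_grid:
        b = min(int(n_bins * t / t_max), n_bins - 1)
        sums[b] += output_grad_norm(gate, t, freqs)
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


T_MAX = 5.0              # hypothetical time horizon
FREQS = [1.0, 2.0, 4.0]  # hypothetical trunk frequencies
t_grid = [T_MAX * i / 200 for i in range(201)]

norms_exp = binned_grad_norms(gate_exp, t_grid, FREQS, 5, T_MAX)
norms_lin = binned_grad_norms(gate_lin, t_grid, FREQS, 5, T_MAX)
for b, (n_exp, n_lin) in enumerate(zip(norms_exp, norms_lin)):
    print(f"bin {b}: exp={n_exp:.3f}  lin={n_lin:.3f}  lin/exp={n_lin / n_exp:.2f}")
```

With these assumed gates the linear-to-exponential ratio grows from early to late bins, because $t \ge 1 - e^{-t}$ for all $t \ge 0$. A real monitor would replace `output_grad_norm` with gradients of the training loss and log the bins once per epoch.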

Load-bearing premise

Observed performance differences between exponential and linear gates are caused by the induced gradient scaling rather than by other uncontrolled factors such as optimization hyperparameters or spectral adaptation details.

What would settle it

An experiment that artificially equalizes the gradient scaling factors between the two gating methods while holding all other training elements fixed and checks whether the stiffness-dependent performance reversal disappears.

Figures

Figures reproduced from arXiv: 2605.04502 by Isabela M. Yepes, Pavlos Protopapas.

**Figure 1.** Summarizes performance across stiffness values under λIC = 50. Similar trends are observed under λIC = 0 (Appendix A.2

**Figure 2.** Full-scale ReL2E versus stiffness for λIC = 50.

**Figure 3.** Gate comparison under λIC = 0 (mean ± 95% CI). Adaptive models use 20 seeds at k = 20 and k = 60 (used for statistical testing) and 10 seeds elsewhere. (a) Relative $L^2$ error (ReL2E). (b) Max absolute error (MaxAE).

**Figure 4.** Gate comparison under λIC = 50 (mean ± 95% CI). Adaptive models use 20 seeds for each k ∈ {20, 30, 50, 60, 70, 80, 90, 100, 110, 120, 130} and 10 seeds elsewhere.
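
The two metrics named in the captions, ReL2E and MaxAE, are presumably the usual discrete definitions. A minimal sketch under that assumption, with prediction and reference sampled on a shared time grid; the function names and sample values are hypothetical, and the paper's exact normalization is not given here.

```python
import math


def rel_l2_error(pred, ref):
    # Relative L2 error: ||pred - ref||_2 / ||ref||_2 on a shared grid.
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den


def max_abs_error(pred, ref):
    # Maximum pointwise absolute error on the same grid.
    return max(abs(p - r) for p, r in zip(pred, ref))


# Hypothetical sample values, chosen only to make the arithmetic easy to follow.
ref = [3.0, 4.0]
pred = [3.0, 4.5]
print(f"ReL2E = {rel_l2_error(pred, ref):.3f}")
print(f"MaxAE = {max_abs_error(pred, ref):.3f}")
```

The relative metric averages error over the whole trajectory, while the maximum metric flags a single bad point. Reporting both helps check that a gate's advantage is not confined to one part of the time window.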

Original abstract

Physics-Informed Neural Networks (PINNs) often struggle to train reliably on stiff and oscillatory dynamical systems due to poor optimization conditioning. While prior work has emphasized representational remedies such as spectral parameterizations, the optimization implications of initial-condition (IC) embeddings in adaptive spectral PINNs have not been well characterized. In this work, we show that the choice of IC gating function induces explicit time-dependent gradient scaling, which interacts with spectral representations during training. Using a nonlinear stiff spring-pendulum ODE as a controlled benchmark, we compare exponential and linear IC gates in combination with fixed and adaptive Fourier spectral trunks. We observe stiffness-dependent changes in relative dominance for adaptive PINNs: at moderate stiffness ($k=20$), exponential gating often yields lower error but exhibits heterogeneous behavior across random seeds, whereas at higher stiffness ($k=60$), linear gating becomes preferable, with additional reversals observed at larger $k$. These trends hold for both relative $L^2$ error and maximum pointwise error and are confirmed by paired Wilcoxon signed-rank tests with Holm correction. Overall, our results demonstrate that IC embeddings are not a neutral design choice in PINNs: the induced gradient scaling materially shapes optimization conditioning in stiff regimes, with distinct sensitivity patterns in baseline and adaptive spectral models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports stiffness-dependent reversals in IC gate performance for adaptive spectral PINNs on a stiff ODE benchmark, but does not isolate gradient scaling as the cause.

The main takeaway is that exponential and linear IC gates produce reversed relative errors at moderate versus high stiffness (k=20 vs k=60) in adaptive Fourier PINNs, with the pattern holding on both L2 and pointwise error and backed by Wilcoxon tests with Holm correction. This is a clean empirical observation on a controlled nonlinear spring-pendulum example, and it extends prior spectral PINN work by checking both fixed and adaptive trunks across seeds. The authors also note that the effect appears in baseline models too, which is useful data for practitioners who already use spectral bases on stiff ODEs. The work is honest about the trends and avoids overclaiming universality.

The soft spot is the causal story. The abstract and stress-test note indicate that the authors derive different time-dependent multipliers on residual gradients for the two gates, yet the experiments contain no direct checks: no per-epoch gradient-norm ratios, no reweighted-loss controls that neutralize the scaling factor, and no ablation that holds optimizer and adaptation schedules fixed while varying only the gate multiplier. Other gate properties (smoothness, saturation) could drive the observed reversals instead. Without those controls the mechanism claim stays plausible but unproven.

This paper is for people already running or extending spectral PINNs on stiff dynamics who want a practical design note on IC embeddings. It is not a foundational advance, but the benchmark results are reproducible enough and the statistical reporting is solid, so it deserves a serious referee rather than a desk reject. The empirical part stands on its own even if the gradient-scaling interpretation needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study of gradient scaling effects in adaptive spectral PINNs for stiff nonlinear ODEs. It derives that exponential and linear initial-condition (IC) gating functions induce distinct time-dependent multipliers on residual loss gradients. Using a nonlinear stiff spring-pendulum ODE benchmark, the work compares these gates in combination with fixed and adaptive Fourier spectral trunks. It reports stiffness-dependent reversals in relative dominance for both relative $L^2$ error and maximum pointwise error: exponential gating often gives lower error at k=20 but behaves heterogeneously across seeds, linear gating is preferable at k=60, and further reversals appear at larger k. The reversals are supported by paired Wilcoxon signed-rank tests with Holm correction. The central claim is that IC embeddings are not neutral design choices because the induced gradient scaling materially shapes optimization conditioning in stiff regimes.

Significance. If the causal attribution to gradient scaling holds, the result is significant for PINN design: it shows that seemingly minor IC embedding choices can reverse performance trends across stiffness levels and interact with spectral adaptation, offering concrete guidance for stiff dynamical systems where standard PINNs fail. The statistical testing and dual error metrics add rigor to the benchmark. The work also highlights an under-characterized optimization aspect of spectral PINNs beyond representational capacity.

major comments (2)

[Results] Results section: the reported stiffness-dependent performance reversals (exponential vs linear gates at k=20 vs k=60) are not isolated from confounding factors. No ablation reweights the loss to neutralize the gating-induced scaling factor, no per-epoch gradient-norm logs confirm the predicted time-dependent multiplier ratios, and no controls hold spectral adaptation schedule and optimizer hyperparameters fixed while varying only the scaling mechanism.
[Methods] Methods section: the comparison of exponential and linear IC gates does not rule out alternative explanations for the observed reversals, such as differences in gate smoothness, saturation behavior, or unintended interactions with the adaptive frequency selection procedure, rather than the claimed explicit gradient scaling.

minor comments (2)

[Abstract] The abstract states 'additional reversals observed at larger k' without specifying the exact k values tested or showing the full trend in a figure or table.
Notation for the time-dependent multipliers induced by each gate should be introduced with an equation number in the derivation section for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below, clarifying our experimental controls and theoretical derivations while indicating revisions that will strengthen the presentation.

Referee: [Results] Results section: the reported stiffness-dependent performance reversals (exponential vs linear gates at k=20 vs k=60) are not isolated from confounding factors. No ablation reweights the loss to neutralize the gating-induced scaling factor, no per-epoch gradient-norm logs confirm the predicted time-dependent multiplier ratios, and no controls hold spectral adaptation schedule and optimizer hyperparameters fixed while varying only the scaling mechanism.

Authors: We agree that further isolation of the scaling effect would be beneficial. Our design already fixes the spectral adaptation schedule, optimizer hyperparameters, network architecture, and all other training elements, varying solely the IC gate function between exponential and linear forms. The Methods section derives the explicit time-dependent multiplier on residual gradients induced by each gate. To address the request for confirmation, we will add per-epoch gradient-norm plots in the revised results section. A loss-reweighting ablation to neutralize the scaling is not straightforward, as it would fundamentally alter the optimization objective; we will instead add a discussion of this limitation and emphasize that the fixed-trunk controls already isolate the scaling while holding adaptation fixed.
Revision: partial
Referee: [Methods] Methods section: the comparison of exponential and linear IC gates does not rule out alternative explanations for the observed reversals, such as differences in gate smoothness, saturation behavior, or unintended interactions with the adaptive frequency selection procedure, rather than the claimed explicit gradient scaling.

Authors: The Methods section derives the gradient scaling explicitly as a multiplicative, time-dependent factor arising from the functional form of the gate applied to the IC embedding. Smoothness and saturation differences are intrinsic to how each gate generates this scaling factor rather than separate confounds. To rule out interactions with adaptation, we already include fixed (non-adaptive) Fourier trunk results, which exhibit the same stiffness-dependent reversals. We will revise the Methods section to more prominently highlight this control experiment and the derivation that isolates the scaling mechanism from other factors.
Revision: partial

Circularity Check

0 steps flagged

Empirical benchmark study with no load-bearing derivation that reduces to its inputs

The paper is an empirical comparison of exponential vs. linear IC gating functions in adaptive spectral PINNs on a stiff ODE benchmark. It reports observed performance reversals at different stiffness levels (k=20 vs k=60) supported by Wilcoxon tests, without any derivation chain, fitted-parameter predictions, or self-citation that defines the central result by construction. No equations or claims reduce the reported gradient-scaling interaction to an input by definition; the work treats the scaling as an observed mechanism whose causal isolation is left to future controls. This is a standard non-circular empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical comparison study. No free parameters are fitted to produce the central claim; the stiffness parameter k is an input to the benchmark ODE. No new axioms or invented entities are introduced.
