arXiv preprint arXiv:2006.07232
4 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 4
Citing papers
- The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning
  The global empirical NTK for finite-width networks has a universal Kronecker-core form that makes it structurally low-rank and biases gradient descent toward dominant modes of joint input-hidden activity.
- State-Space NTK Collapse Near Bifurcations
  Bifurcations cause sNTK to reduce to a dominant rank-one channel matching normal forms, collapsing effective rank and funneling gradient descent into critical dynamical directions.
- Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning
  The authors compare multiple methods for incorporating action information into RNN state updates for RL and report empirical results on illustrative domains.
- Frame forecasting in cine MRI using the PCA respiratory motion model: comparing recurrent neural networks trained online and transformers
  Online RNNs (RTRL, SnAp-1) beat linear filters and transformers at medium-to-long horizon forecasting of PCA respiratory motion weights in two cine-MRI datasets, yielding sub-1.4 mm and sub-2.8 mm geometric errors.