Dynamics of neural scaling laws in random feature regression with powerlaw-distributed kernel eigenvalues

Jakob Kramp; Javed Lindner; Moritz Helias

arXiv: 2602.23039 · v2 · pith:V3OVCADLnew · submitted 2026-02-26 · cond-mat.dis-nn

keywords dynamics, neural, learning, behavior, gaussian, generalization, laws, network

abstract

Training large neural networks reveals neural scaling laws for the generalization error, pointing to a universal behavior of learning in high dimensions across network architectures. It has also been shown that this effect persists in the limit of highly overparametrized networks as well as in the neural network Gaussian process limit. Here we develop a principled understanding of the typical behavior of generalization in neural network Gaussian process regression dynamics. We derive a dynamical mean-field theory that captures the typical-case learning dynamics. This allows us to unify several regimes of learning studied in the current literature, namely Bayesian inference on Gaussian processes, gradient flow with or without weight decay, and stochastic Langevin training dynamics. Employing tools from statistical physics, the unified framework we derive yields, in each of these cases, an effective description of the high-dimensional microscopic behavior of network dynamics in terms of lower-dimensional order parameters. We show that the collective training dynamics can be separated into the dynamics of N independent eigenmodes, whose evolution equations are coupled only through collective response functions and the common statistics of an effective, independent noise. Our approach quantitatively explains the dynamics of the generalization error by linking spectral and dynamical properties of learning on data with power-law spectra, including phenomena such as neural scaling laws and the effect of early stopping.
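To make the link between a power-law spectrum and a power-law decay of the error concrete, here is a minimal sketch of a deliberately simplified toy model, not the paper's dynamical mean-field theory. It assumes infinite data and no label noise, so the finite-sample coupling through collective response functions and the effective noise (and therefore overfitting and early stopping) are absent. Each eigenmode then evolves independently under gradient flow with weight decay, dw_k/dt = -lambda_k (w_k - w*_k) - gamma w_k, and the generalization error is E(t) = sum_k lambda_k (w_k(t) - w*_k)^2. The power-law choices lambda_k = k^(-alpha) and lambda_k (w*_k)^2 = k^(-beta), the zero initialization, the parameter values, and the function names `generalization_error` and `fitted_exponent` are all illustrative assumptions. For gamma = 0 this toy model gives E(t) proportional to t^(-(beta - 1)/alpha) at large t.

```python
import numpy as np


def generalization_error(t, n_modes=1_000_000, alpha=1.5, beta=2.0, gamma=0.0):
    """Population error of gradient flow in a hypothetical decoupled-eigenmode toy model.

    Assumptions (illustrative, not the paper's DMFT): infinite data, no label noise,
    eigenvalues lambda_k = k**(-alpha), target power lambda_k * (w*_k)**2 = k**(-beta),
    weight decay gamma, and zero initialization w_k(0) = 0.
    """
    k = np.arange(1, n_modes + 1, dtype=float)
    lam = k ** (-alpha)                    # power-law kernel spectrum
    w_star = np.sqrt(k ** (-beta) / lam)   # target coefficient of each eigenmode
    rate = lam + gamma                     # relaxation rate of mode k
    w_inf = lam * w_star / rate            # fixed point of dw/dt = -lam (w - w*) - gamma w
    w_t = w_inf * (1.0 - np.exp(-rate * t))  # exact solution starting from w_k(0) = 0
    # The error weights each mode's coefficient mismatch by its eigenvalue.
    return float(np.sum(lam * (w_t - w_star) ** 2))


def fitted_exponent(t_min=1e2, t_max=1e4, n_points=20, **kwargs):
    """Least-squares slope of log E(t) versus log t over a log-spaced time window."""
    ts = np.logspace(np.log10(t_min), np.log10(t_max), n_points)
    errs = np.array([generalization_error(t, **kwargs) for t in ts])
    slope, _ = np.polyfit(np.log(ts), np.log(errs), 1)
    return float(slope)


if __name__ == "__main__":
    alpha, beta = 1.5, 2.0
    print("fitted exponent:", fitted_exponent(alpha=alpha, beta=beta))
    print("predicted -(beta - 1)/alpha:", -(beta - 1) / alpha)
    # With weight decay the error plateaus at a nonzero floor instead of decaying to zero.
    print("E(1e6), gamma=0:   ", generalization_error(1e6, alpha=alpha, beta=beta))
    print("E(1e6), gamma=1e-3:", generalization_error(1e6, alpha=alpha, beta=beta, gamma=1e-3))
```

Under these assumptions the fitted slope should come out close to the predicted value of -2/3, because at time t only modes with lambda_k t of order one or larger have been learned, and the unlearned tail of the target power sets the error. With gamma > 0 the modes with lambda_k below gamma are never fully learned, so the error saturates rather than following the scaling law indefinitely.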
