Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

Bruno Loureiro; Emanuele Troiani; Florent Krzakala; Julius Girardin; Lenka Zdeborová; Leonardo Defilippis; Vittorio Erba; Yizhou Xu

arxiv: 2509.24882 · v2 · pith:EQ5GSC7Xnew · submitted 2025-09-29 · cs.LG · cond-mat.dis-nn · cs.AI · stat.ML

Leonardo Defilippis, Yizhou Xu, Julius Girardin, Emanuele Troiani, Vittorio Erba, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

keywords scaling · neural · laws · learning · analysis · empirical · feature · network

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
stat.ML 2026-05 unverdicted novelty 7.0

In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).
Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
cond-mat.dis-nn 2026-05 unverdicted novelty 7.0

A two-level DMFT predicts width-consistent outlier escape and hyperparameter transfer under μP in deep networks, with bulk restructuring dominating for tasks with many outputs.
A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights
stat.ML 2026-05 conditional novelty 6.0

Neural networks prioritize amplitude over phase in Fourier space during training on translation-invariant data; power-law spectra accelerate phase learning despite not aiding classification.
Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
cond-mat.dis-nn 2026-05 unverdicted novelty 6.0

A two-level DMFT tracks bulk and outlier spectral dynamics in wide networks, predicting width-consistent outlier growth and hyperparameter transfer under muP scaling for deep linear nets while noting bulk restructurin...
Asymmetric Scaling Laws from Sparse Features
stat.ML 2026-05 unverdicted novelty 5.0

A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
cs.LG 2026-03 unverdicted novelty 5.0

HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.
There Will Be a Scientific Theory of Deep Learning
stat.ML 2026-04 unverdicted novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...