A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.
arXiv preprint arXiv:2510.24616 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
stat.ML 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.
citing papers explorer
-
Asymmetric Scaling Laws from Sparse Features
A sparse-activation model predicts double-descent loss with distinct under- and over-parameterized scaling exponents set by sparsity, plus a compute-optimal frontier favoring dataset growth.
-
There Will Be a Scientific Theory of Deep Learning
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.