Muon achieves higher storage capacity than SGD and matches Newton's method in one-step recovery rates for associative memory under power-law distributions, while saturating at larger critical batch sizes and showing faster initial multi-step dynamics.
A dynamical model of neural scaling laws
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
A two-level DMFT tracks bulk and outlier spectral dynamics in wide networks, predicting width-consistent outlier growth and hyperparameter transfer under muP scaling for deep linear nets while noting bulk restructuring for large-output tasks.
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.
citing papers explorer
-
Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory
Muon achieves higher storage capacity than SGD and matches Newton's method in one-step recovery rates for associative memory under power-law distributions, while saturating at larger critical batch sizes and showing faster initial multi-step dynamics.
-
Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer
A two-level DMFT tracks bulk and outlier spectral dynamics in wide networks, predicting width-consistent outlier growth and hyperparameter transfer under muP scaling for deep linear nets while noting bulk restructuring for large-output tasks.
-
There Will Be a Scientific Theory of Deep Learning
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.