In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).
The quantization model of neural scaling.Advances in Neural Information Processing Systems, 36:28699–28722
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.
citing papers explorer
-
Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks
In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).
-
Superposition Yields Robust Neural Scaling
Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.