Explaining neural scaling laws. Proceedings of the National Academy of Sciences, 121(27):e2311878121

Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma · 2024

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks

stat.ML · 2026-05-11 · unverdicted · novelty 7.0

In extensive-width networks, features are recovered sequentially through sharp phase transitions, yielding an effective width k_c that unifies Bayes-optimal generalization error scaling as Θ(k_c d / n).

A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Derives α^{-1/3} scaling for generalization error in online softmax classification from boundary layers in a teacher-student model.

Superposition Yields Robust Neural Scaling

cs.LG · 2025-05-15 · conditional · novelty 6.0

Strong superposition causes neural loss to scale as the inverse of model dimension due to geometric feature overlaps, explaining scaling laws for broad frequency distributions.
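The exponents quoted above (α^{-1/3} for online softmax classification, an inverse dependence on model dimension under strong superposition) are slopes on a log-log plot. The following is a minimal sketch, assuming noise-free synthetic curves with made-up prefactors (not data from any of these papers), of recovering such an exponent by ordinary least squares on log-transformed values. The helper name `fit_power_law` and all numbers are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Hypothetical helper: least-squares fit of log(y) = log(C) + p*log(x).

    Returns (C, p), the prefactor and the power-law exponent.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    # Slope of the ordinary least-squares line through the log-log points.
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    p = sxy / sxx
    log_c = my - p * mx
    return math.exp(log_c), p

# Synthetic, noise-free curves with made-up prefactors (illustration only).
alphas = [10.0, 100.0, 1000.0, 10000.0]
err_softmax = [2.0 * a ** (-1 / 3) for a in alphas]  # error ~ alpha^{-1/3}
widths = [64, 128, 256, 512]
loss_superpos = [5.0 / m for m in widths]            # loss ~ 1/m

c1, p1 = fit_power_law(alphas, err_softmax)
c2, p2 = fit_power_law(widths, loss_superpos)
print(f"alpha exponent: {p1:.4f}")  # -0.3333
print(f"width exponent: {p2:.4f}")  # -1.0000
```

With real measurements the points are noisy and the fit is usually restricted to the regime where the power law holds; the log-log regression itself is unchanged.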

citing papers explorer

Sharp feature-learning transitions and Bayes-optimal neural scaling laws in extensive-width networks stat.ML · 2026-05-11 · unverdicted · none · ref 5
A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification cs.LG · 2026-05-21 · unverdicted · none · ref 7
Superposition Yields Robust Neural Scaling cs.LG · 2025-05-15 · conditional · none · ref 15
