Advances in Neural Information Processing Systems , pages=

Training compute-optimal large language models , author=

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Prescriptive Scaling Laws for Data Constrained Training

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Prescriptive Scaling Laws for Data Constrained Training cs.LG · 2026-05-02 · unverdicted · none · ref 4
A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.

Advances in Neural Information Processing Systems , pages=

fields

years

verdicts

representative citing papers

citing papers explorer