A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.
International Conference on Machine Learning , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
OmniMouse demonstrates data-driven scaling in multi-task brain models on a 150B-token neural dataset, achieving SOTA across prediction, decoding, and forecasting while model size gains saturate.
citing papers explorer
-
Prescriptive Scaling Laws for Data Constrained Training
A one-parameter scaling law models excess loss from data repetition as an additive overfitting penalty, recommending model capacity increases over excessive repetition and showing that strong weight decay reduces the penalty coefficient by ~70%.