arxiv: 2605.12715 · 2 revisions
Scaling Laws for Mixture Pretraining Under Data Constraints