← back to paper
arxiv: 2605.08541 · 2 revisions
Tokens-per-Parameter Coverage Is Critical for Robust LLM Scaling Law Extrapolation