pith:MT7WG7D5
When is Warmstarting Effective for Scaling Language Models?
Growing a smaller checkpoint by a factor of 2× reliably speeds language model convergence, but above an empirical upper bound on the growth factor, training from scratch is more efficient.
arxiv:2605.13405 v1 · 2026-05-13 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MT7WG7D53RYAIFYLJRKF6PV3O7}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
A 2× growth factor is the most reliable in yielding convergence speedups, with gains most pronounced at budgets under 20 tokens per parameter and diminishing as the budget increases. We empirically identify an upper bound on the growth factor g beyond which training from scratch is more efficient.
The observed upper bound on growth factor and the superiority of simple growth operators generalize beyond the tested dense MLPs and dense language models to other architectures and training regimes.
A 2× growth factor in model warmstarting yields reliable training speedups for language models at budgets under 20 tokens per parameter, with an empirical upper bound on effective growth factors.
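The two quantities these claims rely on can be made concrete with a little arithmetic. The sketch below assumes the growth factor g is the ratio of target to source parameter counts, which this record does not define explicitly, and uses hypothetical model sizes. The function names are illustrative and are not taken from the paper.

```python
def growth_factor(source_params: int, target_params: int) -> float:
    """Assumed definition: g = target parameter count / source parameter count."""
    return target_params / source_params


def token_budget(target_params: int, tokens_per_param: float) -> float:
    """Training tokens implied by a tokens-per-parameter budget."""
    return target_params * tokens_per_param


# Hypothetical sizes: warmstart a 250M-parameter model from a 125M checkpoint.
g = growth_factor(125_000_000, 250_000_000)
budget = token_budget(250_000_000, 20)
print(f"g = {g:.1f}, budget = {budget:.2e} tokens")
```

Under these assumptions the example sits exactly at g = 2 and at the 20 tokens per parameter threshold named in the claims.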
Receipt and verification
| First computed | 2026-05-18T02:44:47.534177Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
64ff637c7ddc7004170b4c545f3ebb77d0c1f00b74fe6ffd1c8e65085cfdb7c3
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MT7WG7D53RYAIFYLJRKF6PV3O7 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 64ff637c7ddc7004170b4c545f3ebb77d0c1f00b74fe6ffd1c8e65085cfdb7c3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "94ea31b292acf63c6846c1dde73b4d1eac980cee696388925b18d4db710c3abf",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T12:00:11Z",
    "title_canon_sha256": "10000b5d09405667967c5c43b9627937f3e43d26401cdc3cb30fd1c9741306ab"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13405",
    "kind": "arxiv",
    "version": 1
  }
}
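The hashing step of the verification one-liner can also be written as a small reusable Python function. This is a minimal sketch that mirrors the `json.dumps` arguments used in that command (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding). The names `canonical_sha256` and `matches` are hypothetical helpers, not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record canonically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,         # key order must not affect the hash
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters unescaped
    ).encode()                  # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a record's canonical hash against a published hex digest."""
    return canonical_sha256(record) == expected_hex.lower()
```

To check a record, load the canonical record JSON into a dict (for example with `json.loads`) and pass it to `matches` together with the canonical hash listed on this page. Because the keys are sorted before hashing, the result does not depend on the order in which the fields appear.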