pith:YK5CILY5
Compute Optimal Tokenization
In compute-optimal regimes, language model parameter counts scale with the byte volume of data rather than the number of tokens.
arxiv:2605.01188 v2 · 2026-05-02 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YK5CILY5R3OBKUTLDZRQ6XEY2S}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
In compute-optimal configurations, model parameter counts scale proportionally to data size measured in bytes, not in tokens as commonly perceived (Kaplan et al., 2020; Hoffmann et al., 2022).
Assumption: the behavior of latent tokenized BLT models generalizes to standard subword tokenizers, and the observed scaling trends extend beyond the tested range up to 7B parameters.
Compute-optimal language models require parameter count to scale with data bytes rather than tokens, with optimal token compression rate decreasing as compute budget grows.
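The relationship between the byte-based and token-based views in the claims above can be sketched with simple arithmetic. This is a minimal illustration, assuming data bytes equal token count times average bytes per token; the function names and every constant below are hypothetical placeholders, not values reported in the paper.

```python
# Illustrative arithmetic only. All names and constants are hypothetical
# placeholders, not figures taken from the paper.

def data_bytes(num_tokens: float, bytes_per_token: float) -> float:
    """Data size in bytes for a corpus of num_tokens at a given compression rate."""
    return num_tokens * bytes_per_token


def tokens_per_parameter(bytes_per_parameter: float, bytes_per_token: float) -> float:
    """Tokens-per-parameter ratio implied by a fixed bytes-per-parameter ratio.

    If parameter count tracks data bytes, the token-based ratio is not fixed:
    it shrinks as each token covers more bytes.
    """
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    return bytes_per_parameter / bytes_per_token


if __name__ == "__main__":
    hypothetical_bytes_per_parameter = 80.0  # placeholder, not a measured value
    for bytes_per_token in (2.0, 4.0, 8.0):
        ratio = tokens_per_parameter(hypothetical_bytes_per_parameter, bytes_per_token)
        print(f"{bytes_per_token} bytes/token -> {ratio} tokens per parameter")
```

Under this assumption, a rule of thumb stated in tokens per parameter would only hold for one tokenizer compression rate, whereas a rule stated in bytes per parameter would carry across tokenizers.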
Receipt and verification
| First computed | 2026-05-27T02:06:14.186783Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c2ba242f1d8edc15526b1e630f5c98d48a21f4d659242a900ec11432447ec65e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YK5CILY5R3OBKUTLDZRQ6XEY2S \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c2ba242f1d8edc15526b1e630f5c98d48a21f4d659242a900ec11432447ec65e
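The hashing step of the command above can also be written as a small Python function. This is a sketch that mirrors the one-liner's canonicalization (sorted keys, compact separators, non-ASCII characters kept as-is, UTF-8 encoding); the function name `canonical_sha256` is an assumption introduced here and is not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a record serialized the same way as the one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between items or after colons
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

To check a record, parse the fetched response, pass its `canonical_record` field to the function, and compare the result with the canonical hash shown on this page.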
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e7fe2f567d44a77bb36cdea77cec8567704cde4940c5a000ac85c1215ff375d4",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-02T01:53:22Z",
    "title_canon_sha256": "2ba189736d0694a11dca2c844800d19f7f4888957472955e501cd91a3fcce290"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.01188",
    "kind": "arxiv",
    "version": 2
  }
}