pith:JNCKSCVR
The Efficiency Gap in Byte Modeling
Byte modeling incurs a larger scaling penalty under masked diffusion than under autoregressive training because diffusion destroys local byte contiguity.
arxiv:2605.12928 v1 · 2026-05-13 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JNCKSCVRQUIZRDLKLYJ4Q2TAM7}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
the performance penalty of byte modeling is not uniform; across scale, the scaling overhead of byte modeling is worse for MDM than for AR. We hypothesize that this disparity stems from context fragility: while AR's stable causal history allows models to naturally rediscover subword patterns, the MDM objective destroys the local contiguity required to efficiently resolve semantics from raw bytes.
This hypothesis assumes that the observed scaling disparity is caused by context fragility in MDM rather than by differences in how compute is allocated or by other unmeasured factors in the experimental setup.
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:10.006060Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538
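For this record, the Pith number matches the leading characters of the RFC 4648 base32 encoding of the canonical hash. The sketch below reproduces that relationship from the values on this page. It is an observation about this one record, not a documented rule of the scheme, and the helper name `base32_prefix` is illustrative.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the base32 encoding of a hex digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


# Assumption: the identifier is a plain truncation of the base32 text.
print(base32_prefix(CANONICAL_HASH, 26))  # JNCKSCVRQUIZRDLKLYJ4Q2TAM7
print(base32_prefix(CANONICAL_HASH, 8))   # JNCKSCVR
```

The 26-character value is the full Pith number used in the URL and in `\pithnumber{...}`, and the 8-character value is the short form shown at the top of the record.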
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JNCKSCVRQUIZRDLKLYJ4Q2TAM7 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538
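The Python one-liner in the command above can be unpacked into a small function. This is a minimal sketch of the same canonicalization step: keys sorted, compact separators, no ASCII escaping, UTF-8 encoding, then SHA-256. The name `canonical_sha256` is illustrative, and the record is assumed to be already fetched and parsed into a dict.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization settings from the one-liner above."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Key order in the input does not change the digest.
print(canonical_sha256({"b": 1, "a": 2}) == canonical_sha256({"a": 2, "b": 1}))  # True
```

Passing the parsed `canonical_record` object to this function and comparing the result with the canonical hash is equivalent to the shell pipeline.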
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "d420a61b1fa40c89431cc362687a93720712c43b755aba16ae7b670fe245945f",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-05-13T03:03:30Z",
"title_canon_sha256": "8b768de3edb5f072d26865a3fd76fd1ed1a7cfcd9abac425c11eadd03e3c719d"
},
"schema_version": "1.0",
"source": {
"id": "2605.12928",
"kind": "arxiv",
"version": 1
}
}