Pith Number

pith:JNCKSCVR

pith:2026:JNCKSCVRQUIZRDLKLYJ4Q2TAM7
not attested · not anchored · not stored · refs resolved

The Efficiency Gap in Byte Modeling

Alexander M. Rush, Celine Lee, Chen Liang, Derek Cheng, Ed Chi, Fernando Pereira, Jeremiah Liu, Jiaxin Shi, Jing Nathan Yan, Pengcheng Yin, Ruoxi Wang, Yin Zhang

Byte modeling incurs a larger scaling penalty under masked diffusion than under autoregressive training because diffusion destroys local byte contiguity.

arxiv:2605.12928 v1 · 2026-05-13 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JNCKSCVRQUIZRDLKLYJ4Q2TAM7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

The performance penalty of byte modeling is not uniform: across scale, the scaling overhead of byte modeling is worse for masked diffusion models (MDM) than for autoregressive (AR) models. We hypothesize that this disparity stems from context fragility: while AR's stable causal history allows models to naturally rediscover subword patterns, the MDM objective destroys the local contiguity required to efficiently resolve semantics from raw bytes.

C2 weakest assumption

The observed scaling disparity is assumed to be caused by context fragility in MDM rather than by differences in how compute is allocated or by other unmeasured factors in the experimental setup.

C3 one-line summary

Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.

Receipt and verification
First computed: 2026-05-18T03:09:10.006060Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538

Aliases

arxiv: 2605.12928 · arxiv_version: 2605.12928v1 · doi: 10.48550/arxiv.2605.12928 · pith_short_12: JNCKSCVRQUIZ · pith_short_16: JNCKSCVRQUIZRDLK · pith_short_8: JNCKSCVR
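The identifiers on this page appear to be related in a simple way: Base32-encoding the bytes of the canonical hash and keeping the first 26 characters reproduces the Pith Number shown above, and the pith_short_* aliases are prefixes of that string. This is an observation from the values listed here, not a documented rule of the pith-number/v1.0 schema. The function names below are hypothetical and the block is a minimal sketch under that assumption.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character length and the RFC 4648 Base32 alphabet are
    inferred from the values on this page, not taken from a specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Return the 8-, 12-, and 16-character prefixes of a Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)
    print(short_aliases(number))
```

Running the script prints the full identifier and the three short aliases, which can be compared against the Aliases line by eye.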
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JNCKSCVRQUIZRDLKLYJ4Q2TAM7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4b44a90ab18511988d6a5e13c86a6067cb7cb4711e9a45a747f462dfc8dd4538
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d420a61b1fa40c89431cc362687a93720712c43b755aba16ae7b670fe245945f",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T03:03:30Z",
    "title_canon_sha256": "8b768de3edb5f072d26865a3fd76fd1ed1a7cfcd9abac425c11eadd03e3c719d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12928",
    "kind": "arxiv",
    "version": 1
  }
}
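The canonicalization used by the verification command (sorted keys, compact separators, no ASCII escaping, then SHA-256) can also be run offline against a saved copy of the record. The sketch below applies those same steps to the record as printed on this page. The names RECORD_JSON and canonical_sha256 are hypothetical. If the record is reproduced exactly, the digest should be comparable to the canonical hash listed above, though this sketch does not assert that it matches.

```python
import hashlib
import json

# Record copied from this page; whitespace and key order do not matter
# because the canonical form is rebuilt before hashing.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "d420a61b1fa40c89431cc362687a93720712c43b755aba16ae7b670fe245945f",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T03:03:30Z",
    "title_canon_sha256": "8b768de3edb5f072d26865a3fd76fd1ed1a7cfcd9abac425c11eadd03e3c719d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12928",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(canonical_sha256(json.loads(RECORD_JSON)))
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order or indentation produce the same digest.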