Pith Number

pith:272IFNPD

pith:2026:272IFNPDMQLGMDYCG4I4OQB64T
not attested · not anchored · not stored · refs resolved

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

Amirmehdi Jafari Fesharaki, Aslan Tchamkerten, Mohammadamin Rami

Fragmentation into smaller units can strictly raise the minimal log-loss achievable by any finite-context transformer on Markov sources.

arxiv:2605.13485 v1 · 2026-05-13 · cs.LG · cs.CL · cs.IT · math.IT

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{272IFNPDMQLGMDYCG4I4OQB64T}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We prove that fragmentation can strictly increase the optimal finite-context log-loss, showing that the gap is not merely an optimization or capacity issue, but can be intrinsic to the representation.

C2 weakest assumption

The analysis assumes data generated by Markov sources; the strict increase and loss guarantees may not hold for the long-range dependencies and non-stationarities present in natural language.

C3 one line summary

Fragmentation can strictly raise the optimal finite-context log-loss on Markov sources, while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.
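A toy numerical check can make the direction of claim C1 concrete. The sketch below is not the paper's construction: the 4-state transition matrix `P`, the two-bit fragmentation of each symbol, and the one-unit context window are all assumptions chosen for illustration. It compares the optimal log-loss per source symbol when the predictor sees the previous whole symbol against the case where each symbol is split into two bits and the predictor sees only the previous bit.

```python
import math

# Hypothetical 4-state Markov source (rows sum to 1); not taken from the paper.
P = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

def stationary(P, iters=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def cond_entropy(joint):
    """H(Y | X) in bits from a dict {(x, y): probability}."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return -sum(p * math.log2(p / px[x]) for (x, _), p in joint.items() if p > 0)

def add(table, key, p):
    table[key] = table.get(key, 0.0) + p

pi = stationary(P)
sym, hi_given_prev_lo, lo_given_hi = {}, {}, {}
for i in range(4):
    for j in range(4):
        p = pi[i] * P[i][j]                         # joint of (X_{t-1}=i, X_t=j)
        add(sym, (i, j), p)                         # context: previous whole symbol
        add(hi_given_prev_lo, (i & 1, j >> 1), p)   # context: low bit of X_{t-1}
        add(lo_given_hi, (j >> 1, j & 1), p)        # context: high bit of X_t

# Optimal log-loss per source symbol with a context of one unit.
symbol_loss = cond_entropy(sym)
fragment_loss = cond_entropy(hi_given_prev_lo) + cond_entropy(lo_given_hi)

print(f"symbol context:   {symbol_loss:.3f} bits per source symbol")
print(f"fragment context: {fragment_loss:.3f} bits per source symbol")
```

For this particular matrix the symbol-level loss is about 1.36 bits and the fragment-level loss is 2.00 bits per source symbol, so the gap is strict even though both predictors are optimal for their window. The example only shows that a gap of this kind can exist; it says nothing about the conditions or bounds proved in the paper.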

References

44 extracted · 44 resolved · 3 Pith anchors

Receipt and verification
First computed: 2026-05-18T02:44:41.285062Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

d7f482b5e36416660f023711c7403ee4d29d6ceb60869baa189e7974f238600c

Aliases

arxiv: 2605.13485 · arxiv_version: 2605.13485v1 · doi: 10.48550/arxiv.2605.13485 · pith_short_12: 272IFNPDMQLG · pith_short_16: 272IFNPDMQLGMDYC · pith_short_8: 272IFNPD
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/272IFNPDMQLGMDYCG4I4OQB64T \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d7f482b5e36416660f023711c7403ee4d29d6ceb60869baa189e7974f238600c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "30a2248cf5781ee889e8b1b66c56a97b161ec89496548afa1ee2acbfafbb6aa9",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.IT",
      "math.IT"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T13:08:08Z",
    "title_canon_sha256": "b636134e6829766c6fe58bd9f1d0d22d17bfda0e5d4152856a6426c74be25076"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13485",
    "kind": "arxiv",
    "version": 1
  }
}
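The shell command above re-serializes the canonical record with sorted keys, compact separators, and non-escaped Unicode, then takes the SHA-256 of the UTF-8 bytes. The same steps can be sketched offline in Python against the record shown here. The function name `canonical_sha256` is hypothetical, and whether the record as displayed on this page reproduces the published hash byte for byte has not been checked here, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

EXPECTED = "d7f482b5e36416660f023711c7403ee4d29d6ceb60869baa189e7974f238600c"

def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the curl/jq/python one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # UTF-8 by default
    return hashlib.sha256(payload).hexdigest()

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "30a2248cf5781ee889e8b1b66c56a97b161ec89496548afa1ee2acbfafbb6aa9",
        "cross_cats_sorted": ["cs.CL", "cs.IT", "math.IT"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T13:08:08Z",
        "title_canon_sha256": "b636134e6829766c6fe58bd9f1d0d22d17bfda0e5d4152856a6426c74be25076",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13485", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not change the digest; any change to a value does.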