{"paper":{"title":"Effective Context in Transformers: An Analysis of Fragmentation and Tokenization","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Fragmentation into smaller units can strictly raise the minimal log-loss achievable by any finite-context transformer on Markov sources.","cross_cats":["cs.CL","cs.IT","math.IT"],"primary_cat":"cs.LG","authors_text":"Amirmehdi Jafari Fesharaki, Aslan Tchamkerten, Mohammadamin Rami","submitted_at":"2026-05-13T13:08:08Z","abstract_excerpt":"Transformers predict over a representation of a sequence. The same data can be written as bytes, characters, or subword tokens, and these representations may be lossless. Yet, under a fixed context window, they need not expose the same information to the model. This raises a basic question: how does the choice of representation change what a finite-context predictor can achieve?\n  We study this question on Markov sources and uncover two complementary phenomena. First, we observe that moving to smaller representation units can hurt prediction even when the context window is enlarged to cover th"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that fragmentation can strictly increase the optimal finite-context log-loss, showing that the gap is not merely an optimization or capacity issue, but can be intrinsic to the representation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The analysis assumes data generated by Markov sources; the strict increase and loss guarantees may not hold for the long-range dependencies and non-stationarities present in natural language.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Fragmentation into smaller units can strictly raise the minimal log-loss achievable by any finite-context transformer on Markov sources.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"890adea31d40fc1283f13d4edce72743c750a5fc5c7bd7e6346ceedb6c595d02"},"source":{"id":"2605.13485","kind":"arxiv","version":1},"verdict":{"id":"fa69c1b0-8880-47e3-a79c-bd422a57e14c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:36:33.818963Z","strongest_claim":"We prove that fragmentation can strictly increase the optimal finite-context log-loss, showing that the gap is not merely an optimization or capacity issue, but can be intrinsic to the representation.","one_line_summary":"Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The analysis assumes data generated by Markov sources; the strict increase and loss guarantees may not hold for the long-range dependencies and non-stationarities present in natural language.","pith_extraction_headline":"Fragmentation into smaller units can strictly raise the minimal log-loss achievable by any finite-context transformer on Markov sources."},"references":{"count":44,"sample":[{"doi":"","year":2021,"title":"Bellard, Fabrice , year = 2021, month = feb, langid =","work_id":"2f649c25-8933-4296-acd7-5d408ba411bf","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"International Conference on Learning Representations , year =","work_id":"27e7743d-1296-4c1c-a762-761468888f0b","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"International Conference on Learning Representations , year =","work_id":"7256ff81-23f1-44b4-bba3-3caaa5983ac5","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Bondaschi, Marco and Rajaraman, Nived and Wei, Xiuying and Pascanu, Razvan and Gulcehre, Caglar and Gastpar, Michael and Makkuva, Ashok Vardhan , year = 2025, month = oct, urldate =. From. The","work_id":"8c4acaea-a4c8-4f83-be53-0a2a4a88147b","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"2019 , url =","work_id":"a52fc5f7-dc7e-4eee-a962-ed14339b2ad9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":44,"snapshot_sha256":"9e4580c2cc92e3fefd7d0f0ce6e0c85a9050f2e2a52cdf212e3ef619a3bff567","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}