Pith Number

pith:UYBVMRVF

pith:2026:UYBVMRVFJBFVCGNBWMPAWXS2KE
not attested · not anchored · not stored · refs resolved

Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models

Faezeh Ghaderi, Mahdi Naser Moghadasi

Benchmark of 118 transformers shows performance walls where success drops to zero at 2048 tokens.

arxiv:2605.15413 v1 · 2026-05-14 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UYBVMRVFJBFVCGNBWMPAWXS2KE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our findings challenge prevailing scaling assumptions and provide the first quantitative evidence that the theoretical O(n^2) attention complexity translates into measurable performance walls.

C2 weakest assumption

The 118 models across seven architectural categories and the chosen sequence lengths from 128 to 2048 tokens are representative enough to reveal fundamental, generalizable performance walls in modern transformers rather than artifacts of specific model selection or hardware.

C3 one line summary

Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative ones.
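To put the quoted numbers in proportion, the following is a minimal sketch of the arithmetic behind the claims above. It assumes only the textbook fact that one attention score matrix has n x n entries, plus the two throughput figures from the summary. The intermediate sequence lengths (256, 1024) and all function names are illustrative assumptions, not taken from the paper.

```python
# Sequence lengths between the 128 and 2048 tokens quoted in the record.
# The intermediate points are assumed powers of two, for illustration only.
SEQ_LENGTHS = [128, 256, 512, 1024, 2048]


def attention_score_entries(n_tokens: int) -> int:
    """Entries in one n x n attention score matrix (per head, per layer)."""
    return n_tokens * n_tokens


def relative_cost(n_tokens: int, baseline: int = 128) -> float:
    """Growth of the score matrix relative to the baseline sequence length."""
    return attention_score_entries(n_tokens) / attention_score_entries(baseline)


def efficiency_ratio(compressed: float = 649.2, large_generative: float = 12.5) -> float:
    """Ratio of the two tokens/sec/M-parameter figures quoted in the summary."""
    return compressed / large_generative


if __name__ == "__main__":
    for n in SEQ_LENGTHS:
        print(f"{n:>5} tokens: {attention_score_entries(n):>9,} entries, "
              f"{relative_cost(n):.0f}x the 128-token matrix")
    print(f"compressed vs large generative: {efficiency_ratio():.1f}x")
```

Going from 128 to 2048 tokens multiplies the sequence length by 16 and the score matrix by 256, and the quoted throughput figures differ by a factor of roughly 52. This shows scale only; it does not reproduce the paper's measurements.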

References

64 extracted · 64 resolved · 24 Pith anchors

[1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998–6 2017
[2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapte 2019
[3] Brown et al., "Language models are few-shot learners," in Advances in neural information processing systems, vol 2020
[4] Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations, 2021 2021
[5] Longformer: The Long-Document Transformer 2004 · arXiv:2004.05150

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:00:57.337812Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a6035646a5484b5119a1b31e0b5e5a51057ca475398611b6d32a8e45996674eb

Aliases

arxiv: 2605.15413 · arxiv_version: 2605.15413v1 · doi: 10.48550/arxiv.2605.15413 · pith_short_12: UYBVMRVFJBFV · pith_short_16: UYBVMRVFJBFVCGNB · pith_short_8: UYBVMRVF
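The identifiers on this page appear to be related in a simple way. The short aliases are prefixes of the full Pith Number, and the Pith Number matches the unpadded base32 encoding of the first 16 bytes of the canonical hash. This is an observation from the values shown here, not a documented rule of the schema. The sketch below checks it, and the function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "a6035646a5484b5119a1b31e0b5e5a51057ca475398611b6d32a8e45996674eb"
PITH_NUMBER = "UYBVMRVFJBFVCGNBWMPAWXS2KE"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: this reproduces the identifier for this record; the real
    builder may define the derivation differently.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Short aliases on this page are plain prefixes of the full identifier."""
    return pith_number[:length]


if __name__ == "__main__":
    print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)
    for length in (8, 12, 16):
        print(f"pith_short_{length}: {short_alias(PITH_NUMBER, length)}")
```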
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UYBVMRVFJBFVCGNBWMPAWXS2KE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a6035646a5484b5119a1b31e0b5e5a51057ca475398611b6d32a8e45996674eb
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2a49b079418a93f3424ec125972eee8d6d1fc01a7eb9d5bcaa43878bfbeccf16",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T20:57:15Z",
    "title_canon_sha256": "7e7b75926d1ea4522798d8adfe560b6d520f07bc8b45b1761ef3b75fd6f0f744"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15413",
    "kind": "arxiv",
    "version": 1
  }
}
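The shell pipeline above can also be run offline against the JSON shown on this page. The sketch below uses the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). It assumes the record printed here is byte-for-byte what the API returns under `canonical_record`; if it is not, the digest will differ from the expected value. The function names are hypothetical.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the API response).
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "2a49b079418a93f3424ec125972eee8d6d1fc01a7eb9d5bcaa43878bfbeccf16",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-14T20:57:15Z",
        "title_canon_sha256": "7e7b75926d1ea4522798d8adfe560b6d520f07bc8b45b1761ef3b75fd6f0f744",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15413", "kind": "arxiv", "version": 1},
}

# Hash published on this page under "Canonical hash".
EXPECTED_SHA256 = "a6035646a5484b5119a1b31e0b5e5a51057ca475398611b6d32a8e45996674eb"


def canonical_bytes(record: dict) -> bytes:
    """Serialize exactly as the shell one-liner does: sorted keys, no spaces."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED_SHA256)
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror stores the record.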