pith:DKVJN7NY
Robust Reasoning Benchmark
Open-weight reasoning models lose up to 55 percent average accuracy when AIME problems are subjected to 14 simple text perturbations.
arxiv:2604.08571 v2 · 2026-03-26 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{DKVJN7NY3VJINP6R22LISMYHOS}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Open-weight reasoning models suffer catastrophic collapses (up to 55% average accuracy drops across perturbations and up to 100% on some), exposing structural fragility. ... intermediate reasoning steps permanently pollute standard dense attention mechanisms.
The analysis assumes that the 14 perturbations preserve the underlying mathematical content and difficulty, so that accuracy drops can be attributed specifically to reasoning or parsing failures rather than to altered problem solvability.
Perturbations to math problem text cause up to 55% average accuracy drops in open-weight LLMs, and sequential solving reveals context pollution in attention mechanisms.
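The claims do not say whether an "accuracy drop" is absolute (percentage points lost) or relative to the unperturbed score. The minimal sketch below makes no assumption about the paper's actual numbers: using made-up accuracies for four hypothetical perturbations, it computes both conventions and shows how a mean drop well below 100% can coexist with a 100% drop on a single perturbation. The function names and toy data are illustrative only, not taken from the paper.

```python
from statistics import mean


def accuracy_drops(clean_acc, perturbed_acc):
    """Return per-perturbation drops in two conventions (hypothetical helper).

    clean_acc: accuracy (in %) on the unperturbed problems.
    perturbed_acc: mapping of perturbation name -> accuracy (in %) after perturbing.
    """
    drops = {}
    for name, acc in perturbed_acc.items():
        # Absolute drop: percentage points lost versus the clean score.
        absolute = clean_acc - acc
        # Relative drop: share of the clean score that was lost (0.0 if clean score is 0).
        relative = 100.0 * absolute / clean_acc if clean_acc else 0.0
        drops[name] = {"absolute": absolute, "relative": relative}
    return drops


def summarize(drops):
    """Average and worst-case drop across all perturbations."""
    return {
        "mean_absolute": mean(d["absolute"] for d in drops.values()),
        "mean_relative": mean(d["relative"] for d in drops.values()),
        "max_relative": max(d["relative"] for d in drops.values()),
    }


# Made-up accuracies for illustration only; these are NOT results from the paper.
TOY_CLEAN = 80.0
TOY_PERTURBED = {
    "perturbation_a": 80.0,
    "perturbation_b": 60.0,
    "perturbation_c": 40.0,
    "perturbation_d": 0.0,
}

if __name__ == "__main__":
    print(summarize(accuracy_drops(TOY_CLEAN, TOY_PERTURBED)))
```

With these toy values the mean relative drop is 43.75% while the worst single perturbation loses 100% of the clean score, which is the pattern the claims describe (an average figure plus a worst case).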
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-22T01:03:19.298591Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1aaa96fdb8dd5286bfd1d69689330774a87df6cff0f040971799b6d507e39b3a
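The page does not say how the Pith number is derived from the record. Checking the two values shown, the 26-character number `DKVJN7NY3VJINP6R22LISMYHOS` is exactly the first 26 characters of the RFC 4648 base32 encoding of the canonical hash above, and the short form `pith:DKVJN7NY` is its first 8 characters. The sketch below reproduces that observed relationship; the helper names are hypothetical, and the truncation length is inferred from this one record rather than taken from any Pith specification.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "1aaa96fdb8dd5286bfd1d69689330774a87df6cff0f040971799b6d507e39b3a"
PITH_NUMBER = "DKVJN7NY3VJINP6R22LISMYHOS"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the raw digest bytes and truncate.

    The 26-character length is inferred from this record, not from a spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_form(pith_number: str) -> str:
    """Hypothetical helper: the 8-character 'pith:' form shown at the top of the page."""
    return "pith:" + pith_number[:8]


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived, derived == PITH_NUMBER, short_form(derived))
```

Because each base32 character encodes 5 bits, the 26 characters cover the first 130 of the digest's 256 bits.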
Agent API
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DKVJN7NY3VJINP6R22LISMYHOS \
 | jq -c '.canonical_record' \
 | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1aaa96fdb8dd5286bfd1d69689330774a87df6cff0f040971799b6d507e39b3a
```
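Where `jq` is not available, the same check can be written in Python alone. This is a minimal sketch that relies only on what the command above shows: the endpoint returns JSON when sent `Accept: application/ld+json`, and the hash covers the `canonical_record` field serialized with sorted keys, compact separators and UTF-8. The function names are hypothetical, and the fetch helper needs network access.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record) -> str:
    """Hash a record the way the one-liner does: sorted keys (recursively),
    no whitespace between tokens, non-ASCII kept as-is, encoded as UTF-8."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(pith_number: str) -> dict:
    """Fetch the JSON-LD document and return its 'canonical_record' field."""
    url = f"https://pith.science/pith/{pith_number}"
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

With network access, `canonical_sha256(fetch_canonical_record("DKVJN7NY3VJINP6R22LISMYHOS"))` should return the digest on the `# expect:` line, provided the served record has not changed. Skipping `jq` is safe for a record holding only strings and small integers, as this one does, because `jq -c '.canonical_record'` only extracts the field and the Python step re-parses and re-serializes it anyway.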
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "5f4b4257e5fef207bee4cc639ca93ecbb7cbd1d84d957c58fbf00cf4e1cd5ace",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-26T22:19:33Z",
    "title_canon_sha256": "1506d1d446ef8dad258b04c28e5f6f006c00c5e44e54fb5ae902ba7b52b0030b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.08571",
    "kind": "arxiv",
    "version": 2
  }
}
```