Pith Number

pith:XWKVJIXS

pith:2025:XWKVJIXSLVYNKHWXKRVTXQIOU2
not attested · not anchored · not stored · refs resolved

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton, Mehrdad Farajtabar, Parshin Shojaee, Samy Bengio

Large Reasoning Models exhibit complete accuracy collapse beyond certain complexities and reduce reasoning effort despite available compute.

arxiv:2506.06941 v3 · 2025-06-07 · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XWKVJIXSLVYNKHWXKRVTXQIOU2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
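
As an illustration of what a deterministic merge means here, the sketch below folds a set of events into a state that does not depend on the order in which a mirror received them. Everything in it is an assumption made for the example: the event fields (`id`, `ts`, `field`, `value`), the sort key, and the last-writer-wins rule are hypothetical and are not taken from the Pith bundle format, which this page does not specify.

```python
def merge_events(events):
    """Fold events into a state dict in one canonical order (hypothetical scheme)."""
    # Deduplicate by event id. Assumption: ids are unique per event, so two
    # entries with the same id carry identical content.
    unique = {event["id"]: event for event in events}
    # Sort by (timestamp, id) so the result ignores arrival order.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {}
    for event in ordered:
        # Last writer wins, where "last" is defined by the canonical order.
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, invented for this example only.
events = [
    {"id": "a1", "ts": 1, "field": "citations", "value": "open"},
    {"id": "b2", "ts": 2, "field": "citations", "value": "closed"},
    {"id": "c3", "ts": 2, "field": "replications", "value": "open"},
]

forward = merge_events(events)
backward = merge_events(list(reversed(events)))
print(forward == backward)
```

Because the fold always runs over the same sorted sequence, two mirrors holding the same set of events reach the same state regardless of download order or duplicates.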

Claims

C1 · strongest claim

LRMs face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having remaining token budget.

C2 · weakest assumption

That the chosen controllable puzzle environments provide an unbiased and generalizable measure of reasoning complexity without introducing artifacts that do not appear in other domains such as math or coding.

C3 · one-line summary

LRMs exhibit complete accuracy collapse beyond certain puzzle complexities: reasoning effort rises and then declines, and LRMs outperform standard LLMs only on medium-complexity tasks.

References

55 extracted · 55 resolved · 12 Pith anchors

[1] OpenAI o1 System Card 2024 · arXiv:2412.16720
[2] Introducing OpenAI o1 2024
[3] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948
[4] Claude 3.7 Sonnet 2025
[5] Gemini Flash Thinking. Google AI Blog, Jan 2025

Formal links

2 machine-checked theorem links

Cited by

34 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.945787Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

bd9554a2f25d70d51ed7546b3bc10ea6987dd4cbd948aa53f779b964a512b7c5

Aliases

arxiv: 2506.06941 · arxiv_version: 2506.06941v3 · doi: 10.48550/arxiv.2506.06941 · pith_short_12: XWKVJIXSLVYN · pith_short_16: XWKVJIXSLVYNKHWX · pith_short_8: XWKVJIXS
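
The short aliases listed above are prefixes of the full 26-character Pith Number. For this record the full number also matches the first 26 characters of the Base32 encoding of the canonical hash. That second relationship is an observation about the values on this page, not a documented derivation rule, and the helper names below (`short_alias`, `b32_prefix`) are made up for this sketch.

```python
import base64

# Values copied from this page.
canonical_hash = "bd9554a2f25d70d51ed7546b3bc10ea6987dd4cbd948aa53f779b964a512b7c5"
pith_number = "XWKVJIXSLVYNKHWXKRVTXQIOU2"


def short_alias(number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number."""
    return number[:length]


def b32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = {n: short_alias(pith_number, n) for n in (8, 12, 16)}
print(aliases)

# Observed on this record only; treat as an assumption for other records.
print(b32_prefix(canonical_hash) == pith_number)
```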
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XWKVJIXSLVYNKHWXKRVTXQIOU2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bd9554a2f25d70d51ed7546b3bc10ea6987dd4cbd948aa53f779b964a512b7c5
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "8a45099f14d045accff594ca13ca08c77d46017efad9a353a561b48d2641f330",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-06-07T22:42:29Z",
    "title_canon_sha256": "a0d32bd599754e05eb9948d06ed7aed1b2cdac8f3f64203a8c1b4e2a57a86a6c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.06941",
    "kind": "arxiv",
    "version": 3
  }
}
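
The same check can be run offline, without `curl` or `jq`. The sketch below applies the serialization from the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record shown on this page. It assumes the JSON displayed here is identical to what the API returns as `canonical_record`; the function name `canonical_sha256` is a label chosen for this example.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the verification one-liner."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "8a45099f14d045accff594ca13ca08c77d46017efad9a353a561b48d2641f330",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-06-07T22:42:29Z",
        "title_canon_sha256": "a0d32bd599754e05eb9948d06ed7aed1b2cdac8f3f64203a8c1b4e2a57a86a6c",
    },
    "schema_version": "1.0",
    "source": {"id": "2506.06941", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
# Compare this value against the "Canonical hash" listed on the page.
print(digest)
```

Sorting keys before hashing is what makes the digest independent of the key order in whatever copy of the record a mirror happens to hold.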