Pith Number

pith:KJCQ2R4N

pith:2026:KJCQ2R4N4WN7M6Y3KFAO4MYIW7
not attested · not anchored · not stored · refs resolved

The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies

Gabriel Garcia

Standard chain-of-thought corruption tests measure the placement of the final answer rather than the importance of reasoning steps.

arxiv:2605.10799 v2 · 2026-05-11 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KJCQ2R4N4WN7M6Y3KFAO4MYIW7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

when benchmark chains end with an explicit terminal answer line, as in GSM8K and MATH, these tests largely measure answer placement rather than where intermediate computation is carried out.

C2 · weakest assumption

The observed suffix sensitivity and conflicting-answer following arise primarily from consumption-time format following rather than from any early commitment during generation or from the intrinsic computational structure of the reasoning.

C3 · one-line summary

Corruption studies of CoT faithfulness largely measure explicit answer placement in prompt format rather than computational importance of reasoning steps.

References

19 extracted · 19 resolved · 4 Pith anchors

[1] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Process 2022
[2] X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Rep 2023
[3] M. Turpin, J. Michael, E. Perez, and S. R. Bowman. Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. In Advances in Neural Information Processing 2023
[4] Measuring Faithfulness in Chain-of-Thought Reasoning 2023 · arXiv:2307.13702
[5] Let’s think dot by dot: Hidden computation in transformer language models 2024
Receipt and verification
First computed: 2026-05-20T00:00:42.491257Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

52450d478de59bf67b1b5140ee3308b7dcac23395a2c8fa2742953cac4d0c2c8

Aliases

arxiv: 2605.10799 · arxiv_version: 2605.10799v2 · doi: 10.48550/arxiv.2605.10799 · pith_short_12: KJCQ2R4N4WN7 · pith_short_16: KJCQ2R4N4WN7M6Y3 · pith_short_8: KJCQ2R4N
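The short aliases listed here are prefixes of the full Pith Number, which can be checked mechanically. As a further observation about this one record (not a documented rule of the scheme), the 26-character Pith Number also matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The sketch below checks both relationships; the helper names `short_aliases` and `base32_prefix` are hypothetical and not part of any Pith API.

```python
import base64

# Values copied from this record.
PITH_NUMBER = "KJCQ2R4N4WN7M6Y3KFAO4MYIW7"
CANONICAL_HASH = "52450d478de59bf67b1b5140ee3308b7dcac23395a2c8fa2742953cac4d0c2c8"


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Return prefix aliases keyed like the pith_short_N fields (hypothetical helper)."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest, n_chars=26):
    """First n_chars of the standard base32 encoding of a hex digest (hypothetical helper)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:n_chars]


aliases = short_aliases(PITH_NUMBER)
# Observation for this record only: the identifier equals the base32 prefix of the hash.
matches_hash = base32_prefix(CANONICAL_HASH) == PITH_NUMBER
```

If the base32 relationship holds in general, an identifier could be recomputed from a canonical hash alone, but that is an assumption drawn from a single record and should be confirmed against the scheme's own documentation.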
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KJCQ2R4N4WN7M6Y3KFAO4MYIW7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 52450d478de59bf67b1b5140ee3308b7dcac23395a2c8fa2742953cac4d0c2c8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "58edf072a66f1baec8b9252ff76fbd4997a9dcc0fa40b93a7c13ec3f0c2e4013",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-11T16:26:50Z",
    "title_canon_sha256": "11d44a227dbb1a2a36f79dfa2fc0a9706db7ea94f5beb3211090baaf69553d5f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.10799",
    "kind": "arxiv",
    "version": 2
  }
}
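The canonical record can also be hashed offline, without the network call. The following is a minimal sketch that applies the same serialization as the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the JSON shown above; the function name `canonical_sha256` is a hypothetical helper, and the printed digest is meant to be compared against the canonical hash listed on the page rather than taken as confirmed here.

```python
import hashlib
import json

# Canonical record copied verbatim from the page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "58edf072a66f1baec8b9252ff76fbd4997a9dcc0fa40b93a7c13ec3f0c2e4013",
    "cross_cats_sorted": ["cs.AI", "cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-11T16:26:50Z",
    "title_canon_sha256": "11d44a227dbb1a2a36f79dfa2fc0a9706db7ea94f5beb3211090baaf69553d5f"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.10799", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash shown on the page
```

Sorting keys and fixing the separators makes the serialized bytes independent of key order and whitespace in the input, which is what lets a mirror recompute the same digest from a reformatted copy of the record.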