Pith Number

pith:UT5GIVDB

pith:2023:UT5GIVDB3BSDCYF3FCPANR4NUV

not attested not anchored not stored refs resolved

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Ethan Perez, Julian Michael, Miles Turpin, Samuel R. Bowman

Chain-of-thought explanations in language models often ignore biasing features in the prompt and rationalize the resulting answer instead.

arxiv:2305.04388 v2 · 2023-05-07 · cs.CL · cs.AI

Open paper page JSON Open Graph Bundle Merged state Verified badge What is a Pith Number?

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{UT5GIVDB3BSDCYF3FCPANR4NUV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open · sign in to claim

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1strongest claim

CoT explanations can be heavily influenced by adding biasing features to model inputs—e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always “(A)”—which models systematically fail to mention in their explanations.

C2weakest assumption

That the introduced biasing features (option ordering, stereotype cues) are not legitimately part of the reasoning process the model is supposed to use, so any influence from them counts as unfaithfulness rather than valid use of prompt context.

C3one line summary

Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.

References

18 extracted · 18 resolved · 3 Pith anchors

[1] Towards A Rigorous Science of Interpretable Machine Learning 2022 · doi:10.18653/v1/2020.findings-emnlp.390

[2] Holistic Evaluation of Language Models 2006 · doi:10.1016/j.tics.2006.08.004

[3] Discovering Language Model Behaviors with Model-Written Evaluations 2022 · doi:10.18653/v1/2022.findings-acl.165

[4] Do Prompt-Based Models Really Understand the Meaning of Their Prompts? 2019 · doi:10.18653/v1/2022.naacl-main.167

[5] (2022), generate CoTs for the 30 examples that we held out as training examples 2022

Formal links

1 machine-checked theorem link

Cited by

33 papers in Pith

An Overview of Catastrophic AI Risks

Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

Evaluating the False Trust Engendered by LLM Explanations

Counterfactual Likelihood Tests for Indirect Influence in Private Reasoning Channels

Receipt and verification

First computed	2026-05-17T23:38:52.600269Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4

Aliases

arxiv: 2305.04388 · arxiv_version: 2305.04388v2 · doi: 10.48550/arxiv.2305.04388 · pith_short_12: UT5GIVDB3BSD · pith_short_16: UT5GIVDB3BSDCYF3 · pith_short_8: UT5GIVDB

Agent API

Resolver JSON Graph JSON Events JSON Schema Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/UT5GIVDB3BSDCYF3FCPANR4NUV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "149c8675d0527e28f6fcfbbfe47a10670a9c33da5773fb63fea3603766816811",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-05-07T22:44:25Z",
    "title_canon_sha256": "be138b9d7383a8a7b1dabe9f8f93959e1b0e01fb944ef79933ea2a65f6f84012"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2305.04388",
    "kind": "arxiv",
    "version": 2
  }
}