pith:UT5GIVDB
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Chain-of-thought explanations in language models often fail to mention biasing features in the prompt that influence the answer, and instead rationalize the biased answer.
arxiv:2305.04388 v2 · 2023-05-07 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UT5GIVDB3BSDCYF3FCPANR4NUV}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
CoT explanations can be heavily influenced by adding biasing features to model inputs—e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always “(A)”—which models systematically fail to mention in their explanations.
The analysis assumes that the introduced biasing features (option ordering, stereotype cues) are not legitimately part of the reasoning process the model is supposed to use, so any influence from them counts as unfaithfulness rather than valid use of prompt context.
Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.
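The option-reordering bias named in the first claim can be illustrated with a minimal sketch. The function names (`move_answer_to_a`, `build_biased_prompt`) and the toy questions are hypothetical and are not taken from the paper's code; the sketch only shows one way few-shot examples might be rearranged so that the correct answer always appears as "(A)".

```python
from typing import List, Tuple

# Each example: (question, options, index of the correct option).
# All names and data here are illustrative assumptions, not the paper's setup.
Example = Tuple[str, List[str], int]

LETTERS = "ABCDEFGH"


def move_answer_to_a(options: List[str], correct: int) -> List[str]:
    """Return a copy of options with the correct one moved to position 0."""
    return [options[correct]] + options[:correct] + options[correct + 1:]


def format_example(question: str, options: List[str], answer_letter: str = "") -> str:
    """Render one multiple-choice item; leave the answer blank if no letter is given."""
    lines = [question]
    lines += [f"({LETTERS[i]}) {opt}" for i, opt in enumerate(options)]
    lines.append(f"Answer: ({answer_letter})" if answer_letter else "Answer:")
    return "\n".join(lines)


def build_biased_prompt(few_shot: List[Example], test_question: str,
                        test_options: List[str]) -> str:
    """Reorder every few-shot example so its answer is (A), then append the test item."""
    blocks = [
        format_example(q, move_answer_to_a(opts, c), "A")
        for q, opts, c in few_shot
    ]
    # The test question keeps its original option order.
    blocks.append(format_example(test_question, test_options))
    return "\n\n".join(blocks)


FEW_SHOT = [
    ("What is 2 + 2?", ["3", "4", "5"], 1),
    ("Which animal is a mammal?", ["shark", "lizard", "whale"], 2),
]

PROMPT = build_biased_prompt(FEW_SHOT, "What is 3 * 3?", ["6", "9", "12"])
print(PROMPT)
```

Measuring unfaithfulness would additionally require querying a model with both the biased and an unbiased prompt and checking whether its explanation mentions the ordering; that step is omitted here.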
Receipt and verification
| First computed | 2026-05-17T23:38:52.600269Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UT5GIVDB3BSDCYF3FCPANR4NUV \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "149c8675d0527e28f6fcfbbfe47a10670a9c33da5773fb63fea3603766816811",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-05-07T22:44:25Z",
    "title_canon_sha256": "be138b9d7383a8a7b1dabe9f8f93959e1b0e01fb944ef79933ea2a65f6f84012"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2305.04388",
    "kind": "arxiv",
    "version": 2
  }
}
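The same check as the shell pipeline can be sketched as a standalone Python script. It assumes the canonicalization is exactly what the one-liner shows (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and embeds the record printed on this page instead of fetching it. The helper name `canonical_sha256` is illustrative. Whether the digest matches depends on the served record being identical to the one shown here, so the script reports the comparison without assuming the result.

```python
import hashlib
import json

# Canonical record as printed on this page (embedded here, not fetched).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "149c8675d0527e28f6fcfbbfe47a10670a9c33da5773fb63fea3603766816811",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-05-07T22:44:25Z",
    "title_canon_sha256": "be138b9d7383a8a7b1dabe9f8f93959e1b0e01fb944ef79933ea2a65f6f84012"
  },
  "schema_version": "1.0",
  "source": {"id": "2305.04388", "kind": "arxiv", "version": 2}
}
"""

# Canonical hash listed on this page.
EXPECTED = "a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4"


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the shell one-liner (assumed canonical form)."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted and whitespace is removed before hashing, the digest does not depend on how the JSON is indented or on the order in which keys are listed.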