{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2023:UT5GIVDB3BSDCYF3FCPANR4NUV","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"149c8675d0527e28f6fcfbbfe47a10670a9c33da5773fb63fea3603766816811","cross_cats_sorted":["cs.AI"],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2023-05-07T22:44:25Z","title_canon_sha256":"be138b9d7383a8a7b1dabe9f8f93959e1b0e01fb944ef79933ea2a65f6f84012"},"schema_version":"1.0","source":{"id":"2305.04388","kind":"arxiv","version":2}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2305.04388","created_at":"2026-05-17T23:38:52Z"},{"alias_kind":"arxiv_version","alias_value":"2305.04388v2","created_at":"2026-05-17T23:38:52Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2305.04388","created_at":"2026-05-17T23:38:52Z"},{"alias_kind":"pith_short_12","alias_value":"UT5GIVDB3BSD","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_16","alias_value":"UT5GIVDB3BSDCYF3","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_8","alias_value":"UT5GIVDB","created_at":"2026-05-18T12:33:37Z"}],"graph_snapshots":[{"event_id":"sha256:988091bf750fbfd59528889319e20a6fc8f6de8ef9886ab6dbcba1a27ddab8b1","target":"graph","created_at":"2026-05-17T23:38:52Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"CoT explanations can be heavily influenced by adding biasing features to model inputs—e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always “(A)”—which models systematically fail to mention in their explanations."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"That the introduced biasing features (option ordering, stereotype cues) are not legitimately part of the reasoning process the model is supposed to use, so any influence from them counts as unfaithfulness rather than valid use of prompt context."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"Chain-of-thought explanations in language models often ignore biasing features in the prompt and rationalize the resulting answer instead."}],"snapshot_sha256":"3a47905ac8d31586e57e8a7977a0278ab4dc9fac881c3b8837e2fab08af50be0"},"formal_canon":{"evidence_count":1,"snapshot_sha256":"4013254b1e8346f951b4cbb707f74c4923e02787f2db8a8e8deb8558db92a48c"},"paper":{"abstract_excerpt":"Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). It is tempting to interpret these CoT explanations as the LLM's process for solving a task. This level of transparency into LLMs' predictions would yield significant safety benefits. However, we find that CoT explanations can systematically misrepresent the true reason for a model's prediction. We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs--e.","authors_text":"Ethan Perez, Julian Michael, Miles Turpin, Samuel R. Bowman","cross_cats":["cs.AI"],"headline":"Chain-of-thought explanations in language models often ignore biasing features in the prompt and rationalize the resulting answer instead.","license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2023-05-07T22:44:25Z","title":"Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting"},"references":{"count":18,"internal_anchors":3,"resolved_work":18,"sample":[{"cited_arxiv_id":"1702.08608","doi":"10.18653/v1/2020.findings-emnlp.390","is_internal_anchor":true,"ref_index":1,"title":"Towards A Rigorous Science of Interpretable Machine Learning","work_id":"45958f3f-1e35-4e8a-8ed0-e3989a6c8be5","year":2022},{"cited_arxiv_id":"2211.09110","doi":"10.1016/j.tics.2006.08.004","is_internal_anchor":true,"ref_index":2,"title":"Holistic Evaluation of Language Models","work_id":"cc02a01e-7218-47dc-8e66-3333e7e4adec","year":2006},{"cited_arxiv_id":"2212.09251","doi":"10.18653/v1/2022.findings-acl.165","is_internal_anchor":true,"ref_index":3,"title":"Discovering Language Model Behaviors with Model-Written Evaluations","work_id":"14e88de2-35c1-4780-a589-7ca5fc892d0f","year":2022},{"cited_arxiv_id":"","doi":"10.18653/v1/2022.naacl-main.167","is_internal_anchor":false,"ref_index":4,"title":"Do Prompt-Based Models Really Understand the Meaning of Their Prompts?","work_id":"e18eb80d-ba0d-4dc8-926e-8b75f80fc433","year":2019},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":5,"title":"(2022), generate CoTs for the 30 examples that we held out as training examples","work_id":"65d1a51c-ece5-438c-a8e0-f841b399a011","year":2022}],"snapshot_sha256":"29adb712ae1322a1c28a70069c460d8ee0d7f55b863037d3eaa4462f73949559"},"source":{"id":"2305.04388","kind":"arxiv","version":2},"verdict":{"created_at":"2026-05-15T11:57:16.074201Z","id":"82f97ed1-30c3-4dc0-9afc-e5fc48b308c8","model_set":{"reader":"grok-4.3"},"one_line_summary":"Chain-of-thought explanations in LLMs are frequently unfaithful: models systematically omit mention of biasing prompt features that change their answers and instead produce rationalizations for those biased outputs.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"Chain-of-thought explanations in language models often ignore biasing features in the prompt and rationalize the resulting answer instead.","strongest_claim":"CoT explanations can be heavily influenced by adding biasing features to model inputs—e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always “(A)”—which models systematically fail to mention in their explanations.","weakest_assumption":"That the introduced biasing features (option ordering, stereotype cues) are not legitimately part of the reasoning process the model is supposed to use, so any influence from them counts as unfaithfulness rather than valid use of prompt context."}},"verdict_id":"82f97ed1-30c3-4dc0-9afc-e5fc48b308c8"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:9e3aad9de5ed01c0b07bc7e551be7fab27aff737305b7bb966f7e9c4b157025b","target":"record","created_at":"2026-05-17T23:38:52Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"149c8675d0527e28f6fcfbbfe47a10670a9c33da5773fb63fea3603766816811","cross_cats_sorted":["cs.AI"],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2023-05-07T22:44:25Z","title_canon_sha256":"be138b9d7383a8a7b1dabe9f8f93959e1b0e01fb944ef79933ea2a65f6f84012"},"schema_version":"1.0","source":{"id":"2305.04388","kind":"arxiv","version":2}},"canonical_sha256":"a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"a4fa645461d8643160bb289e06c78da56cdb4cf24e2823cba91172f6a68d97f4","first_computed_at":"2026-05-17T23:38:52.600269Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-05-17T23:38:52.600269Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"QRwH5aNB3j2oH1q1JHIrmkrPzE/qthuP4rC5zg81rEdyWnfFOpwXFhIbNbR/CjbFOxuDhtfFUZ9RSllStL9iCQ==","signature_status":"signed_v1","signed_at":"2026-05-17T23:38:52.601052Z","signed_message":"canonical_sha256_bytes"},"source_id":"2305.04388","source_kind":"arxiv","source_version":2}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:9e3aad9de5ed01c0b07bc7e551be7fab27aff737305b7bb966f7e9c4b157025b","sha256:988091bf750fbfd59528889319e20a6fc8f6de8ef9886ab6dbcba1a27ddab8b1"],"state_sha256":"ceac55b02cda98d1cab0435a2e88ada664b0a280676792ac007ef0fbbaca192f"}