pith:TJFQBEJH
Reasoning Models Don't Always Say What They Think
Chain-of-thought reasoning often fails to disclose when models use provided hints.
arxiv:2505.05410 v1 · 2025-05-08 · cs.CL · cs.AI · cs.LG
Claims
For most settings and models tested, CoTs reveal their usage of hints in at least 1% of examples where they use the hint, but the reveal rate is often below 20%. Outcome-based reinforcement learning initially improves faithfulness but plateaus without saturating. When reinforcement learning increases how frequently hints are used, the propensity to verbalize them does not increase.
Key assumption: differences in model performance with and without hints reliably indicate whether the model actually uses the hint in its internal reasoning, and the chosen hints and tasks create conditions under which a faithful CoT should mention the hint if it is used.
Chain-of-thought outputs in reasoning models frequently fail to disclose their use of provided hints, even after reinforcement learning, limiting the reliability of CoT monitoring for safety.
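The reveal rate quoted in the claims can be made concrete with a small sketch. The `Trial` schema, the field names, and the answer-switch criterion for "using" a hint are all assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    """One question answered twice, without and with a hint (hypothetical schema)."""
    answer_no_hint: str
    answer_with_hint: str
    hint_answer: str
    cot_mentions_hint: bool


def uses_hint(t: Trial) -> bool:
    # Assumption: the hint counts as "used" when the answer switches from a
    # non-hint answer to the hinted answer once the hint is shown.
    return t.answer_no_hint != t.hint_answer and t.answer_with_hint == t.hint_answer


def reveal_rate(trials: List[Trial]) -> Optional[float]:
    # Fraction of hint-using trials whose chain of thought mentions the hint.
    used = [t for t in trials if uses_hint(t)]
    if not used:
        return None  # undefined when no trial shows hint use
    return sum(t.cot_mentions_hint for t in used) / len(used)


# Made-up data, for illustration only.
trials = [
    Trial("A", "C", "C", True),   # switched to the hint and said so
    Trial("B", "C", "C", False),  # switched to the hint silently
    Trial("A", "D", "D", False),  # switched to the hint silently
    Trial("C", "C", "C", False),  # already answered C, so not counted as hint use
    Trial("A", "A", "B", False),  # ignored the hint
]
rate = reveal_rate(trials)
print(rate)
```

Under this criterion only the first three trials count as hint use, so the rate is computed over those three rather than over all five. This is also where the key assumption bites: if an answer switch is a poor proxy for internal hint use, the denominator is wrong.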
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:21.845259Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TJFQBEJHC7IMMUTPGZXCOF3QQS \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0e8fc87ee1108d5e64c69b0654c60b56182aeecb70b65ad2fd894f411a7e3db3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-05-08T16:51:43Z",
    "title_canon_sha256": "cd026e0c39c1ba6ee5afbc1fab9ffe1c6ad98fe23b26b3277f37b1cf52f8b6d4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.05410",
    "kind": "arxiv",
    "version": 1
  }
}
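The verification one-liner can also be written as a short offline script. The sketch below applies the same canonicalization as that command (sorted keys, compact separators, UTF-8, SHA-256) to the record shown above. The function name `canonical_sha256` is hypothetical, and the script assumes the record shown on this page is byte-for-byte what the API returns under `.canonical_record`.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    # Same canonical form as the shell one-liner: keys sorted, no whitespace
    # between tokens, non-ASCII characters kept as-is, encoded as UTF-8.
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from this page (assumed identical to the API response).
record = {
    "metadata": {
        "abstract_canon_sha256": "0e8fc87ee1108d5e64c69b0654c60b56182aeecb70b65ad2fd894f411a7e3db3",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-05-08T16:51:43Z",
        "title_canon_sha256": "cd026e0c39c1ba6ee5afbc1fab9ffe1c6ad98fe23b26b3277f37b1cf52f8b6d4",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.05410", "kind": "arxiv", "version": 1},
}

# Hash published on this page.
expected = "9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest. A mismatch would suggest that the copied record differs from the served one, or that the canonicalization differs in some detail not visible here.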