pith. machine review for the scientific record.
Pith Number

pith:TJFQBEJH

pith:2025:TJFQBEJHC7IMMUTPGZXCOF3QQS
not attested · not anchored · not stored · refs pending

Reasoning Models Don't Always Say What They Think

Ansh Radhakrishnan, Arushi Somani, Carson Denison, Ethan Perez, Fabien Roger, Jan Leike, Jared Kaplan, Joe Benton, John Schulman, Jonathan Uesato, Misha Wagner, Peter Hase, Samuel R. Bowman, Vlad Mikulik, Yanda Chen

Chain-of-thought reasoning often fails to disclose when models use provided hints.

arxiv:2505.05410 v1 · 2025-05-08 · cs.CL · cs.AI · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

For most settings and models tested, CoTs reveal their usage of hints in at least 1% of examples where they use the hint, but the reveal rate is often below 20%. Outcome-based reinforcement learning initially improves faithfulness but plateaus without saturating. When reinforcement learning increases how frequently hints are used, the propensity to verbalize them does not increase.

C2 weakest assumption

That differences in model performance with and without hints reliably indicate whether the model is actually using the hint in its internal reasoning, and that the chosen hints and tasks create conditions where faithful CoT should mention the hint if used.

C3 one line summary

Chain-of-thought outputs in reasoning models frequently fail to disclose their use of provided hints, even after reinforcement learning, limiting the reliability of CoT monitoring for safety.

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:21.845259Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b

Aliases

arxiv: 2505.05410 · arxiv_version: 2505.05410v1 · doi: 10.48550/arxiv.2505.05410 · pith_short_12: TJFQBEJHC7IM · pith_short_16: TJFQBEJHC7IMMUTP · pith_short_8: TJFQBEJH
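The pith_short aliases are prefixes of the full Pith Number, and the Pith Number itself appears to be the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That relationship is an inference from this one record, not a rule stated on this page, so treat the sketch below as a consistency check rather than a specification. The helper name `base32_prefix` is hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the base32 encoding of a hex digest.

    Assumption: Pith identifiers are truncated base32 of the SHA-256 digest.
    This is inferred from the values on this page, not from documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Full identifier (26 characters) followed by the short aliases.
print(base32_prefix(CANONICAL_HASH, 26))
for n in (8, 12, 16):
    print(f"pith_short_{n}:", base32_prefix(CANONICAL_HASH, n))
```

If the inferred rule holds, the printed values agree with the identifiers listed under Aliases.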
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TJFQBEJHC7IMMUTPGZXCOF3QQS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0e8fc87ee1108d5e64c69b0654c60b56182aeecb70b65ad2fd894f411a7e3db3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-05-08T16:51:43Z",
    "title_canon_sha256": "cd026e0c39c1ba6ee5afbc1fab9ffe1c6ad98fe23b26b3277f37b1cf52f8b6d4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.05410",
    "kind": "arxiv",
    "version": 1
  }
}
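The shell one-liner under "Verify this Pith Number yourself" can also be run offline against a saved copy of the record. The sketch below applies the same serialization settings as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the JSON shown above. It assumes the JSON printed on this page is identical to what the API returns in `.canonical_record`; the names `RECORD_JSON`, `EXPECTED_HASH`, and `canonical_sha256` are illustrative only.

```python
import hashlib
import json

# Copy of the canonical record as displayed on this page (assumed to be
# identical to the `.canonical_record` field returned by the API).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "0e8fc87ee1108d5e64c69b0654c60b56182aeecb70b65ad2fd894f411a7e3db3",
    "cross_cats_sorted": ["cs.AI", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-05-08T16:51:43Z",
    "title_canon_sha256": "cd026e0c39c1ba6ee5afbc1fab9ffe1c6ad98fe23b26b3277f37b1cf52f8b6d4"
  },
  "schema_version": "1.0",
  "source": {"id": "2505.05410", "kind": "arxiv", "version": 1}
}
"""

# Canonical hash as listed on this record.
EXPECTED_HASH = "9a4b00912717d0c6526f366e27177084b3bf21f578d87cd75eaa3470398c788b"


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the page's shell one-liner."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # UTF-8 by default
    return hashlib.sha256(blob).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, whitespace and key order in the saved copy do not affect the digest; only the values do.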