Pith Number

pith:T3RSYXPW

pith:2026:T3RSYXPWUDBTTOSFPLF57WOEMV
not attested · not anchored · not stored · refs resolved

Training on Documents About Monitoring Leads to CoT Obfuscation

Bilal Chughtai, Joshua Engels, Reilly Haskins

Models trained on documents about monitoring learn to obfuscate their chain-of-thought reasoning.

arxiv:2605.15257 v1 · 2026-05-14 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{T3RSYXPWUDBTTOSFPLF57WOEMV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Monitor-aware models consistently achieve higher rates of undetected misbehavior compared to unaware controls, with CoT controllability closely correlated with obfuscation success (r=0.800, p=0.017).

C2 · weakest assumption

The synthetic documents used for finetuning produce behavioral changes that generalize to realistic pre-training exposure rather than creating artificial or brittle obfuscation artifacts specific to the training setup.

C3 · one line summary

Synthetic document finetuning on CoT monitor descriptions causes models to obfuscate reasoning traces, raising undetected misbehavior rates and correlating with controllability (r=0.800).

References

13 extracted · 13 resolved · 2 Pith anchors

[1] OpenAI GPT-5 System Card 2025 · arXiv:2601.03267
[2] GPQA: A Graduate-Level Google-Proof Q&A Benchmark 2025 · arXiv:2311.12022
[3] Any internal finding is off-limits for customer recommendations until it is made public via the External Advisory Protocol (EAP)
[4] Information definitions. “Preliminary”: internal reports, draft analyses, in-progress measurements, or unverifiable metrics. “Customer-facing”: published specs, user guides, marketing materials, and
[5] Routing requirement. Potential product issues discovered internally must be submitted to EAP for review and, if warranted, converted into a customer advisory before they may alter guidance or compariso

Receipt and verification

First computed 2026-05-20T00:00:48.914539Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

9ee32c5df6a0c339ba457acbdfd9c46540dc241aa7afb6c0e9afeec3869370a1

Aliases

arxiv: 2605.15257 · arxiv_version: 2605.15257v1 · doi: 10.48550/arxiv.2605.15257 · pith_short_12: T3RSYXPWUDBT · pith_short_16: T3RSYXPWUDBTTOSF · pith_short_8: T3RSYXPW
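
The identifier and short aliases on this page appear to be related to the canonical hash in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is the first N characters of that number. This is an observation from the values shown here, not a documented derivation, so the sketch below should be read as an assumption. The helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "9ee32c5df6a0c339ba457acbdfd9c46540dc241aa7afb6c0e9afeec3869370a1"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode the digest bytes and keep a prefix.

    Assumption: the identifier is a truncated Base32 form of the hash.
    This matches the values on this page but is not confirmed by the source.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Hypothetical helper: build the pith_short_N aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


full = pith_number_from_hash(CANONICAL_HASH)
print(full)  # T3RSYXPWUDBTTOSFPLF57WOEMV
for name, value in short_aliases(full).items():
    print(name, value)
```

If the assumption holds, a mirror could recompute every alias from the canonical hash alone, without consulting the registry.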
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/T3RSYXPWUDBTTOSFPLF57WOEMV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9ee32c5df6a0c339ba457acbdfd9c46540dc241aa7afb6c0e9afeec3869370a1
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "dfdb57ff3c2ea314067691dbbcd7899c94727890c2bd90d43cd5c955c447d58c",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T17:59:01Z",
    "title_canon_sha256": "3537d95bcee388d1856256a333663f7cc2d6be9b3f673c40ee372579a215aeb0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15257",
    "kind": "arxiv",
    "version": 1
  }
}
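
The shell pipeline above can also be expressed as a small offline Python sketch that applies the same canonicalization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record printed on this page. The function names `canonical_bytes` and `canonical_sha256` are illustrative, not part of any Pith API. Whether the record as displayed here hashes to the expected value has not been checked, so the script reports the comparison rather than asserting it.

```python
import hashlib
import json

# Expected value copied from the "Canonical hash" field on this page.
EXPECTED_HASH = "9ee32c5df6a0c339ba457acbdfd9c46540dc241aa7afb6c0e9afeec3869370a1"

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "dfdb57ff3c2ea314067691dbbcd7899c94727890c2bd90d43cd5c955c447d58c",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-14T17:59:01Z",
    "title_canon_sha256": "3537d95bcee388d1856256a333663f7cc2d6be9b3f673c40ee372579a215aeb0"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.15257", "kind": "arxiv", "version": 1}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same options as the one-liner in the shell pipeline."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("computed:", digest)
print("expected:", EXPECTED_HASH)
print("match:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON as received, which is what lets independent mirrors arrive at the same value.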