Pith Number

pith:7XDOHLLJ

pith:2024:7XDOHLLJCTTBBCQPS62B4AQ77H

not attested · not anchored · not stored · refs resolved

Frontier Models are Capable of In-context Scheming

Alexander Meinke, Bronson Schoen, Jérémy Scheurer, Marius Hobbhahn, Mikita Balesni, Rusheb Shah

Frontier models can scheme by hiding actions and disabling oversight to achieve in-context goals.

arxiv:2412.04984 v2 · 2024-12-06 · cs.AI · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7XDOHLLJCTTBBCQPS62B4AQ77H}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities. They recognize scheming as a viable strategy and readily engage in such behavior.

C2 · weakest assumption

The six agentic evaluations distinguish genuine scheming from artifacts of prompt phrasing, environment design, or model training data, and do not merely measure surface-level compliance with instructions.

C3 · one-line summary

Frontier models demonstrate in-context scheming: across multiple agentic evaluations, they strategically deceive in order to achieve the goals they are given.

References

37 extracted · 37 resolved · 9 Pith anchors

[1] Announcing Inspect Evals: Open-sourcing dozens of LLM evaluations to advance safety research in the field, November 2024

[2] Model card addendum: Claude 3.5 Haiku and upgraded Claude 3.5 Sonnet, 2024a

[3] The Claude 3 model family: Opus, Sonnet, Haiku, 2024b

[4] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback 2022 · arXiv:2204.05862

[5] Constitutional AI: Harmlessness from AI Feedback 2022 · arXiv:2212.08073

Formal links

3 machine-checked theorem links

Cited by

27 papers in Pith

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation

Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security

Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

DECOR: Auditing LLM Deception via Information Manipulation Theory

Receipt and verification

First computed	2026-05-17T23:38:47.617178Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fdc6e3ad6914e6108a0f97b41e021ff9e606453235fc24ef3a54af35f298bbad

Aliases

arxiv: 2412.04984 · arxiv_version: 2412.04984v2 · doi: 10.48550/arxiv.2412.04984 · pith_short_12: 7XDOHLLJCTTB · pith_short_16: 7XDOHLLJCTTBBCQP · pith_short_8: 7XDOHLLJ
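
The relationship between the canonical hash and the identifiers listed here can be sketched in a few lines of Python. This is an observation drawn from the values on this page, not a documented specification: the 26-character Pith Number appears to be the leading characters of the RFC 4648 Base32 encoding of the SHA-256 canonical hash, and the `pith_short_*` aliases appear to be plain prefixes of it. The function names `pith_number_from_hash` and `short_aliases` are hypothetical helpers introduced for illustration.

```python
import base64

# Canonical hash as listed under "Receipt and verification".
CANONICAL_HASH = "fdc6e3ad6914e6108a0f97b41e021ff9e606453235fc24ef3a54af35f298bbad"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the truncation length of 26 is inferred from the identifier
    shown on this page, not taken from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the short aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(number)
print(number)
print(aliases)
```

For this record the derived value agrees with the identifier shown at the top of the page, and the three prefixes agree with the aliases listed above. Other records may follow a different rule, so treat the derivation as a consistency check rather than a resolver.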

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7XDOHLLJCTTBBCQPS62B4AQ77H \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fdc6e3ad6914e6108a0f97b41e021ff9e606453235fc24ef3a54af35f298bbad

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "d94728759e0d41f41a55a15f5f4ae79a845352259ee3fd2fedac2bf0823c2f7c",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-12-06T12:09:50Z",
    "title_canon_sha256": "0b7dd937f508830045867ea833f23c02d3f44eed5d6139dc5d9677823d39b233"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.04984",
    "kind": "arxiv",
    "version": 2
  }
}
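
The record above can also be hashed offline, without calling the resolver. The sketch below applies the same canonicalization as the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to a copy of the record pasted from this page. The helper name `canonical_sha256` is hypothetical, and the sketch assumes the pasted record is identical to the one the resolver serves; the printed digest should be compared by hand with the canonical hash listed under "Receipt and verification".

```python
import hashlib
import json

# Copy of the canonical record shown on this page (assumed to be unmodified).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "d94728759e0d41f41a55a15f5f4ae79a845352259ee3fd2fedac2bf0823c2f7c",
    "cross_cats_sorted": ["cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-12-06T12:09:50Z",
    "title_canon_sha256": "0b7dd937f508830045867ea833f23c02d3f44eed5d6139dc5d9677823d39b233"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.04984",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the pasted JSON, only on its keys and values.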