Pith Number

pith:R5VTAKTW

pith:2019:R5VTAKTWDE5JPZRHHJII3RAWIJ

not attested · not anchored · not stored · refs resolved

Risks from Learned Optimization in Advanced Machine Learning Systems

Chris van Merwijk, Evan Hubinger, Joar Skalse, Scott Garrabrant, Vladimir Mikulik

Learned models in machine learning can themselves become optimizers whose objectives diverge from the training loss.

arxiv:1906.01820 v3 · 2019-06-05 · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{R5VTAKTWDE5JPZRHHJII3RAWIJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
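The idea of recomputing the same state from the same events, regardless of the order in which a mirror received them, can be illustrated with a minimal sketch. This is not the Pith merge algorithm, which is not described on this page. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical, and signature checking is omitted.

```python
def merge_events(events):
    """Fold events into a state dict, independent of input order.

    Assumes (hypothetically) that an event id uniquely identifies its
    content, so duplicate copies of an event are interchangeable.
    """
    # Collapse duplicate copies of the same event.
    unique = {e["id"]: e for e in events}
    # Impose a total order: timestamp first, event id as tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer wins
    return state


# Made-up sample events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-17T23:38:52Z", "field": "author_claim", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e3", "ts": "2026-05-18T00:00:00Z", "field": "author_claim", "value": "claimed"},
]

state = merge_events(events)
print(state)
```

Because the events are deduplicated and sorted under a total order before folding, two mirrors holding the same set of events arrive at the same state even if they received the events in different orders.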

Claims

C1 strongest claim

We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems: under what circumstances will learned models be optimizers, and when a learned model is an optimizer, what will its objective be and how can it be aligned?

C2 weakest assumption

The analysis assumes that sufficiently capable learned models will contain internal optimization processes whose objectives can be analyzed separately from the outer training loss, without providing formal conditions or empirical thresholds for when this separation becomes load-bearing.

C3 one line summary

Mesa-optimization arises when learned models act as optimizers with objectives that can differ from their training loss, creating alignment risks in advanced machine learning.

References

40 extracted · 40 resolved · 18 Pith anchors

[1] Bottle caps aren’t optimisers 2018

[2] TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning 2018 · arXiv:1710.11417

[3] Universal Planning Networks 2018 · arXiv:1804.00645

[4] Learning to learn by gradient descent by gradient descent 2016 · arXiv:1606.04474

[5] RL²: Fast Reinforcement Learning via Slow Reinforcement Learning 2016 · arXiv:1611.02779

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Understanding Goal Generalisation in Sequential Reinforcement Learning

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Mechanistic Interpretability Needs Philosophy

Receipt and verification

First computed	2026-05-17T23:38:52.563737Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4

Aliases

arxiv: 1906.01820 · arxiv_version: 1906.01820v3 · doi: 10.48550/arxiv.1906.01820 · pith_short_12: R5VTAKTWDE5J · pith_short_16: R5VTAKTWDE5JPZRH · pith_short_8: R5VTAKTW
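The identifiers on this page appear to be related: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, and the `pith_short_*` aliases are prefixes of that string. This relationship is inferred from the values shown here, not taken from a published specification, and the function name `pith_number` is made up for this sketch.

```python
import base64


def pith_number(canonical_sha256_hex: str, length: int = 26) -> str:
    """Base32-encode the digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page was derived.
    """
    digest = bytes.fromhex(canonical_sha256_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


# Canonical hash as listed on this page.
CANONICAL_HASH = "8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4"

full = pith_number(CANONICAL_HASH)
aliases = {
    "pith_short_8": full[:8],
    "pith_short_12": full[:12],
    "pith_short_16": full[:16],
}
print(full)
print(aliases)
```

Run against the canonical hash above, this reproduces `R5VTAKTWDE5JPZRHHJII3RAWIJ` and the three short aliases listed on this page.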

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/R5VTAKTWDE5JPZRHHJII3RAWIJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "fe532e59f6ef9e5e2eb21aa8854b30c924bb21870540f1a3482ecaa5b9e719e9",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2019-06-05T04:43:25Z",
    "title_canon_sha256": "bef97e85af23a1b58be90a7b9e8ecf0a42d495c76af7ada23acf73d323ce9916"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1906.01820",
    "kind": "arxiv",
    "version": 3
  }
}
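The verification step can also be run offline against the record above. The sketch below applies the same serialization as the shell pipeline (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) and hashes the result. The function name `canonical_hash` is an assumption for illustration. Whether the digest matches the published canonical hash depends on the record text being copied exactly, so no expected value is asserted here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    canon = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


# Canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "fe532e59f6ef9e5e2eb21aa8854b30c924bb21870540f1a3482ecaa5b9e719e9",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2019-06-05T04:43:25Z",
    "title_canon_sha256": "bef97e85af23a1b58be90a7b9e8ecf0a42d495c76af7ada23acf73d323ce9916"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1906.01820",
    "kind": "arxiv",
    "version": 3
  }
}
"""

record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be pretty-printed or ordered on a given mirror.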