pith.
Pith Number

pith:R5VTAKTW

pith:2019:R5VTAKTWDE5JPZRHHJII3RAWIJ
not attested · not anchored · not stored · refs resolved

Risks from Learned Optimization in Advanced Machine Learning Systems

Chris van Merwijk, Evan Hubinger, Joar Skalse, Scott Garrabrant, Vladimir Mikulik

Learned models in machine learning can themselves become optimizers whose objectives diverge from the training loss.

arxiv:1906.01820 v3 · 2019-06-05 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{R5VTAKTWDE5JPZRHHJII3RAWIJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems: under what circumstances will learned models be optimizers, and when a learned model is an optimizer, what will its objective be and how can it be aligned?

C2 · weakest assumption

The analysis assumes that sufficiently capable learned models will contain internal optimization processes whose objectives can be analyzed separately from the outer training loss, without providing formal conditions or empirical thresholds for when this separation becomes load-bearing.

C3 · one-line summary

Mesa-optimization arises when learned models act as optimizers with objectives that can differ from their training loss, creating alignment risks in advanced machine learning.

References

40 extracted · 40 resolved · 18 Pith anchors

[1] Bottle caps aren’t optimisers 2018
[2] TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning 2018 · arXiv:1710.11417
[3] Universal Planning Networks 2018 · arXiv:1804.00645
[4] Learning to learn by gradient descent by gradient descent 2016 · arXiv:1606.04474
[5] RL²: Fast Reinforcement Learning via Slow Reinforcement Learning 2016 · arXiv:1611.02779

Formal links

1 machine-checked theorem link

Cited by

35 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.563737Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4

Aliases

arxiv: 1906.01820 · arxiv_version: 1906.01820v3 · doi: 10.48550/arxiv.1906.01820 · pith_short_12: R5VTAKTWDE5J · pith_short_16: R5VTAKTWDE5JPZRH · pith_short_8: R5VTAKTW
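The aliases above look like they are derived mechanically from the canonical hash. This is inferred only from the values on this page and is not documented Pith behavior. On that basis, the full Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is the first N characters of it. The minimal Python sketch below checks that relationship. `pith_number_from_hash` is a hypothetical helper name.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Hypothetical helper: derive a Pith Number from a canonical SHA-256 hash.

    Assumption (inferred from this record, not from Pith documentation):
    the Pith Number is the first `length` characters of the RFC 4648
    base32 encoding (alphabet A-Z, 2-7) of the 32-byte digest.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    encoded = base64.b32encode(digest).decode("ascii")  # '=' padding sits at the end
    return encoded[:length]


canonical_hash = "8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4"
full = pith_number_from_hash(canonical_hash)
print(full)  # R5VTAKTWDE5JPZRHHJII3RAWIJ

# The short aliases appear to be plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {full[:n]}")
# pith_short_8: R5VTAKTW
# pith_short_12: R5VTAKTWDE5J
# pith_short_16: R5VTAKTWDE5JPZRH
```

A 26-character base32 string carries 130 bits. The last character therefore mixes the tail of the 16th digest byte with the first two bits of the 17th, which is why the whole digest is encoded before truncating.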
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/R5VTAKTWDE5JPZRHHJII3RAWIJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "fe532e59f6ef9e5e2eb21aa8854b30c924bb21870540f1a3482ecaa5b9e719e9",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2019-06-05T04:43:25Z",
    "title_canon_sha256": "bef97e85af23a1b58be90a7b9e8ecf0a42d495c76af7ada23acf73d323ce9916"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1906.01820",
    "kind": "arxiv",
    "version": 3
  }
}
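To check the hash without network access, you can apply the verification one-liner's serialization (sorted keys, compact separators, non-ASCII kept as UTF-8) to the record shown above. This is a minimal sketch that assumes the JSON above is the complete `canonical_record` returned by the API. If it is not, the comparison at the end will print `False`. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hypothetical helper: hash a record the way the curl/jq/python
    one-liner does, using sorted keys, no whitespace, and UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Transcribed from the canonical record JSON above (assumed complete).
record = {
    "metadata": {
        "abstract_canon_sha256": "fe532e59f6ef9e5e2eb21aa8854b30c924bb21870540f1a3482ecaa5b9e719e9",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2019-06-05T04:43:25Z",
        "title_canon_sha256": "bef97e85af23a1b58be90a7b9e8ecf0a42d495c76af7ada23acf73d323ce9916",
    },
    "schema_version": "1.0",
    "source": {"id": "1906.01820", "kind": "arxiv", "version": 3},
}

EXPECTED = "8f6b302a76193a97e6273a508dc416426df405243f227763f9e7b7d97e8765c4"
digest = canonical_sha256(record)
print(digest)
print(digest == EXPECTED)  # compare against the canonical hash listed above
```

Sorting keys and dropping whitespace make the byte string independent of key order and formatting. Any mirror holding the same record should therefore arrive at the same digest.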