Pith Number

pith:QZPNLERN

pith:2019:QZPNLERNGAKNBUDI22PIJ77TFS

not attested · not anchored · not stored · refs resolved

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Arthur Guez, David Silver, Demis Hassabis, Edward Lockhart, Ioannis Antonoglou, Julian Schrittwieser, Karen Simonyan, Laurent Sifre, Simon Schmitt, Thomas Hubert, Thore Graepel, Timothy Lillicrap

MuZero achieves superhuman performance in Atari, Go, chess and shogi by learning a model that predicts only the reward, policy and value needed for planning.

arxiv:1911.08265 v2 · 2019-11-19 · cs.LG · stat.ML

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QZPNLERNGAKNBUDI22PIJ77TFS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MuZero achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

C2 · weakest assumption

That the learned model, when applied iteratively inside tree search, produces sufficiently accurate long-horizon predictions of reward, policy, and value to support effective planning even when the true dynamics are unknown and high-dimensional.

C3 · one-line summary

MuZero matches or exceeds AlphaZero-level performance in Go, chess and shogi, and sets a new state of the art across 57 Atari games, by learning a model that directly supports planning rather than reconstructing full environment dynamics.

References

53 extracted · 53 resolved · 6 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

21 papers in Pith

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale

The Serial Scaling Hypothesis

Latent Chain-of-Thought World Modeling for End-to-End Driving

Receipt and verification

First computed	2026-05-17T23:38:46.177763Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63

Aliases

arxiv: 1911.08265 · arxiv_version: 1911.08265v2 · doi: 10.48550/arxiv.1911.08265 · pith_short_12: QZPNLERNGAKN · pith_short_16: QZPNLERNGAKNBUDI · pith_short_8: QZPNLERN
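In this record the identifier and the canonical hash appear to be related. Base32-encoding the 32 hash bytes (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number shown on this page, and the `pith_short_*` aliases are its 8-, 12- and 16-character prefixes. The Python sketch below checks that relationship. The function names are hypothetical, and treating this as the builder's actual derivation rule is an assumption drawn from this single record.

```python
import base64

# Canonical hash as published on this page.
CANONICAL_HASH = "865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumed rule: inferred from this record, not taken from a specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith: str) -> dict:
    """Derive the short aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


pith = pith_from_hash(CANONICAL_HASH)
print(pith)
print(short_aliases(pith))
```

If the relationship holds for other records too, a short alias can be checked against a canonical hash without contacting the resolver.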

Verify this Pith Number yourself

# Fetch the record, extract canonical_record, re-serialize it as compact
# sorted-key JSON, and print the SHA-256 of the resulting UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QZPNLERNGAKNBUDI22PIJ77TFS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "c17ad5bd13adb1c24b6da393fcd423d7782deb1152195b6ec0860f51eaf91b91",
    "cross_cats_sorted": [
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-11-19T13:58:52Z",
    "title_canon_sha256": "e1b6e9a101ccbe0c56a2ef0ef7c6625e447d26efc29afa6fac4b14d44e852264"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1911.08265",
    "kind": "arxiv",
    "version": 2
  }
}
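
The verification command above re-serializes the canonical record as compact JSON with sorted keys before hashing it. The Python sketch below applies the same step to the record printed here, without any network access. `canonical_bytes` and `canonical_sha256` are hypothetical helper names. The digest matches the published canonical hash only if the resolver returns exactly this record, which the sketch does not check.

```python
import hashlib
import json

# The canonical record exactly as printed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "c17ad5bd13adb1c24b6da393fcd423d7782deb1152195b6ec0860f51eaf91b91",
    "cross_cats_sorted": ["stat.ML"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-11-19T13:58:52Z",
    "title_canon_sha256": "e1b6e9a101ccbe0c56a2ef0ef7c6625e447d26efc29afa6fac4b14d44e852264"
  },
  "schema_version": "1.0",
  "source": {"id": "1911.08265", "kind": "arxiv", "version": 2}
}
"""


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and no extra whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash published above
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input, so any mirror can recompute the same value from the same record.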