Pith Number

pith:QZPNLERN

pith:2019:QZPNLERNGAKNBUDI22PIJ77TFS
not attested · not anchored · not stored · refs resolved

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Arthur Guez, David Silver, Demis Hassabis, Edward Lockhart, Ioannis Antonoglou, Julian Schrittwieser, Karen Simonyan, Laurent Sifre, Simon Schmitt, Thomas Hubert, Thore Graepel, Timothy Lillicrap

MuZero achieves superhuman performance in Atari, Go, chess and shogi by learning a model that predicts only the reward, policy and value needed for planning.

arxiv:1911.08265 v2 · 2019-11-19 · cs.LG · stat.ML

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QZPNLERNGAKNBUDI22PIJ77TFS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

MuZero achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

C2 weakest assumption

That the learned model, when applied iteratively inside tree search, produces sufficiently accurate long-horizon predictions of reward, policy, and value to support effective planning even when the true dynamics are unknown and high-dimensional.

C3 one-line summary

MuZero matches or exceeds AlphaZero-level performance in Go, chess and shogi, and sets a new state of the art on 57 Atari games, by learning a model that directly supports planning rather than reconstructing full environment dynamics.
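
The claims above describe a learned model that is applied iteratively and predicts only reward, policy and value. A minimal sketch of that unrolling follows, using the paper's three-function split (representation, dynamics, prediction). Everything concrete here is a hypothetical stand-in: the `LearnedModel` container, the `unroll` helper and the toy functions are illustrative assumptions, not the authors' networks or training code, and no tree search is included.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass
class LearnedModel:
    """Hypothetical container for the three learned functions."""
    represent: Callable[[Sequence[float]], State]          # observation -> hidden state
    dynamics: Callable[[State, int], Tuple[float, State]]  # (state, action) -> (reward, next state)
    predict: Callable[[State], Tuple[List[float], float]]  # state -> (policy, value)


def unroll(model: LearnedModel,
           observation: Sequence[float],
           actions: Sequence[int]) -> List[Tuple[float, List[float], float]]:
    """Apply the model step by step along a hypothetical action sequence.

    Only reward, policy and value are produced at each step; the hidden
    state is never decoded back into an observation.
    """
    state = model.represent(observation)
    outputs = []
    for action in actions:
        reward, state = model.dynamics(state, action)
        policy, value = model.predict(state)
        outputs.append((reward, policy, value))
    return outputs


# Toy stand-ins (assumptions for illustration only, not learned networks).
toy = LearnedModel(
    represent=lambda obs: tuple(obs),
    dynamics=lambda s, a: (float(a), tuple(x + a for x in s)),
    predict=lambda s: ([0.5, 0.5], sum(s)),
)

if __name__ == "__main__":
    for reward, policy, value in unroll(toy, [1.0, 2.0], [1, 0]):
        print(reward, policy, value)
```

In a planner, a search procedure would call the dynamics and prediction functions at each simulated node instead of following a fixed action list; the fixed list here only keeps the sketch short.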

References

53 extracted · 53 resolved · 6 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

21 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:46.177763Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63

Aliases

arxiv: 1911.08265 · arxiv_version: 1911.08265v2 · doi: 10.48550/arxiv.1911.08265 · pith_short_12: QZPNLERNGAKN · pith_short_16: QZPNLERNGAKNBUDI · pith_short_8: QZPNLERN
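
The short aliases listed above are prefixes of the full 26-character Pith Number. On this record, the Pith Number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The page does not state that as a rule, so the sketch below treats it as an observation about this one record, not as the scheme's definition. The helper names `base32_prefix` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63"
PITH_NUMBER = "QZPNLERNGAKNBUDI22PIJ77TFS"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


if __name__ == "__main__":
    print(base32_prefix(CANONICAL_HASH, len(PITH_NUMBER)) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```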
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QZPNLERNGAKNBUDI22PIJ77TFS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 865ed5922d3014d0d068d69e84fff32caf82ec05d3984e27e9a7f6d3678b1b63
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c17ad5bd13adb1c24b6da393fcd423d7782deb1152195b6ec0860f51eaf91b91",
    "cross_cats_sorted": [
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-11-19T13:58:52Z",
    "title_canon_sha256": "e1b6e9a101ccbe0c56a2ef0ef7c6625e447d26efc29afa6fac4b14d44e852264"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1911.08265",
    "kind": "arxiv",
    "version": 2
  }
}
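
The shell check under "Verify this Pith Number yourself" can also be run offline against the record printed above. The sketch below applies the same serialization as that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and hashes the result with SHA-256. It assumes the JSON shown on this page is identical to the `canonical_record` the API returns; if that holds, the printed digest should equal the canonical hash listed above, but that match is not asserted here. The function name `canonical_sha256` is a hypothetical helper.

```python
import hashlib
import json

# Canonical record as printed on this page (assumed identical to the API response).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c17ad5bd13adb1c24b6da393fcd423d7782deb1152195b6ec0860f51eaf91b91",
        "cross_cats_sorted": ["stat.ML"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2019-11-19T13:58:52Z",
        "title_canon_sha256": "e1b6e9a101ccbe0c56a2ef0ef7c6625e447d26efc29afa6fac4b14d44e852264",
    },
    "schema_version": "1.0",
    "source": {"id": "1911.08265", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Compare this value by eye with the canonical hash on the page.
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.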