{"paper":{"title":"Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MuZero achieves superhuman performance in Atari, Go, chess and shogi by learning a model that predicts only the reward, policy and value needed for planning.","cross_cats":["stat.ML"],"primary_cat":"cs.LG","authors_text":"Arthur Guez, David Silver, Demis Hassabis, Edward Lockhart, Ioannis Antonoglou, Julian Schrittwieser, Karen Simonyan, Laurent Sifre, Simon Schmitt, Thomas Hubert, Thore Graepel, Timothy Lillicrap","submitted_at":"2019-11-19T13:58:52Z","abstract_excerpt":"Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. 
In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their u"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"MuZero achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the learned model, when applied iteratively inside tree search, produces sufficiently accurate long-horizon predictions of reward, policy, and value to support effective planning even when the true dynamics are unknown and high-dimensional.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MuZero matches or exceeds AlphaZero-level performance in Go, Chess, Shogi and sets a new state of the art on 57 Atari games by learning a model that directly supports planning rather than reconstructing full environment dynamics.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MuZero achieves superhuman performance in Atari, Go, chess and shogi by learning a model that predicts only the reward, policy and value needed for planning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"fc5d4d70e58143e2eb08613abb5d8fb5234c20e7280a7423dcc820c749673d35"},"source":{"id":"1911.08265","kind":"arxiv","version":2},"verdict":{"id":"d05ceb39-5633-41b4-9a1c-ba680c1e5d23","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T23:52:41.366359Z","strongest_claim":"MuZero achieves superhuman performance in a range of challenging and visually complex 
domains, without any knowledge of their underlying dynamics.","one_line_summary":"MuZero matches or exceeds AlphaZero-level performance in Go, Chess, Shogi and sets a new state of the art on 57 Atari games by learning a model that directly supports planning rather than reconstructing full environment dynamics.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the learned model, when applied iteratively inside tree search, produces sufficiently accurate long-horizon predictions of reward, policy, and value to support effective planning even when the true dynamics are unknown and high-dimensional.","pith_extraction_headline":"MuZero achieves superhuman performance in Atari, Go, chess and shogi by learning a model that predicts only the reward, policy and value needed for planning."},"references":{"count":53,"sample":[{"doi":"","year":2018,"title":"Lipton, and Animashree Anandkumar","work_id":"89d8a872-e25f-4e79-971e-9aad2c2d136a","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2013,"title":"The arcade learning environment: An evaluation platform for general agents","work_id":"dd383516-d2cf-40d5-b95c-99ff0ca6f83d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Superhuman ai for heads-up no-limit poker: Libratus beats top professionals","work_id":"2356edfc-3c56-477c-848e-709319c2218b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Learning and Querying Fast Generative Models for Reinforcement Learning","work_id":"45700551-6f99-4914-b123-083e4ac20e0a","ref_index":4,"cited_arxiv_id":"1802.03006","is_internal_anchor":true},{"doi":"","year":2002,"title":"Joseph Hoane, Jr., and Feng-hsiung 
Hsu","work_id":"313124b1-b65d-4318-85e2-cb28a84a6476","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":53,"snapshot_sha256":"fd65d6b50c28d5f2436bc2689d847b03db653ada1b97a74fc8a57871844fecde","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"01cc8924ba4b674bba76f2d749503c125b742dbd28c604fa0de38be27d3fb9d8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}