{"paper":{"title":"Mastering Atari with Discrete World Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"DreamerV2 achieves human-level performance on Atari by learning behaviors inside a separately trained discrete world model.","cross_cats":["cs.AI","stat.ML"],"primary_cat":"cs.LG","authors_text":"Danijar Hafner, Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap","submitted_at":"2020-10-05T17:52:14Z","abstract_excerpt":"Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample-efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. 
The worl"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the learned discrete world model remains sufficiently accurate over the multi-step imagined trajectories used for policy optimization, without compounding errors that would invalidate the imagined returns.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"DreamerV2 achieves human-level performance on Atari by learning behaviors inside a separately trained discrete world model.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"b888d1f8367efcc45c8e083c9f04922fcda7e14e06a8085e45be447aaafce288"},"source":{"id":"2010.02193","kind":"arxiv","version":4},"verdict":{"id":"f5d18503-e20e-4ffc-8306-ea795f7035f4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:21:23.129242Z","strongest_claim":"DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model.","one_line_summary":"DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the learned discrete world model 
remains sufficiently accurate over the multi-step imagined trajectories used for policy optimization, without compounding errors that would invalidate the imagined returns.","pith_extraction_headline":"DreamerV2 achieves human-level performance on Atari by learning behaviors inside a separately trained discrete world model."},"references":{"count":56,"sample":[{"doi":"","year":null,"title":"H., and Levine, S","work_id":"2b4f01f7-2946-42ed-ad06-677913824304","ref_index":1,"cited_arxiv_id":"1710.11252","is_internal_anchor":true},{"doi":"","year":2020,"title":"Agent57: Outperforming the Atari Human Benchmark","work_id":"1ebaef4b-4a53-434a-96c4-8c645798ae71","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"A distributional perspective on reinforcement learning. arXiv preprint arXiv:1707.06887","work_id":"79879eb4-5aa1-4764-9aaa-505a0a1c0f7f","ref_index":3,"cited_arxiv_id":"1707.06887","is_internal_anchor":true},{"doi":"","year":null,"title":"Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation","work_id":"1fe8c7c8-aff7-4b94-9096-e549d7e60789","ref_index":4,"cited_arxiv_id":"1308.3432","is_internal_anchor":true},{"doi":"","year":null,"title":"Learning and Querying Fast Generative Models for Reinforcement Learning","work_id":"45700551-6f99-4914-b123-083e4ac20e0a","ref_index":5,"cited_arxiv_id":"1802.03006","is_internal_anchor":true}],"resolved_work":56,"snapshot_sha256":"d506c9036b0f063443309964941bfe3c726c3800f8812ee2169e139c0a4de2c5","internal_anchors":41},"formal_canon":{"evidence_count":2,"snapshot_sha256":"a861284b54472c82477bb4ead6f023fe9141056c0c6be6e432d72b653494f3ac"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}