{"paper":{"title":"Training Agents Inside of Scalable World Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Dreamer 4 obtains diamonds in Minecraft by training reinforcement learning behaviors inside a world model learned from offline videos.","cross_cats":["cs.LG","cs.RO","stat.ML"],"primary_cat":"cs.AI","authors_text":"Danijar Hafner, Timothy Lillicrap, Wilson Yan","submitted_at":"2025-09-29T09:42:27Z","abstract_excerpt":"World models learn general knowledge from videos and simulate experience for training behaviors in imagination, offering a path towards intelligent agents. However, previous world models have been unable to accurately predict object interactions in complex environments. We introduce Dreamer 4, a scalable agent that learns to solve control tasks by reinforcement learning inside of a fast and accurate world model. In the complex video game Minecraft, the world model accurately predicts object interactions and game mechanics, outperforming previous world models by a large margin. 
The world model "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The world model accurately predicts object interactions and game mechanics over the long action sequences required for the diamond task.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Dreamer 4 obtains diamonds in Minecraft by training reinforcement learning behaviors inside a world model learned from offline videos.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a5ed48d9d4e669289a408a2581c8da8d60ebc3bf363b27c85e57f8d791f7fd02"},"source":{"id":"2509.24527","kind":"arxiv","version":1},"verdict":{"id":"e70fdf4a-fa31-411b-a985-eb38cd501cd6","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:01:18.530066Z","strongest_claim":"By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction.","one_line_summary":"Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The world model accurately predicts object 
interactions and game mechanics over the long action sequences required for the diamond task.","pith_extraction_headline":"Dreamer 4 obtains diamonds in Minecraft by training reinforcement learning behaviors inside a world model learned from offline videos."},"references":{"count":84,"sample":[{"doi":"","year":2025,"title":"Mastering diverse control tasks through world models. Nature, pages 1–7, 2025","work_id":"11501645-f4f6-4b70-a15e-9da63b320e73","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Daydreamer: World models for physical robot learning","work_id":"3be0bb29-cef0-4cd3-90da-1ffb544f0046","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"TD-MPC2: Scalable, Robust World Models for Continuous Control","work_id":"360ec5fb-79fd-4490-bc73-3d161609c42d","ref_index":3,"cited_arxiv_id":"2310.16828","is_internal_anchor":true},{"doi":"","year":2024,"title":"Diffusion for world modeling: Visual details matter in atari. Advances in Neural Information Processing Systems, 37:58757–58791, 2024","work_id":"38bbb70f-29a1-4c4c-9a5b-f24d6be2e431","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model","work_id":"964ed935-1570-495e-a162-9182456934cc","ref_index":5,"cited_arxiv_id":"1911.08265","is_internal_anchor":true}],"resolved_work":84,"snapshot_sha256":"23340cba0440dd8752f3268d74a4c14f9e42192b4a5ff5b96a8f6fe37b8ef1b4","internal_anchors":26},"formal_canon":{"evidence_count":2,"snapshot_sha256":"cfb102cd8a5c161d63ae1f4f8b90da852ce445831b12a5650c28aff7edf83ea3"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}