Steve-1: A generative model for text-to-behavior in Minecraft. Advances in Neural Information Processing Systems, 36:69900–69929, 2023.
2 Pith papers cite this work.
citing papers explorer
- Training Agents Inside of Scalable World Models
  Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
- GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
  GROW decomposes trajectories into state-action samples to enable GRPO for multi-turn VLM agents and reports state-of-the-art results on more than 800 Minecraft tasks.