hub Canonical reference

Dream to Control: Learning Behaviors by Latent Imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi · 2019 · cs.LG · arXiv 1912.01603

Canonical reference. 95% of citing Pith papers cite this work as background.

71 Pith papers citing it

Background 95% of classified citations

open full Pith review browse 71 citing papers arXiv PDF

abstract

Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 20 method 1

citation-polarity summary

background 20 use method 1

claims ledger

abstract Learned world models summarize an agent's experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving behaviors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual contro
background Langugae-Conditoned MoCoGAN [29], U-Net [30], Latte [ 31], Wan [32], Sora 2 [ 33]. . . Embodied World Model SWIM [34], DreamDojo [ 35], RoboDreamer [36], RoboScape [37]. . . WM for VLA Imitation Learning Ctrl-World [38], RoboScape [37], DREMA [ 39] Reinforcement Learning Dreamer to Control [ 40] DreamerV2 [ 41], Dreamer 4 [ 42], RISE [ 43] DreamerV3 [44], DayDreamer [45], World-Env [46], RoboScape-R [47] WMPO [48], WoVR [49], VLA-RFT [50], RWML [51], MoDem-V2 [52] World-Gymnast [53], RWM-U [54],

co-cited works

representative citing papers

A Model-Free Universal AI

cs.AI · 2026-02-26 · unverdicted · novelty 8.0

AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

cs.CV · 2026-05-08 · conditional · novelty 7.0 · 3 refs

Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

RopeDreamer uses quaternionic kinematic chains in a recurrent state space model with a dual decoder to cut open-loop prediction error by 40.52% over 50 steps on simulated DLO trajectories while preserving physical constraints.

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

cs.RO · 2026-04-21 · unverdicted · novelty 7.0

Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.

Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation

cs.NI · 2026-04-09 · unverdicted · novelty 7.0

MobiWM is a multimodal world model for mobile networks that learns state-action dynamics to enable unlimited-horizon counterfactual traffic simulations and optimization.

MoRight: Motion Control Done Right

cs.CV · 2026-04-08 · unverdicted · novelty 7.0

MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.

Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping

cs.CV · 2026-02-25 · unverdicted · novelty 7.0

MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Diffusion Models Are Real-Time Game Engines

cs.LG · 2024-08-27 · conditional · novelty 7.0

A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

Spectral Souping: A Unified Framework for Online Preference Alignment

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

Spectral Souping learns offline specialized policies for fine-grained preferences and merges them online using a discovered universal spectral representation for efficient LLM alignment.

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

Feedback World Model Enables Precise Guidance of Diffusion Policy

cs.RO · 2026-05-15 · unverdicted · novelty 6.0

Feedback world model closes the prediction-observation loop at inference time to correct errors and improve diffusion policy performance under distribution shift in robotics.

Latent Video Prediction Learns Better World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Latent prediction video models exhibit a distinct robustness profile across corruption, occlusion, fine-grained discrimination, and temporal sensitivity compared to other self-supervised video models when used as world models.

citing papers explorer

Showing 50 of 71 citing papers.

A Model-Free Universal AI cs.AI · 2026-02-26 · unverdicted · none · ref 3 · internal anchor
AIQI is the first model-free universal AI agent proven asymptotically ε-optimal in general RL by inducing over distributional Q-functions instead of policies or environments.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution cs.CL · 2023-09-28 · unverdicted · none · ref 132 · internal anchor
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 3 · internal anchor
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 44 · internal anchor
VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.
One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy cs.CV · 2026-05-08 · conditional · none · ref 18 · 3 links · internal anchor
Reducing visual input to one token per frame in VLA world models maintains or improves long-horizon performance on MetaWorld, LIBERO, and real-robot tasks.
Operator-Guided Invariance Learning for Continuous Reinforcement Learning cs.LG · 2026-05-07 · unverdicted · none · ref 8 · internal anchor
VPSD-RL discovers exact and approximate value-preserving Lie-group operators in continuous RL to stabilize learning via transition augmentation and consistency regularization.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 26 · internal anchor
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
RopeDreamer: A Kinematic Recurrent State Space Model for Dynamics of Flexible Deformable Linear Objects cs.RO · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
RopeDreamer uses quaternionic kinematic chains in a recurrent state space model with a dual decoder to cut open-loop prediction error by 40.52% over 50 steps on simulated DLO trajectories while preserving physical constraints.
Mask World Model: Predicting What Matters for Robust Robot Policy Learning cs.RO · 2026-04-21 · unverdicted · none · ref 12 · internal anchor
Mask World Model predicts semantic mask dynamics with video diffusion and integrates it with a diffusion policy head, outperforming RGB world models on LIBERO and RLBench while showing better real-world generalization and texture robustness.
Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation cs.NI · 2026-04-09 · unverdicted · none · ref 13 · internal anchor
MobiWM is a multimodal world model for mobile networks that learns state-action dynamics to enable unlimited-horizon counterfactual traffic simulations and optimization.
MoRight: Motion Control Done Right cs.CV · 2026-04-08 · unverdicted · none · ref 27 · internal anchor
MoRight disentangles object and camera motion via canonical-view specification and temporal cross-view attention, while decomposing motion into active user-driven and passive consequence components to learn and apply causality in video generation.
Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping cs.CV · 2026-02-25 · unverdicted · none · ref 13 · internal anchor
MoGaF groups Gaussians by motion in 4D splatting representations to enable stable long-term forecasting of dynamic scenes.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 65 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Diffusion Models Are Real-Time Game Engines cs.LG · 2024-08-27 · conditional · none · ref 66 · internal anchor
A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.
Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 36 · internal anchor
Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models cs.RO · 2023-10-16 · conditional · none · ref 21 · internal anchor
SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 247 · internal anchor
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Mastering Diverse Domains through World Models cs.AI · 2023-01-10 · unverdicted · none · ref 21 · internal anchor
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Mastering Atari with Discrete World Models cs.LG · 2020-10-05 · accept · none · ref 24 · internal anchor
DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 31 · internal anchor
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
Spectral Souping: A Unified Framework for Online Preference Alignment cs.LG · 2026-05-19 · unverdicted · none · ref 7 · internal anchor
Spectral Souping learns offline specialized policies for fine-grained preferences and merges them online using a discovered universal spectral representation for efficient LLM alignment.
DiLA: Disentangled Latent Action World Models cs.CV · 2026-05-15 · unverdicted · none · ref 12 · internal anchor
DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.
Feedback World Model Enables Precise Guidance of Diffusion Policy cs.RO · 2026-05-15 · unverdicted · none · ref 3 · internal anchor
Feedback world model closes the prediction-observation loop at inference time to correct errors and improve diffusion policy performance under distribution shift in robotics.
Latent Video Prediction Learns Better World Models cs.CV · 2026-05-15 · unverdicted · none · ref 12 · internal anchor
Latent prediction video models exhibit a distinct robustness profile across corruption, occlusion, fine-grained discrimination, and temporal sensitivity compared to other self-supervised video models when used as world models.
PanoWorld: Geometry-Consistent Panoramic Video World Modeling cs.CV · 2026-05-14 · unverdicted · none · ref 10 · internal anchor
PanoWorld adds depth consistency and trajectory consistency losses plus spherical adaptations to a pre-trained video model, plus a new PanoGeo dataset, to produce geometry-consistent 360 video.
ReactiveGWM: Steering NPC in Reactive Game World Models cs.CV · 2026-05-14 · unverdicted · none · ref 13 · internal anchor
ReactiveGWM introduces a decoupled diffusion architecture for player-NPC interactions that learns game-agnostic response logic for zero-shot strategy transfer across games.
Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching cs.RO · 2026-05-10 · unverdicted · none · ref 18 · internal anchor
DRIS improves zero-shot sim-to-real transfer for reactive catching by maintaining and acting on sets of randomized dynamics instances instead of single instances per episode.
LaWM: Least Action World Models for Long-Horizon Physical Consistency from Visual Observations cs.LG · 2026-05-08 · unverdicted · none · ref 8 · internal anchor
LaWM induces latent transitions from a learned discrete variational principle rather than an unconstrained neural predictor, yielding improved physical consistency on synthetic dynamics and robot benchmarks.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 16 · internal anchor
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 147 · internal anchor
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
TRAP: Tail-aware Ranking Attack for World-Model Planning cs.LG · 2026-05-03 · unverdicted · none · ref 19 · internal anchor
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning cs.LG · 2026-05-01 · unverdicted · none · ref 95 · internal anchor
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models cs.LG · 2026-04-28 · unverdicted · none · ref 14 · internal anchor
Latent transitions in models like Dreamer are biased toward dense regions, creating attractors that hide true dynamics discrepancies and cause epistemic uncertainty to be unreliable while overestimating rewards.
Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation cs.CV · 2026-04-20 · unverdicted · none · ref 39 · 2 links · internal anchor
OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
Learning Ad Hoc Network Dynamics via Graph-Structured World Models cs.LG · 2026-04-16 · unverdicted · none · ref 12 · internal anchor
G-RSSM learns per-node dynamics in wireless ad hoc networks via graph attention and trains clustering policies through imagined rollouts, generalizing from N=50 training to larger networks.
Simple but Stable, Fast and Safe: Achieve End-to-end Control by High-Fidelity Differentiable Simulation cs.RO · 2026-04-12 · conditional · none · ref 37 · internal anchor
An end-to-end RL policy trained via high-fidelity differentiable simulation maps depth images straight to bodyrate commands, achieving top success rates, low jerk, and zero-shot real-world generalization up to 7.5 m/s in dense environments.
Zero-shot World Models Are Developmentally Efficient Learners cs.AI · 2026-04-11 · unverdicted · none · ref 102 · internal anchor
A zero-shot visual world model trained on one child's experience achieves broad competence on physical understanding benchmarks while matching developmental behavioral patterns.
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control cs.RO · 2026-04-03 · unverdicted · none · ref 13 · internal anchor
A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.
Safety, Security, and Cognitive Risks in World Models cs.CR · 2026-04-01 · unverdicted · none · ref 49 · internal anchor
World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction cs.LG · 2026-03-07 · unverdicted · none · ref 8 · internal anchor
Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.
World Action Models are Zero-shot Policies cs.RO · 2026-02-17 · unverdicted · none · ref 32 · internal anchor
DreamZero uses a 14B video diffusion model as a World Action Model to achieve over 2x better zero-shot generalization on real robots than state-of-the-art VLAs, real-time 7Hz closed-loop control, and cross-embodiment transfer with 10-30 minutes of data.
RISE: Self-Improving Robot Policy with Compositional World Model cs.RO · 2026-02-11 · unverdicted · none · ref 29 · internal anchor
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
DynaWeb: Model-Based Reinforcement Learning of Web Agents cs.CL · 2026-01-29 · unverdicted · none · ref 10 · internal anchor
DynaWeb introduces a model-based RL framework that trains web agents via imagined rollouts in a learned web world model interleaved with real expert trajectories, yielding consistent gains on WebArena and WebVoyager benchmarks.
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning cs.AI · 2026-01-22 · conditional · none · ref 7 · internal anchor
Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation cs.RO · 2025-10-11 · unverdicted · none · ref 17 · internal anchor
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions cs.LG · 2025-09-23 · unverdicted · none · ref 10 · internal anchor
DAWM introduces a modular diffusion world model with an inverse dynamics model to produce complete synthetic transitions that improve conservative offline RL algorithms like TD3BC and IQL on D4RL tasks.
HERO: Hierarchical Extrapolation and Refresh for Efficient World Models cs.CV · 2025-08-25 · unverdicted · none · ref 10 · internal anchor
HERO accelerates world model inference 1.73x via hierarchical patch-wise refresh in shallow layers and linear extrapolation in deeper layers with minimal quality loss.
GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation cs.RO · 2025-06-17 · unverdicted · none · ref 18 · internal anchor
GAF creates 4D dynamic scene models by adding motion to 3D Gaussians, enabling better reconstruction and 7.3% higher success in robotic tasks.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning cs.AI · 2025-06-11 · unverdicted · none · ref 28 · internal anchor
V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning cs.RO · 2024-11-07 · unverdicted · none · ref 21 · internal anchor
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.

Dream to Control: Learning Behaviors by Latent Imagination

hub tools

citation-role summary

citation-polarity summary

claims ledger

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer