hub Canonical reference

TD-MPC2: Scalable, Robust World Models for Continuous Control

Nicklas Hansen, Hao Su, Xiaolong Wang · 2023 · cs.LG · arXiv 2310.16828

Canonical reference. 83% of citing Pith papers cite this work as background.

36 Pith papers citing it

Background 83% of classified citations

open full Pith review browse 36 citing papers arXiv PDF

abstract

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://tdmpc2.com

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 10 baseline 1 method 1

citation-polarity summary

background 10 baseline 1 use method 1

representative citing papers

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models

cs.AI · 2026-05-16 · unverdicted · novelty 7.0

Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.

OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

cs.RO · 2026-05-07 · unverdicted · novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

cs.RO · 2026-04-27 · unverdicted · novelty 7.0 · 2 refs

ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning

cs.LG · 2025-06-06 · conditional · novelty 7.0

BiTrajDiff augments offline RL datasets by running independent forward and backward diffusion processes from intermediate states, yielding higher performance than prior one-directional data-augmentation baselines on D4RL.

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.

MolWorld: Molecule World Models for Actionable Molecular Optimization

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

TRAP: Tail-aware Ranking Attack for World-Model Planning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.

RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.

Toward Safe Autonomous Robotic Endovascular Interventions using World Models

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

TD-MPC2 world models achieve 58% mean success in simulated endovascular navigation versus 36% for SAC, with comparable in-vitro rates but better path efficiency.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on control benchmarks.

Neural Operators for Multi-Task Control and Adaptation

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.

Hierarchical Planning with Latent World Models

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.

Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.

Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

cs.LG · 2026-03-07 · unverdicted · novelty 6.0

Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.

RISE: Self-Improving Robot Policy with Compositional World Model

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

cs.AI · 2026-01-22 · conditional · novelty 6.0

Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.

Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making

cs.LG · 2025-12-18 · unverdicted · novelty 6.0

An adaptive RL-MPC framework uses RL to inform MPPI sampling and aggregates MPPI samples for value estimation, delivering up to 72% higher success rates and 2.1x faster convergence on tasks like race driving and Lunar Lander with obstacles.

citing papers explorer

Showing 36 of 36 citing papers.

Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models cs.AI · 2026-05-16 · unverdicted · none · ref 9 · internal anchor
Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 20 · internal anchor
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
Support-Safe Variational Hybrid Filtering for Contact-Mode and Sparse-Law Recovery cs.RO · 2026-05-12 · unverdicted · none · ref 5 · internal anchor
VHYDRO is a support-safe variational hybrid filter that jointly recovers continuous latent states, discrete contact modes, and sparse port-Hamiltonian laws per regime while preventing loss of feasible transitions.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation cs.RO · 2026-05-07 · unverdicted · none · ref 25 · internal anchor
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 29 · internal anchor
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations cs.RO · 2026-04-27 · unverdicted · none · ref 17 · 2 links · internal anchor
ACO-MoE recovers 95.3% of clean-input performance in visual control tasks under Markov-switching corruptions by routing restoration experts and anchoring representations to clean foreground masks.
PlayWorld: Learning Robot World Models from Autonomous Play cs.RO · 2026-03-09 · unverdicted · none · ref 47 · internal anchor
PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 3 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning cs.LG · 2025-06-06 · conditional · none · ref 16 · internal anchor
BiTrajDiff augments offline RL datasets by running independent forward and backward diffusion processes from intermediate states, yielding higher performance than prior one-directional data-augmentation baselines on D4RL.
Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning cs.LG · 2026-05-10 · unverdicted · none · ref 19 · internal anchor
Ms.PR applies multi-scale predictive supervision to enforce goal-directed alignment in latent spaces for offline GCRL, yielding improved representation quality and performance on vision and state-based tasks.
MolWorld: Molecule World Models for Actionable Molecular Optimization cs.LG · 2026-05-09 · unverdicted · none · ref 21 · internal anchor
MolWorld expands a molecule-transfer graph using a world model to discover high-property molecules that maintain strong structural connectivity to known compounds for actionable optimization.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 20 · internal anchor
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
TRAP: Tail-aware Ranking Attack for World-Model Planning cs.LG · 2026-05-03 · unverdicted · none · ref 23 · internal anchor
TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean inputs.
RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC cs.RO · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
RAY-TOLD combines ray-based latent dynamics from LiDAR with MPPI control and a learned policy prior via mixture sampling to lower collision rates in high-density dynamic obstacle environments compared to standard MPPI.
Toward Safe Autonomous Robotic Endovascular Interventions using World Models cs.RO · 2026-04-22 · unverdicted · none · ref 16 · internal anchor
TD-MPC2 world models achieve 58% mean success in simulated endovascular navigation versus 36% for SAC, with comparable in-vitro rates but better path efficiency.
Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 63 · internal anchor
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control cs.LG · 2026-04-08 · unverdicted · none · ref 4 · internal anchor
GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on control benchmarks.
Neural Operators for Multi-Task Control and Adaptation cs.LG · 2026-04-03 · unverdicted · none · ref 4 · internal anchor
Neural operators approximate the solution operator for multi-task optimal control, generalizing to new tasks and enabling efficient adaptation via branch-trunk structure and meta-training.
Hierarchical Planning with Latent World Models cs.LG · 2026-04-03 · unverdicted · none · ref 20 · internal anchor
Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.
Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots cs.RO · 2026-04-03 · unverdicted · none · ref 24 · internal anchor
DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction cs.LG · 2026-03-07 · unverdicted · none · ref 9 · internal anchor
Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.
RISE: Self-Improving Robot Policy with Compositional World Model cs.RO · 2026-02-11 · unverdicted · none · ref 35 · internal anchor
RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning cs.AI · 2026-01-22 · conditional · none · ref 12 · internal anchor
Single-stage fine-tuning of a video model to generate actions as latent frames plus future states and values yields state-of-the-art robot policy performance on LIBERO, RoboCasa, and bimanual tasks.
Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making cs.LG · 2025-12-18 · unverdicted · none · ref 4 · internal anchor
An adaptive RL-MPC framework uses RL to inform MPPI sampling and aggregates MPPI samples for value estimation, delivering up to 72% higher success rates and 2.1x faster convergence on tasks like race driving and Lunar Lander with obstacles.
Model-Based Reinforcement Learning under Random Observation Delays cs.LG · 2025-09-25 · unverdicted · none · ref 41 · internal anchor
A delay-aware model-based RL framework with sequential belief filtering handles random out-of-sequence observations in POMDPs and outperforms MDP baselines while showing robustness to delay shifts.
GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation cs.RO · 2025-06-17 · unverdicted · none · ref 22 · internal anchor
GAF creates 4D dynamic scene models by adding motion to 3D Gaussians, enabling better reconstruction and 7.3% higher success in robotic tasks.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning cs.AI · 2025-06-11 · unverdicted · none · ref 30 · internal anchor
V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning cs.RO · 2024-11-07 · unverdicted · none · ref 26 · internal anchor
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation cs.LG · 2026-05-20 · unverdicted · none · ref 8 · internal anchor
The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.
TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing cs.AI · 2026-05-12 · unverdicted · none · ref 5 · internal anchor
TOPPO reformulates PPO with critic balancing to address gradient ill-conditioning in multi-task RL and reports stronger mean and tail performance than SAC baselines on Meta-World+ using fewer parameters and steps.
Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift cs.LG · 2026-04-30 · unverdicted · none · ref 1 · internal anchor
JEPA-Indexed Local Expert Growth adds local action corrections for detected shift clusters and yields statistically significant OOD gains on four shift conditions while keeping in-distribution performance intact.
World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems cs.RO · 2026-04-16 · unverdicted · none · ref 10 · internal anchor
The World-Value-Action model enables implicit planning for VLA systems by performing inference over a learned latent representation of high-value future trajectories instead of direct action prediction.
D2 Actor Critic: Diffusion Actor Meets Distributional Critic cs.LG · 2025-10-03 · unverdicted · none · ref 14 · internal anchor
D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.
EvolvingAgent: Curriculum Self-evolving Agent with Continual World Model for Long-Horizon Tasks cs.RO · 2025-02-09 · unverdicted · none · ref 7 · internal anchor
EvolvingAgent autonomously completes long-horizon tasks via a closed-loop planner-controller-reflector system with continual world model updates, reporting 111.74% higher success rates than baselines in Minecraft and human-level Atari performance.
Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own cs.RO · 2023-10-04 · unverdicted · none · ref 13 · internal anchor
RLFP and the FAC algorithm combine foundation-model priors for policy, value, and rewards to produce sample-efficient robotic RL that reaches 86% real-robot success after one hour and 100% success on 7/8 Meta-world tasks in under 100k frames.
Active Inference: A method for Phenotyping Agency in AI systems? cs.AI · 2026-04-25 · unverdicted · none · ref 20 · internal anchor
Active inference offers a variational way to phenotype agency in AI systems by measuring empowerment in generative models via a T-maze paradigm.

TD-MPC2: Scalable, Robust World Models for Continuous Control

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer