Visual foresight: Model-based deep reinforcement learning for vision-based robotic control

2018 · cs.RO · arXiv 1812.00568

20 Pith papers cite this work. Polarity classification is still indexing.

abstract

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual MPC can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
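
The planning loop the abstract describes (predict the future with a model, score predictions against a designated-pixel goal, act, then replan) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the cross-entropy-method sampler, the toy additive model in `rollout` that stands in for a learned video prediction model, and every function name and parameter value are hypothetical.

```python
import math
import random


def rollout(pixel, actions):
    """Toy stand-in for a learned predictive model (an assumption): the
    designated pixel moves by exactly the commanded displacement each step."""
    x, y = pixel
    path = []
    for dx, dy in actions:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def pixel_cost(pixel, goal):
    """Designated-pixel cost: Euclidean distance to the goal position."""
    return math.hypot(pixel[0] - goal[0], pixel[1] - goal[1])


def plan_cost(pixel, actions, goal):
    """Sum the cost over every predicted step, so early progress is rewarded."""
    return sum(pixel_cost(p, goal) for p in rollout(pixel, actions))


def plan_cem(pixel, goal, horizon=5, samples=64, elites=8, iters=3, rng=None):
    """Sampling-based planner: sample action sequences from a Gaussian,
    keep the lowest-cost ones, and refit the Gaussian to them."""
    if rng is None:
        rng = random.Random(0)
    mean = [[0.0, 0.0] for _ in range(horizon)]
    std = [[1.0, 1.0] for _ in range(horizon)]
    best = None
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            seq = [
                (rng.gauss(mean[t][0], std[t][0]), rng.gauss(mean[t][1], std[t][1]))
                for t in range(horizon)
            ]
            scored.append((plan_cost(pixel, seq, goal), seq))
        scored.sort(key=lambda item: item[0])
        top = [seq for _, seq in scored[:elites]]
        best = top[0]
        # Refit the per-step sampling distribution to the elite sequences.
        for t in range(horizon):
            for d in range(2):
                vals = [seq[t][d] for seq in top]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = math.sqrt(var) + 1e-3
    return best


def run_mpc(start, goal, steps=20, seed=0):
    """Model-predictive control: replan at every step, execute only the
    first action of the best plan, then observe the new pixel position."""
    rng = random.Random(seed)
    pixel = start
    for _ in range(steps):
        plan = plan_cem(pixel, goal, rng=rng)
        pixel = rollout(pixel, plan[:1])[-1]
    return pixel
```

Calling `run_mpc((0.0, 0.0), (3.0, 4.0))` returns the final pixel position after 20 replanning steps. In this toy setting the executed step uses the same additive model as the planner, so there is no model error; on a real robot the observed pixel position would come from the camera, which is the reason for replanning at every step.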

citation-role summary

background 4

citation-polarity summary

background 3 unclear 1

representative citing papers

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

cs.RO · 2025-08-07 · unverdicted · novelty 6.0

Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.

ReSim: Reliable World Simulation for Autonomous Driving

cs.CV · 2025-06-11 · unverdicted · novelty 6.0

ReSim is a controllable video world model trained on heterogeneous real and simulated driving data that achieves higher fidelity and controllability for both expert and non-expert actions, plus a Video2Reward module for estimating action quality from simulated futures.

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO · 2024-11-07 · unverdicted · novelty 6.0

DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.

Is Conditional Generative Modeling all you need for Decision-Making?

cs.LG · 2022-11-28 · unverdicted · novelty 6.0

Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

RoboNet: Large-Scale Multi-Robot Learning

cs.RO · 2019-10-24 · conditional · novelty 6.0

RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

Morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion tasks.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

cs.AI · 2025-06-11 · unverdicted · novelty 6.0

V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

cs.RO · 2024-03-19 · accept · novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

cs.RO · 2021-09-27 · accept · novelty 6.0

A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

cs.LG · 2026-03-15 · unverdicted · novelty 5.0

WestWorld introduces a scalable trajectory world model with Sys-MoE routing via system embeddings and structural embeddings for physical knowledge, pretrained on 89 environments to improve zero-shot prediction and real-robot control.

Reasoning and Generalization in RL: A Tool Use Perspective

cs.NE · 2019-07-03 · unverdicted · novelty 5.0

Proposes a tool-use inspired framework with multiple test sets to measure specified types of generalization in RL.

Learning to Cope with Adversarial Attacks

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

MLAH agent in deep RL demonstrates hierarchical coping mechanisms and improved reward maintenance under spaced adversarial attacks, at the expense of stability.

Planning Robot Motion using Deep Visual Prediction

cs.RO · 2019-06-24 · unverdicted · novelty 3.0

PROM-Net performs unsupervised visual prediction of robot motion from raw frames and integrates the predictions into model predictive control for navigation in unknown dynamic settings.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

cs.LG · 2020-05-04 · unverdicted · novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.

EgoExo-WM: Unlocking Exo Video for Ego World Models

cs.CV · 2026-05-14

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

cs.AI · 2026-05-01

citing papers explorer

Showing 20 of 20 citing papers.

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · none · ref 66 · internal anchor

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · none · ref 17 · internal anchor

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · none · ref 74

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · none · ref 12 · internal anchor

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

cs.RO · 2025-08-07 · unverdicted · none · ref 12 · internal anchor

ReSim: Reliable World Simulation for Autonomous Driving

cs.CV · 2025-06-11 · unverdicted · none · ref 64 · internal anchor

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO · 2024-11-07 · unverdicted · none · ref 15 · internal anchor

Is Conditional Generative Modeling all you need for Decision-Making?

cs.LG · 2022-11-28 · unverdicted · none · ref 104 · internal anchor

RoboNet: Large-Scale Multi-Robot Learning

cs.RO · 2019-10-24 · conditional · none · ref 7 · internal anchor

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

cs.RO · 2026-04-09 · unverdicted · none · ref 20

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

cs.AI · 2025-06-11 · unverdicted · none · ref 20

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

cs.RO · 2024-03-19 · accept · none · ref 12

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

cs.RO · 2021-09-27 · accept · none · ref 23

WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

cs.LG · 2026-03-15 · unverdicted · none · ref 9 · internal anchor

Reasoning and Generalization in RL: A Tool Use Perspective

cs.NE · 2019-07-03 · unverdicted · none · ref 25 · internal anchor

Learning to Cope with Adversarial Attacks

cs.LG · 2019-06-28 · unverdicted · none · ref 6 · internal anchor

Planning Robot Motion using Deep Visual Prediction

cs.RO · 2019-06-24 · unverdicted · none · ref 3 · internal anchor

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

cs.LG · 2020-05-04 · unverdicted · none · ref 293

EgoExo-WM: Unlocking Exo Video for Ego World Models

cs.CV · 2026-05-14 · unreviewed · ref 21 · internal anchor

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

cs.AI · 2026-05-01 · unreviewed · ref 6
