Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

Frederik Ebert, Chelsea Finn, Sudeep Dasari, Annie Xie, Alex Lee, Sergey Levine

classification cs.RO cs.AI cs.CV cs.LG

keywords deep, goal, tasks, image, learning, manipulation, robotic, desired

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual model-predictive control (MPC) can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
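
The planning loop implied by the abstract (predict the future with a model, score predictions against a designated-pixel goal, act, replan) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for the learned video-prediction model and only tracks one designated pixel's 2-D position, the optimizer is the cross-entropy method (the abstract does not name a planner), and all names and parameter values are invented here.

```python
import numpy as np

def toy_predict(pos, actions):
    """Hypothetical stand-in for the learned predictive model.

    pos: (2,) current position of the designated pixel.
    actions: (N, H, 2) candidate action sequences.
    Returns predicted pixel positions with shape (N, H, 2).
    """
    return pos + np.cumsum(actions, axis=1)

def plan_cem(pos, goal, horizon=5, samples=200, elites=20, iters=4, rng=None):
    """Cross-entropy-method planner (assumed): sample, score, refit, repeat."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean = np.zeros((horizon, 2))
    std = np.ones((horizon, 2))
    for _ in range(iters):
        actions = mean + std * rng.standard_normal((samples, horizon, 2))
        pred = toy_predict(pos, actions)
        # Cost: distance from the predicted pixel to its goal, summed over time.
        cost = np.linalg.norm(pred - goal, axis=-1).sum(axis=1)
        best = actions[np.argsort(cost)[:elites]]
        # Refit the sampling distribution to the lowest-cost sequences.
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mean

def run_mpc(pos, goal, steps=15):
    """Receding-horizon loop: execute the first planned action, then replan."""
    pos = np.asarray(pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(steps):
        action = plan_cem(pos, goal, rng=rng)[0]
        pos = pos + action  # toy "real world" that matches the toy model exactly
    return pos

final = run_mpc([0.0, 0.0], [3.0, -2.0])
print("final pixel position:", final)
```

In this sketch the goal is a single designated pixel position. A goal image or a classifier score would replace only the `cost` line, which is one way to read the abstract's point that the same model serves several goal specification methods.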

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
cs.RO 2023-10 unverdicted novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling
cs.AI 2026-05 unverdicted novelty 6.0

Hamiltonian World Models structure latent dynamics around energy-conserving Hamiltonian evolution to produce physically grounded, action-controllable predictions for embodied decision making.

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning
cs.RO 2026-04 unverdicted novelty 6.0

Morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion tasks.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
cs.AI 2025-06 unverdicted novelty 6.0

V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 h...

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
cs.RO 2024-03 accept novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
cs.RO 2021-09 accept novelty 6.0

A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG 2020-05 unverdicted novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.