Pith: machine review for the scientific record

arXiv: 1812.00568 · v1 · submitted 2018-12-03 · cs.RO · cs.AI · cs.CV · cs.LG

Recognition: unknown

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

Authors on Pith: no claims yet
classification: cs.RO · cs.AI · cs.CV · cs.LG
keywords: deep, goal, tasks, image, learning, manipulation, robotic, desired
0 comments
Original abstract

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual MPC can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
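The planning loop the abstract describes (visual MPC with a designated-pixel goal) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `predict_pixel_positions` is a stand-in stub for the learned action-conditioned video predictor, and all names and hyperparameters (the CEM sample counts, horizon, iteration count) are assumptions for the sketch.

```python
import numpy as np

def predict_pixel_positions(pixel, actions):
    """Stand-in for the learned video-prediction model: given the current
    designated-pixel location and a sequence of 2-D pushing actions, return
    the predicted pixel location after executing them. A real visual-MPC
    system would instead roll out an action-conditioned video predictor and
    track the designated pixel's probability mass through the predicted frames."""
    return pixel + actions.sum(axis=0)  # toy linear dynamics for illustration

def plan_action(pixel, goal, horizon=5, samples=200, elites=20, iters=3, seed=0):
    """Sampling-based planner (cross-entropy method): sample action sequences,
    score each by the predicted distance between the designated pixel and its
    user-specified goal position, refit the sampling distribution to the best
    sequences, and return the first action of the final mean plan. Under MPC,
    the robot executes that one action and then replans from the new image."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, 2))
    std = np.ones((horizon, 2))
    for _ in range(iters):
        actions = rng.normal(mean, std, size=(samples, horizon, 2))
        costs = np.array([
            np.linalg.norm(predict_pixel_positions(pixel, a) - goal)
            for a in actions
        ])
        elite = actions[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]

# Example: the designated pixel starts at (10, 10); the user drags it to (20, 12).
# The planner's first action points the push toward the goal.
first_action = plan_action(np.array([10.0, 10.0]), np.array([20.0, 12.0]))
```

The goal-image variant replaces the pixel-distance cost with a distance between predicted and goal frames, and the classifier variant scores predicted frames with a learned goal classifier; the surrounding CEM/MPC loop stays the same.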

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PlayWorld: Learning Robot World Models from Autonomous Play

    cs.RO 2026-03 unverdicted novelty 7.0

    PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy p...

  2. Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    cs.RO 2023-10 unverdicted novelty 7.0

    A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

  3. Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

    cs.AI 2026-05 unverdicted novelty 6.0

    Hamiltonian World Models structure latent dynamics around energy-conserving Hamiltonian evolution to produce physically grounded, action-controllable predictions for embodied decision making.

  4. Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

    cs.RO 2026-04 unverdicted novelty 6.0

    A morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion tasks.

  5. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    cs.AI 2025-06 unverdicted novelty 6.0

    V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 h...

  6. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    cs.RO 2024-03 accept novelty 6.0

    DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

  7. Is Conditional Generative Modeling all you need for Decision-Making?

    cs.LG 2022-11 unverdicted novelty 6.0

    Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

  8. Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

    cs.RO 2021-09 accept novelty 6.0

    A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

  9. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

    cs.LG 2020-05 unverdicted novelty 2.0

    Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.