Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control

2018 · cs.RO · arXiv 1812.00568

Canonical reference. 80% of citing Pith papers cite this work as background.

27 Pith papers citing it

abstract

Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual model-predictive control (MPC) can generalize to never-before-seen objects (both rigid and deformable) and solve a range of user-defined object manipulation tasks using the same model.
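
The planning loop the abstract describes (predict the future with a model, score predictions against a user-specified goal, act, then replan) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for the learned predictive model, the state is reduced to the 2D position of one designated pixel, and the cross-entropy method is an assumed choice of sampling-based optimizer, since the abstract does not name one.

```python
import numpy as np

def toy_model(pixel, actions):
    """Hypothetical stand-in for a learned predictive model: returns where the
    designated pixel ends up after an action sequence of shape (T, 2)."""
    return pixel + actions.sum(axis=0)

def plan_cem(pixel, goal, horizon=5, samples=200, elites=20, iters=4, seed=0):
    """Sampling-based planner (cross-entropy method, an assumed choice)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, 2))
    std = np.ones((horizon, 2))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        candidates = mean + std * rng.standard_normal((samples, horizon, 2))
        # Designated-pixel cost: distance from predicted position to the goal.
        costs = np.array(
            [np.linalg.norm(toy_model(pixel, a) - goal) for a in candidates]
        )
        # Refit the sampling distribution to the lowest-cost sequences.
        elite = candidates[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

def run_mpc(pixel, goal, steps=10):
    """Model-predictive control: execute the first planned action, then replan."""
    pixel = np.asarray(pixel, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        plan = plan_cem(pixel, goal)
        pixel = toy_model(pixel, plan[:1])
    return pixel

final = run_mpc([10, 10], [20, 25])
print("distance to goal after MPC:", np.linalg.norm(final - np.array([20.0, 25.0])))
```

In this toy setting the designated pixel moves toward the goal position as the loop replans. Swapping the cost function for an image distance or a classifier score would correspond to the other two goal specification methods named in the abstract.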

citation-role summary

background 5

citation-polarity summary

background 4 · unclear 1

representative citing papers

Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation

cs.RO · 2026-06-29 · unverdicted · novelty 7.0

SWAM jointly generates intermediate RGB-D sequences and action trajectories from monocular RGB start/goal observations for embodied navigation.

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

cs.RO · 2023-10-16 · conditional · novelty 7.0

SuSIE uses a finetuned InstructPix2Pix diffusion model to propose subgoal images that guide a low-level goal-conditioned policy, achieving SOTA zero-shot performance on CALVIN and real-world manipulation.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

cs.RO · 2023-10-13 · unverdicted · novelty 7.0

A collaborative dataset spanning 22 robots and 527 skills enables RT-X models that transfer capabilities across different robot embodiments.

SKIP: Sparse Keyframe Interpolation Paradigm for Efficient Embodied World Models

cs.RO · 2026-05-30 · unverdicted · novelty 6.0

SKIP achieves 4.16x faster dense video rollouts for robot world models by synthesizing only multimodal-identified keyframes and interpolating the rest, preserving policy training effectiveness with minimal success rate drops.

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

cs.RO · 2026-05-25 · unverdicted · novelty 6.0

A JAX-based differentiable reachability primitive for continuous- and discrete-time NN dynamics and controllers that supports certified training and sampling-based MPC with gradient refinement.

EgoExo-WM: Unlocking Exo Video for Ego World Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

EgoExo-WM converts exocentric videos to an egocentric format via body-pose extraction and kinematics to improve egocentric world-model prediction and planning.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

cs.RO · 2025-08-07 · unverdicted · novelty 6.0

Genie Envisioner unifies robotic policy learning, simulation, and evaluation inside one instruction-conditioned video diffusion framework using GE-Base, GE-Act, and GE-Sim.

ReSim: Reliable World Simulation for Autonomous Driving

cs.CV · 2025-06-11 · unverdicted · novelty 6.0

ReSim is a controllable video world model trained on heterogeneous real and simulated driving data that achieves higher fidelity and controllability for both expert and non-expert actions, plus a Video2Reward module for estimating action quality from simulated futures.

DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning

cs.RO · 2024-11-07 · unverdicted · novelty 6.0

DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.

Is Conditional Generative Modeling all you need for Decision-Making?

cs.LG · 2022-11-28 · unverdicted · novelty 6.0

Return-conditional diffusion models for policies outperform offline RL on benchmarks by circumventing dynamic programming and enable constraint or skill composition.

RoboNet: Large-Scale Multi-Robot Learning

cs.RO · 2019-10-24 · conditional · novelty 6.0

RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

A morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion tasks.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

cs.AI · 2025-06-11 · unverdicted · novelty 6.0

V-JEPA 2 pre-trained on massive unlabeled video achieves strong results on motion understanding and action anticipation, SOTA video QA at 8B scale, and enables zero-shot robotic planning on Franka arms using only 62 hours of unlabeled robot video.

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

cs.RO · 2024-03-19 · accept · novelty 6.0

DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

cs.RO · 2021-09-27 · accept · novelty 6.0

A large multi-task multi-domain robot dataset combined with 50 new demonstrations yields 2x higher success rates on never-before-seen tasks in new domains.

IMWM: Intuition Models Complement World Models for Latent Planning

cs.LG · 2026-06-01 · unverdicted · novelty 5.0

IMWM combines a world model with an intuition model from demonstrations to improve sample-based latent planning success rates over world-model-only baselines on pixel control tasks.

$\tau_0$-WM: A Unified Video-Action World Model for Robotic Manipulation

cs.RO · 2026-05-31 · unverdicted · novelty 5.0

A shared video diffusion backbone jointly predicts future latents and continuous actions while also rolling out candidate actions to predict dense task-progress scores, trained on 27,300 hours of mixed robot and human data.

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

cs.AI · 2026-05-01 · unverdicted · novelty 5.0 · 2 refs

Proposes Hamiltonian World Models as a physically grounded framework encoding observations into latent phase space and evolving them via Hamiltonian dynamics with control and dissipation for embodied prediction and planning.

WestWorld: A Knowledge-Encoded Scalable Trajectory World Model for Diverse Robotic Systems

cs.LG · 2026-03-15 · unverdicted · novelty 5.0

WestWorld introduces a scalable trajectory world model with Sys-MoE routing via system embeddings and structural embeddings for physical knowledge, pretrained on 89 environments to improve zero-shot prediction and real-robot control.

Reasoning and Generalization in RL: A Tool Use Perspective

cs.NE · 2019-07-03 · unverdicted · novelty 5.0

Proposes a tool-use-inspired framework with multiple test sets to measure specified types of generalization in RL.

Learning to Cope with Adversarial Attacks

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

MLAH agent in deep RL demonstrates hierarchical coping mechanisms and improved reward maintenance under spaced adversarial attacks, at the expense of stability.
