Training models to generate videos of robot actions produces policies that generalize better to new objects and tasks while using far less demonstration data than standard behavior cloning.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.RO 2representative citing papers
A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks from 20 demonstrations.
citing papers explorer
-
Video Generators are Robot Policies
Training models to generate videos of robot actions produces policies that generalize better to new objects and tasks while using far less demonstration data than standard behavior cloning.
-
R3M: A Universal Visual Representation for Robot Manipulation
A visual encoder pre-trained on diverse human videos with contrastive and language objectives improves simulated robot manipulation success by over 20% versus training from scratch and enables real Franka arm tasks from 20 demonstrations.