Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.
Unsupervised learning of visual features by contrasting cluster assignments
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2024 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding 18.6% relative gains on Calvin ABC-D and 31.6% higher real-world success rates.