Unleashing large-scale video generative pre-training for visual robot manipulation
2 Pith papers cite this work. Polarity classification is still being indexed.
citation-role summary: background 1
fields: cs.RO 2
roles: background 1
polarities: background 1
citing papers explorer
- Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
  Seer, a transformer-based PIDM pre-trained on large robotic datasets like DROID, outperforms prior methods on simulation and real-world robotic manipulation benchmarks with gains up to 43%.
- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.