SpatialVLA adds 3D-aware position encoding and adaptive discretized action grids to visual-language-action models, enabling strong zero-shot performance and fine-tuning on new robot setups after pre-training on 1.1 million real-world episodes.
Roboagent: Generalization and efficiency in robot manip- ulation via semantic augmentations and action chunking
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
baseline 1
citation-polarity summary
fields
cs.RO 1years
2025 1verdicts
UNVERDICTED 1roles
baseline 1polarities
baseline 1representative citing papers
citing papers explorer
-
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
SpatialVLA adds 3D-aware position encoding and adaptive discretized action grids to visual-language-action models, enabling strong zero-shot performance and fine-tuning on new robot setups after pre-training on 1.1 million real-world episodes.