The method uses multi-view diffusion priors and action manifold learning to resolve depth ambiguity and improve action prediction in VLA robotic manipulation models, reporting higher success rates than baselines on LIBERO, RoboTwin, and real-robot tasks.
Spatial forcing: Implicit spatial representation alignment for vision-language-action model
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
baseline 1
citation-polarity summary
fields
cs.RO 1years
2026 1verdicts
UNVERDICTED 1roles
baseline 1polarities
baseline 1representative citing papers
citing papers explorer
-
Learning Action Manifold with Multi-view Latent Priors for Robotic Manipulation
The method uses multi-view diffusion priors and action manifold learning to resolve depth ambiguity and improve action prediction in VLA robotic manipulation models, reporting higher success rates than baselines on LIBERO, RoboTwin, and real-robot tasks.