GeoAlign: Beyond Semantics with State-Guided Spatial Alignment in VLA Models

Cewu Lu; Jinming Yao; Keqi Zhu; Kun Wang; Liyun Yan; Shengcheng Fu; Tianyue Zhan; Xiaxi Si; Xinyi Peng; Xueyun Chen

arxiv: 2606.03240 · v1 · pith:JMDWV23Tnew · submitted 2026-06-02 · 💻 cs.RO

Yizhi Chen, Zhanxiang Cao, Xinyi Peng, Yixiao Zheng, Xiaxi Si, Yiheng Li, Liyun Yan, Keqi Zhu, Xueyun Chen, Shengcheng Fu, Tianyue Zhan, Yufei Jia, Jinming Yao, Yan Xie, Kun Wang, Cewu Lu, Yue Gao

keywords: geoalign, alignment, geometry, spatial, models, policy, state-guided, tasks

Current Vision-Language-Action (VLA) models often optimize for semantic grounding, whereas executable manipulation requires geometry-aware spatial alignment and dynamic affordance selection. We introduce GeoAlign, a state-guided spatial alignment architecture for VLA policy learning. GeoAlign post-trains an RGB geometry branch with robot-domain RGB-D supervision, yielding RGB-derived Geometry-Enhanced Post-Trained (GEP) features for policy rollout. The robot's proprioceptive state queries the GEP feature grid, producing compact, phase-dependent geometry tokens for action prediction. GeoAlign achieves 99.0% on LIBERO, 85.3% across three SimplerEnv-Fractal tasks, and 78.8% on eight geometry-critical real-world ALOHA tasks, with ablations confirming the value of geometry post-training and proprioceptive-state-guided querying.
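
The abstract does not spell out how the proprioceptive state queries the GEP feature grid. One plausible reading is a cross-attention step in which the state is projected into a few query vectors and the grid cells supply keys and values. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `state_guided_query`, the projection matrices, and all dimensions are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix, stored as a list of rows, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def state_guided_query(state, feature_grid, w_query, w_key, w_value):
    """Hypothetical state-to-grid cross-attention (illustrative assumption).

    state        : proprioceptive vector, length S
    feature_grid : N flattened grid cells, each a feature vector of length D
    w_query      : T matrices of shape (d_k x S), one per output token
    w_key        : matrix of shape (d_k x D)
    w_value      : matrix of shape (d_v x D)
    Returns (tokens, attention): T tokens of length d_v and T weight rows of length N.
    """
    keys = [matvec(w_key, cell) for cell in feature_grid]
    values = [matvec(w_value, cell) for cell in feature_grid]
    scale = 1.0 / math.sqrt(len(keys[0]))
    tokens, attention = [], []
    for w_q in w_query:
        query = matvec(w_q, state)  # the robot state decides where to look
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)
        token = [
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ]
        tokens.append(token)
        attention.append(weights)
    return tokens, attention


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
state = [rng.gauss(0.0, 1.0) for _ in range(7)]       # e.g. 7 joint values (assumed)
feature_grid = random_matrix(16, 8, rng)               # 4x4 grid, 8-dim features (assumed)
w_query = [random_matrix(4, 7, rng) for _ in range(2)]  # 2 geometry tokens (assumed)
w_key = random_matrix(4, 8, rng)
w_value = random_matrix(6, 8, rng)

tokens, attention = state_guided_query(state, feature_grid, w_query, w_key, w_value)
print(len(tokens), len(tokens[0]), len(attention[0]))
```

Because the queries are computed from the state, a different arm configuration yields different attention weights over the same grid, which is one way the "phase-dependent" behavior described in the abstract could arise. The output is also compact: a fixed number of tokens regardless of grid size.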
