SpatialVLA: Exploring Spatial Representa- tions for Visual-Language-Action Models

Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Jiayuan Gu, Zhigang Wang, Yan Ding, Bin Zhao, Dong Wang, Xuelong Li · 2025 · DOI 10.15607/rss.2025.xxi.011

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open at publisher browse 2 citing papers

representative citing papers

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

cs.RO · 2026-06-15 · unverdicted · novelty 6.0

ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.

$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on held-out tasks while preserving performance under full observability.

citing papers explorer

Showing 2 of 2 citing papers.

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining cs.RO · 2026-06-15 · unverdicted · none · ref 23
ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.
$\mu$VLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models cs.LG · 2026-06-10 · unverdicted · none · ref 53
Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on held-out tasks while preserving performance under full observability.

SpatialVLA: Exploring Spatial Representa- tions for Visual-Language-Action Models

fields

years

verdicts

representative citing papers

citing papers explorer