SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Models

2025 · DOI 10.15607/rss.2025.xxi.011

3 Pith papers cite this work.


representative citing papers

ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining

cs.RO · 2026-06-15 · unverdicted · novelty 6.0

ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.

μVLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models

cs.LG · 2026-06-10 · unverdicted · novelty 6.0

Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training tasks and from 0.07 to 0.23 on held-out tasks, while preserving performance under full observability.

TBD-VLA: Temporal Block Diffusion Vision Language Action Model

cs.CV · 2026-06-05 · unverdicted · novelty 5.0

TBD-VLA partitions action sequences into temporal blocks, performing masked discrete diffusion within blocks and autoregressive generation across blocks. This unifies parallel decoding with temporal coherence in discrete VLA models.

citing papers explorer


TBD-VLA: Temporal Block Diffusion Vision Language Action Model

cs.CV · 2026-06-05 · unverdicted · none · ref 40
