ACE-Ego-0 is a VLA pretraining framework that turns egocentric human videos into robot-format pseudo-actions via a video-to-action pipeline and trains jointly with robot data under a reliability-aware objective.
SpatialVLA: Exploring Spatial Representa- tions for Visual-Language-Action Models
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Adding recurrent memory tokens to VLA models raises success rates on partially observable manipulation tasks from 0.42 to 0.84 on training and 0.07 to 0.23 on held-out tasks while preserving performance under full observability.
TBD-VLA partitions action sequences into temporal blocks, performs masked discrete diffusion within blocks, and autoregressive generation across blocks to unify parallel decoding with temporal coherence in discrete VLA models.
citing papers explorer
-
TBD-VLA: Temporal Block Diffusion Vision Language Action Model
TBD-VLA partitions action sequences into temporal blocks, performs masked discrete diffusion within blocks, and autoregressive generation across blocks to unify parallel decoding with temporal coherence in discrete VLA models.