Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.RO 2representative citing papers
XR-1 introduces Unified Vision-Motion Codes learned by dual-branch VQ-VAE and applies them in a three-stage training pipeline to outperform prior VLA models on 120+ real-world manipulation tasks across six robot embodiments.
citing papers explorer
-
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Visual trace prompting improves spatial-temporal awareness in VLA models, delivering 10% gains on SimplerEnv and 3.5x on real-robot tasks.
-
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
XR-1 introduces Unified Vision-Motion Codes learned by dual-branch VQ-VAE and applies them in a three-stage training pipeline to outperform prior VLA models on 120+ real-world manipulation tasks across six robot embodiments.