Visual trace prompting improves spatial-temporal awareness in VLA models, delivering a 10% gain on SimplerEnv and a 3.5x gain on real-robot tasks.
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
4 Pith papers cite this work.
representative citing papers
ReFineVLA adds teacher-generated reasoning steps to VLA training and reports state-of-the-art success rates on SimplerEnv WidowX and Google Robot benchmarks.
S2P learns separate location and insertion primitives simultaneously via visual RL for peg-in-hole tasks, improving sample efficiency and success rates across polygon benchmarks in simulation and real-world tests.
A multimodal RGB-depth fusion backbone with vision transformer, masked-token contrastive learning, and curriculum domain randomization outperforms baselines in simulation and enables zero-shot real-world robot manipulation.
citing papers explorer
- ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning