AxisGuide augments RGB images with rendered robot base-frame axis cues to improve generalization of visuomotor manipulation policies under distribution shifts.
org/abs/2506.09930
5 Pith papers cite this work.
representative citing papers
RoboSemanticBench reveals that representative VLA models grasp blocks successfully but select the semantically correct answer at near-random rates, indicating a gap between backbone semantics and action prediction.
A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks, including an 8% gain on Robotwin 2.0.
citing papers explorer
- AxisGuide: Grounding Robot Action Coordinate System in RGB Observations for Robust Visuomotor Manipulation
- RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models
- Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot