org/abs/2506.09930

· 2025 · arXiv 2506.09930

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EmbodimentSemantic: A Spatial Scene-Graph Dataset and Benchmark for Vision-Language Models on Embodied Manipulation Trajectories

cs.RO · 2026-06-06 · unverdicted · novelty 6.0

EmbodimentSemantic is a spatial scene-graph dataset and benchmark for evaluating relational grounding in vision-language models on embodied manipulation trajectories.

AxisGuide: Grounding Robot Action Coordinate System in RGB Observations for Robust Visuomotor Manipulation

cs.RO · 2026-06-04 · unverdicted · novelty 6.0

AxisGuide augments RGB images with rendered robot base-frame axis cues to improve generalization of visuomotor manipulation policies under distribution shifts.

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

RoboSemanticBench reveals that representative VLA models grasp blocks successfully but select the semantically correct answer at near-random rates, indicating a gap between backbone semantics and action prediction.

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks such as 8% on Robotwin 2.0.

Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

cs.RO · 2026-01-05

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

cs.CV · 2025-10-04

citing papers explorer

Showing 2 of 2 citing papers after filters.

Seeing Realism from Simulation: Efficient Video Transfer for Vision-Language-Action Data Augmentation

cs.CV · 2026-05-04 · unverdicted · none · ref 10

A video transfer pipeline augments simulated VLA data into realistic videos while preserving actions, yielding consistent performance gains on robot benchmarks such as 8% on Robotwin 2.0.

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

cs.CV · 2025-10-04 · unreviewed · ref 6
