RoboSemanticBench reveals that representative VLA models grasp blocks successfully but select the semantically correct answer at near-random rates, indicating a gap between backbone semantics and action prediction.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.RO (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Sharpness-aware minimization during VLA finetuning preserves instruction following and yields over 60% gains across simulation and real-world tasks.
citing papers explorer
- RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models
  RoboSemanticBench reveals that representative VLA models grasp blocks successfully but select the semantically correct answer at near-random rates, indicating a gap between backbone semantics and action prediction.
- Flatness Preserves Instruction Following in Vision-Language-Action Models
  Sharpness-aware minimization during VLA finetuning preserves instruction following and yields over 60% gains across simulation and real-world tasks.