When vision overrides language: Evaluating and mitigating counterfactual failures in VLAs

When vision overrides language: Evaluating and mitigating counterfactual failures in VLAs · 2026 · arXiv 2602.17659

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Event-Grounded Sparse Autoencoders for Vision-Language-Action Policies

cs.RO · 2026-05-17 · conditional · novelty 7.0

Event-grounded SAE analysis in VLA policies produces stronger causal effects on robot behavior than standard methods by anchoring features to clustered end-effector keyframes across simulations and real-robot tests.

LA4VLA: Learning to Act without Seeing via Language-Action Pretraining

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

LA4VLA creates a 33K language-action dataset from existing demos and shows that pretraining on language-action pairs before or alongside vision-language-action training boosts success rates in simulated and real-robot tasks.

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

RoboSemanticBench reveals that representative VLA models grasp blocks successfully but select the semantically correct answer at near-random rates, indicating a gap between backbone semantics and action prediction.

Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models

cs.RO · 2026-05-13 · unverdicted · novelty 6.0

VLAs-as-Tools pairs a VLM planner with specialized VLA executors via a new interface and Tool-Aligned Post-Training to raise long-horizon robot success rates on LIBERO-Long and RoboTwin benchmarks.

citing papers explorer

LA4VLA: Learning to Act without Seeing via Language-Action Pretraining

cs.RO · 2026-06-25 · unverdicted · none · ref 15

RoboSemanticBench: Diagnosing Semantic Grounding in Action Prediction for VLA Models

cs.RO · 2026-06-01 · unverdicted · none · ref 11

Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models

cs.RO · 2026-05-13 · unverdicted · none · ref 6
