Cast: Counterfactual labels improve instruction following in vision-language- action models

Catherine Glossop, William Chen, Arjun Bhorkar, Dhruv Shah, Sergey Levine · 2025 · arXiv 2508.13446

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NORM-Nav: Zero-Shot Mobile Robot Navigation with Natural Language Behavioral Constraints

cs.RO · 2026-05-16 · unverdicted · novelty 6.0

NORM-Nav is a zero-shot framework that parses natural language behavioral constraints with an LLM, grounds them via vision-LiDAR, and encodes them as multi-layer costmaps for grid-based robot navigation.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery

cs.RO · 2026-05-02

citing papers explorer

Showing 4 of 4 citing papers.

NORM-Nav: Zero-Shot Mobile Robot Navigation with Natural Language Behavioral Constraints cs.RO · 2026-05-16 · unverdicted · none · ref 22
NORM-Nav is a zero-shot framework that parses natural language behavioral constraints with an LLM, grounds them via vision-LiDAR, and encodes them as multi-layer costmaps for grid-based robot navigation.
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control cs.RO · 2026-02-13 · unverdicted · none · ref 19
Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning cs.RO · 2026-02-09 · unverdicted · none · ref 91
R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.
Sentinel-VLA: A Metacognitive VLA Model with Active Status Monitoring for Dynamic Reasoning and Error Recovery cs.RO · 2026-05-02 · unreviewed · ref 9

Cast: Counterfactual labels improve instruction following in vision-language- action models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer