Humans and VLMs diverge in their VQA responses on driving footage. Human answers are consistent across origins, and no strong modulation by geography is observed, likely because the footage is highly out-of-distribution (OOD).
arXiv preprint arXiv:2501.10453 (2025)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City