Multi-frame, lightweight & efficient vision-language models for question answering in autonomous driving
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study
VENUSS evaluates 25+ VLMs across 2600+ sequential driving scenarios and finds top models reach only 57% accuracy versus 65% for humans, with good static detection but poor performance on vehicle dynamics and temporal relations.