Multi-frame, lightweight & efficient vision-language models for question answering in autonomous driving

· 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study

cs.CV · 2026-04-08 · unverdicted · novelty 6.0 · 2 refs

VENUSS evaluates 25+ VLMs across 2600+ sequential driving scenarios and finds top models reach only 57% accuracy versus 65% for humans, with good static detection but poor performance on vehicle dynamics and temporal relations.

citing papers explorer

Showing 1 of 1 citing paper.

How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study

cs.CV · 2026-04-08 · unverdicted · none · ref 7 · 2 links
