VLM-AutoDrive adapts pretrained VLMs via metadata captions, LLM descriptions, VQA, and CoT supervision, lifting collision F1 from 0.00 to 0.69 and accuracy from 35.35% to 77.27% on Nexar dashcam videos.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events