Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Alice Karnsund; Andrew James Willmott; Daniel Maund; Danny Birch; Jamie Shotton; Jan Hünermann; Long Chen; Oleg Sinavski

arxiv: 2310.01957 · v2 · pith:OCZ4BE3Nnew · submitted 2023-10-03 · 💻 cs.RO · cs.AI · cs.CL · cs.CV


Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton

classification cs.RO · cs.AI · cs.CL · cs.CV

keywords driving · vector · autonomous · introduce · language · llms · modalities · numeric



Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.



Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
cs.CV 2025-03 unverdicted novelty 7.0

AlphaDrive uses GRPO-based RL rewards and two-stage SFT+RL training on VLMs to improve autonomous driving planning performance and efficiency while producing emergent multimodal capabilities.

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs
cs.AI 2026-06 unverdicted novelty 6.0

The paper fine-tunes Qwen3.5-4B as a driving VLA using serialized decision traces from rule-based planners, reporting reduced ADE and miss rate on a simulator benchmark with camera inputs.

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving
cs.CV 2026-04 unverdicted novelty 6.0

LMGenDrive unifies LLM-based multimodal understanding with generative world models to output both future driving videos and control signals for end-to-end closed-loop autonomous driving.

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
cs.CV 2024-10 conditional novelty 6.0

Senna decouples language-based high-level planning from an LVLM with low-level trajectory prediction from an E2E model, reporting 27% lower planning error and 33% lower collisions after pre-training on DriveX and fine...

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
cs.CV 2024-02 unverdicted novelty 6.0

VADv2 introduces a probabilistic planning model that discretizes the high-dimensional action space into tokens, interacts them with scene tokens to predict action distributions, and reports SOTA closed-loop results on...

Benchmark Data Contamination of Large Language Models: A Survey
cs.CL 2024-06 unverdicted novelty 3.0

A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.