GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving

Abhinav Valada; Fabian Schmidt; Markus Enzweiler

arXiv: 2511.11266 · v4 · pith: DCMQM62V · submitted 2025-11-14 · 💻 cs.CV

classification 💻 cs.CV

keywords: scene, models, driving, graph, relational, autonomous, conditioning, graphpilot

Abstract

Vision-language models have recently emerged as promising planners for autonomous driving, where success hinges on topology-aware reasoning over spatial structure and dynamic interactions from multimodal input. However, existing models are typically trained without supervision that explicitly encodes these relational dependencies, limiting their ability to infer, from raw sensor data, how agents and other traffic entities influence one another. In this work, we bridge this gap with a novel model-agnostic method that conditions language-based driving models on structured relational context in the form of traffic scene graphs. We serialize scene graphs at various abstraction levels and in various formats, and incorporate them into models via structured prompt templates, enabling systematic analysis of when and how relational supervision is most beneficial and computationally efficient. Extensive evaluations on the LangAuto and Bench2Drive benchmarks show that scene graph conditioning yields large and persistent improvements: our approach substantially increases the Driving Score over the competitive LMDrive, BEVDriver, and SimLingo baselines. These results indicate that diverse architectures can effectively internalize and ground relational priors through scene graph-conditioned training, even without requiring scene graph input at test time. Code, fine-tuned models, and our scene graph dataset are publicly available at https://github.com/iis-esslingen/GraphPilot.
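To make the idea of serializing a scene graph into a prompt more concrete, here is a minimal sketch that is not taken from the GraphPilot code. It shows one graph rendered in two formats (compact text triples and JSON), a coarser abstraction level obtained by keeping only selected relation types, and a prompt template in which the graph block is optional, so the same template can be used with graph context during training and without it at test time. Every name here (`SceneGraph`, `serialize_triples`, `serialize_json`, `filter_relations`, `build_prompt`), the relation labels, and the prompt wording are hypothetical placeholders; the paper's actual graph schema, formats, and templates are in the linked repository.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical minimal traffic scene graph (not GraphPilot's schema).
    nodes: dict[str, str] = field(default_factory=dict)  # node id -> entity type
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)


def serialize_triples(g: SceneGraph) -> str:
    """Compact text format: one 'subject relation object' line per edge."""
    return "\n".join(f"{s} {r} {o}" for s, r, o in g.edges)


def serialize_json(g: SceneGraph) -> str:
    """More verbose structured format that also carries node types."""
    payload = {
        "nodes": g.nodes,
        "edges": [{"subject": s, "relation": r, "object": o} for s, r, o in g.edges],
    }
    return json.dumps(payload, sort_keys=True)


def filter_relations(g: SceneGraph, keep: set[str]) -> SceneGraph:
    """Coarser abstraction level: keep only edges whose relation is in `keep`,
    plus the nodes those edges still reference."""
    edges = [e for e in g.edges if e[1] in keep]
    used = {n for s, _, o in edges for n in (s, o)}
    return SceneGraph({k: v for k, v in g.nodes.items() if k in used}, edges)


# Hypothetical prompt wording; the graph block may be empty.
PROMPT_TEMPLATE = "Instruction: {instruction}\n{graph_block}Predict the next waypoints."


def build_prompt(instruction: str, graph_text: str | None) -> str:
    # With graph_text=None the prompt contains no scene graph context at all.
    graph_block = f"Scene graph:\n{graph_text}\n" if graph_text else ""
    return PROMPT_TEMPLATE.format(instruction=instruction, graph_block=graph_block)


if __name__ == "__main__":
    g = SceneGraph(
        nodes={"ego": "vehicle", "car_1": "vehicle", "light_1": "traffic_light"},
        edges=[("car_1", "in_front_of", "ego"), ("light_1", "controls", "ego")],
    )
    print(build_prompt("Turn left at the next intersection.", serialize_triples(g)))
    print(build_prompt("Turn left at the next intersection.", None))
```

Keeping the serializer separate from the template is one simple way to compare formats and abstraction levels under a fixed prompt; whether GraphPilot structures its pipeline this way is an assumption.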

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Bridging Structure and Language: Graph-Based Visual Reasoning for Autonomous Road Understanding
cs.CV · 2026-05 · unverdicted · novelty 6.0

A graph-grounded Combined Road Substrate framework generates traceable QA pairs from road maps to improve small VLMs on compositional road reasoning tasks.