hub Canonical reference

GPT-Driver: Learning to Drive with GPT

Jiageng Mao, Yuxi Qian, Junjie Ye, Hang Zhao, Yue Wang · 2023 · cs.CV · arXiv 2310.01415

Canonical reference. 90% of citing Pith papers cite this work as background.

31 Pith papers citing it

Background 90% of classified citations

open full Pith review browse 31 citing papers arXiv PDF

abstract

We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion planning is a core challenge in autonomous driving, aiming to plan a driving trajectory that is safe and comfortable. Existing motion planners predominantly leverage heuristic methods to forecast driving trajectories, yet these approaches demonstrate insufficient generalization capabilities in the face of novel and unseen driving scenarios. In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs). The fundamental insight of our approach is the reformulation of motion planning as a language modeling problem, a perspective not previously explored. Specifically, we represent the planner inputs and outputs as language tokens, and leverage the LLM to generate driving trajectories through a language description of coordinate positions. Furthermore, we propose a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. With this strategy, the LLM can describe highly precise trajectory coordinates and also its internal decision-making process in natural language. We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner. Code is now available at https://github.com/PointsCoder/GPT-Driver.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9 baseline 1

citation-polarity summary

background 9 baseline 1

representative citing papers

Grounding Driving VLA via Inverse Kinematics

cs.CV · 2026-05-20 · conditional · novelty 7.0

By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.

4DLidarOpen: An Open 4D FMCW Lidar Dataset for Motion-Aware Autonomous Driving

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

4DLidarOpen is a new open dataset providing synchronized 4D FMCW Lidar velocity measurements, multi-Lidar and camera data, and 3D bounding-box annotations with track IDs to support benchmarks on 3D detection, BEV segmentation, flow prediction, and motion forecasting.

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · novelty 7.0

C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

cs.CV · 2024-11-27 · unverdicted · novelty 7.0

ADvLM is the first visual adversarial attack framework for VLMs in autonomous driving, using semantic-invariant induction via LLM-generated prompt libraries and scenario-associated attention-based enhancement to achieve SOTA attack effectiveness across benchmarks and real-world tests.

MVPruner: Dynamic Token Pruning for Accelerating Multi-view Vision-Language Models in Autonomous Driving

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

MVPruner is a two-stage dynamic token pruning technique that uses view diversity for initial budget allocation and instruction text for task-aligned selection, delivering 87.3% FLOPs reduction and 4.97x prefilling speedup while retaining 98.5% accuracy on DriveLM.

MAPLE: Latent Multi-Agent Play for End-to-End Autonomous Driving

cs.RO · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

MAPLE proposes latent multi-agent rollouts with supervised fine-tuning followed by reinforcement learning using safety, progress, interaction, and diversity rewards to enable scalable closed-loop training for end-to-end autonomous driving.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

ST-Prune is a training-free spatio-temporal token pruning framework for VLMs in autonomous driving that achieves near-lossless results at 90% token reduction by exploiting motion volatility, temporal recency, and multi-view geometry.

FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

FeaXDrive improves end-to-end autonomous driving by shifting diffusion planning to a trajectory-centric formulation with curvature-constrained training, drivable-area guidance, and GRPO post-training, yielding stronger closed-loop performance and feasibility on NAVSIM.

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

LMGenDrive unifies LLM-based multimodal understanding with generative world models to output both future driving videos and control signals for end-to-end closed-loop autonomous driving.

ICR-Drive: Instruction Counterfactual Robustness for End-to-End Language-Driven Autonomous Driving

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

ICR-Drive reveals substantial performance drops in end-to-end language-driven driving models when instructions are paraphrased, made ambiguous, noised, or misleading.

DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

DVGT-2 is a streaming vision-geometry-action model that jointly reconstructs dense 3D geometry and plans trajectories online, achieving better reconstruction than prior batch methods while transferring directly to planning benchmarks without fine-tuning.

DRIV-EX: Counterfactual Explanations for Driving LLMs

cs.CL · 2026-02-28 · unverdicted · novelty 6.0

DRIV-EX generates fluent counterfactual scene descriptions by using gradient-optimized embeddings only as a guide for controlled text decoding, producing more reliable explanations than baselines on transcribed highD driving data.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

cs.RO · 2025-10-30 · conditional · novelty 6.0

Alpamayo-R1 introduces a VLA model with a Chain of Causation dataset and multi-stage SFT-plus-RL training that reports 12% better planning accuracy and 35% fewer close encounters versus trajectory-only baselines in driving tasks.

CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving

cs.CV · 2025-08-31 · unverdicted · novelty 6.0

CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

cs.CV · 2025-06-16 · unverdicted · novelty 6.0

AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

cs.CV · 2024-02-20 · unverdicted · novelty 6.0

VADv2 introduces a probabilistic planning model that discretizes the high-dimensional action space into tokens, interacts them with scene tokens to predict action distributions, and reports SOTA closed-loop results on CARLA Town05 and Bench2Drive.

Decision-Making with Lightweight Confidence-Aware Language Model for Autonomous Driving

cs.RO · 2026-05-25 · unverdicted · novelty 5.0

Lightweight confidence-aware LM distilled from multi-agent CoT demonstrations achieves SOTA success rates on nuPlan benchmark for AD decision-making with low inference latency.

Steins;Gate Drive: Semantic Safety Arbitration over Structured Futures for Latency-Decoupled LLM Planning

cs.RO · 2026-05-21 · unverdicted · novelty 5.0

SteinsGateDrive decouples LLM inference latency from vehicle control by pre-selecting alpha, beta, and gamma worldline futures that a runtime validates against safety contracts until abort conditions trigger.

Driving Intents Amplify Planning-Oriented Reinforcement Learning

cs.RO · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

DIAL expands continuous-action driving policies via intent-conditioned flow matching and multi-intent GRPO, lifting best-of-N preference scores above human demonstrations for the first time on WOD-E2E.

Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation

cs.AI · 2026-05-09 · unverdicted · novelty 5.0

Redesigning Alpamayo 1 to single-reasoning and optimizing diffusion action generation cuts inference latency by 69.23% while preserving trajectory diversity and prediction quality.

SwarmDrive: Semantic V2V Coordination for Latency-Constrained Cooperative Autonomous Driving

cs.RO · 2026-04-22 · unverdicted · novelty 5.0

SwarmDrive uses local SLMs on vehicles for event-triggered semantic V2V intent sharing and consensus, improving occluded intersection success from 68.9% to 94.1% and cutting latency to 151.4 ms in a 5-seed simulation.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Grounding Driving VLA via Inverse Kinematics cs.CV · 2026-05-20 · conditional · none · ref 31 · internal anchor
By adding future visual state prediction and a dedicated inverse kinematics diffusion network that uses only visual boundary conditions, a 0.5B driving VLA recovers visual grounding and matches 7-8B models on NAVSIM-v2 and nuScenes.
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail cs.RO · 2025-10-30 · conditional · none · ref 58 · internal anchor
Alpamayo-R1 introduces a VLA model with a Chain of Causation dataset and multi-stage SFT-plus-RL training that reports 12% better planning accuracy and 35% fewer close encounters versus trajectory-only baselines in driving tasks.

GPT-Driver: Learning to Drive with GPT

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer