hub Baseline reference

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang · 2021 · cs.CV · arXiv 2106.11810

Baseline reference. 75% of citing Pith papers use this work as a benchmark or comparison.

59 Pith papers citing it

Baseline 75% of classified citations

open full Pith review browse 59 citing papers arXiv PDF

abstract

In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited the progress in this area. Existing benchmarks for autonomous vehicle motion prediction have focused on short-term motion forecasting, rather than long-term planning. This has led previous works to use open-loop evaluation with L2-based metrics, which are not suitable for fairly evaluating long-term planning. Our benchmark overcomes these limitations by introducing a large-scale driving dataset, lightweight closed-loop simulator, and motion-planning-specific metrics. We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas and Singapore). We will provide a closed-loop simulation framework with reactive agents and provide a large set of both general and scenario-specific planning metrics. We plan to release the dataset at NeurIPS 2021 and organize benchmark challenges starting in early 2022.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 8 background 3 baseline 1

citation-polarity summary

use dataset 8 background 3 baseline 1

representative citing papers

Generative Lane Topology Reasoning via Autoregressive Model with Geometry Prior

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

TopoGPT pre-trains an autoregressive transformer on serialized lane graphs from 3.3M scenes to learn geometry priors and uses a perception adapter to apply it to BEV features for improved lane graph prediction on OpenLane-V2.

FleetAgent: Teleoperation Assistant for Autonomous Fleets via Vectorized V2N Messages

cs.RO · 2026-06-19 · unverdicted · novelty 7.0

FleetAgent pairs a vector-to-embedding interface (VecFormer) with an MLLM to turn compact V2N messages into structured natural-language teleoperation assistance, cutting uplink payload 625x and improving Lingo-Judge score 16.8% on a new nuScenes-derived dataset.

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · novelty 7.0

C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.

Long-term Traffic Simulation via Structured Autoregressive Modeling

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

RosettaSim adapts frozen LLMs via structured autoregressive modeling of scene topology and agent states to reach SOTA short- and long-term traffic simulation on WOSAC, paired with RTE evaluation that correlates better with human-like fidelity.

Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

cs.AI · 2026-06-22 · unverdicted · novelty 6.0

The paper fine-tunes Qwen3.5-4B as a driving VLA using serialized decision traces from rule-based planners, reporting reduced ADE and miss rate on a simulator benchmark with camera inputs.

BadDreamer: Transferable Backdoor Attacks against Video World Models for Autonomous Driving

cs.CV · 2026-06-19 · unverdicted · novelty 6.0

Introduces BadDreamer, a backdoor attack that poisons the transition dynamics of video world models so that a trigger causes hallucination of obstacle-free futures, transferring to unsafe action predictions in autonomous driving.

Scaling Self-Play for End-to-End Driving

cs.RO · 2026-06-17 · unverdicted · novelty 6.0

Self-play DAgger training in a batched pixel renderer produces end-to-end driving policies that reach competitive performance on HUGSIM and NAVSIM-v2 after real-world adaptation and improve with more self-play compute.

SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining

cs.CV · 2026-06-09 · unverdicted · novelty 6.0

SceneMiner shows that identity-preserving multi-task fine-tuning removes cross-task interference by zero-initializing new heads and freezing shared-stream parameters, enabling unified BEV scene mining with preserved original heads.

Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

Dash2Sim recovers metric geo-referenced 4D scenes from in-the-wild monocular dashcam videos to enable the ROADWork4D benchmark, revealing that current closed-loop planners fail on work zone lane changes.

nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

nuReasoning is a new real-world dataset and benchmark extending nuScenes/nuPlan with 20k clips and multi-type reasoning annotations to evaluate and improve reasoning in long-tail autonomous driving.

Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics

cs.MA · 2026-05-08 · accept · novelty 6.0

SceneFactory delivers a batched GPU platform for physics-based multi-agent autonomous driving simulation that achieves 127x higher throughput than non-vectorized PhysX while supporting articulated dynamics and road-condition friction.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

citing papers explorer

Showing 3 of 3 citing papers after filters.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving cs.CV · 2026-05-11 · unverdicted · none · ref 13 · 2 links · internal anchor
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.
DriveFuture: Future-Aware Latent World Models for Autonomous Driving cs.CV · 2026-05-10 · unverdicted · none · ref 52 · internal anchor
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models cs.CV · 2026-04-20 · unverdicted · none · ref 3 · internal anchor
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer