NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang · 2021 · cs.CV · arXiv 2106.11810

Baseline reference. 75% of citing Pith papers use this work as a benchmark or comparison.

35 Pith papers citing it

abstract

In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited the progress in this area. Existing benchmarks for autonomous vehicle motion prediction have focused on short-term motion forecasting, rather than long-term planning. This has led previous works to use open-loop evaluation with L2-based metrics, which are not suitable for fairly evaluating long-term planning. Our benchmark overcomes these limitations by introducing a large-scale driving dataset, lightweight closed-loop simulator, and motion-planning-specific metrics. We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas and Singapore). We will provide a closed-loop simulation framework with reactive agents and provide a large set of both general and scenario-specific planning metrics. We plan to release the dataset at NeurIPS 2021 and organize benchmark challenges starting in early 2022.
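The open-loop, L2-based evaluation that the abstract contrasts with closed-loop simulation can be illustrated with a minimal sketch. This is not nuPlan code: the function name `average_displacement_error` and the waypoints are hypothetical, chosen only to show the kind of metric being described, assuming time-aligned 2D waypoints in metres.

```python
import math


def average_displacement_error(planned, logged):
    """Mean L2 distance between time-aligned (x, y) waypoints."""
    if len(planned) != len(logged) or not planned:
        raise ValueError("trajectories must be non-empty and the same length")
    return sum(math.dist(p, q) for p, q in zip(planned, logged)) / len(planned)


# Hypothetical waypoints in metres, one per timestep (not taken from the dataset).
logged = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # what the human driver did
planned = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]  # what a planner proposed

ade = average_displacement_error(planned, logged)
print(ade)  # 0.5
```

A score like this compares the plan against a fixed recording, so other agents never react to the planner's choices and a safe deviation from the log is penalized just like an unsafe one. That is the limitation the abstract gives as motivation for closed-loop simulation with reactive agents.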

citation-role summary

dataset 8 · background 3 · baseline 1

citation-polarity summary

use dataset 8 · background 3 · baseline 1

representative citing papers

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · novelty 7.0

C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics

cs.MA · 2026-05-08 · accept · novelty 6.0

SceneFactory delivers a batched GPU platform for physics-based multi-agent autonomous driving simulation that achieves 127x higher throughput than non-vectorized PhysX while supporting articulated dynamics and road-condition friction.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution

cs.RO · 2026-04-28 · unverdicted · novelty 6.0

ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Mosaic integrates rule-based and learned planners via arbitration graphs to set new state-of-the-art scores on nuPlan and interPlan benchmarks while cutting at-fault collisions by 30%.

BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.

Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

E² uses transport-regularized sparse control on learned reverse-time SDEs with topology-driven selection and Topological Anchoring to generate realistic adversarial scenarios, improving collision discovery by 9.01% on nuScenes and up to 21.43% on nuPlan while enabling closed-loop robustness gains.

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

cs.RO · 2026-02-26 · unverdicted · novelty 6.0

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World

cs.CV · 2025-12-29 · unverdicted · novelty 6.0

DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.

Optimization-Guided Diffusion for Interactive Scene Generation

cs.CV · 2025-12-08 · unverdicted · novelty 6.0

OMEGA guides diffusion sampling with per-step constrained optimization and game-theoretic adversarial modeling to generate physically valid and interactive driving scenes, raising valid scene ratios from 32% to 72% and producing 5x more near-collisions.

Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving

cs.RO · 2025-05-31 · unverdicted · novelty 6.0

EnDfuser replaces point-estimate trajectory planning with ensemble diffusion in a single attention-pooling transformer module to model posterior trajectory uncertainty and improve safety in end-to-end autonomous driving.

LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios

cs.RO · 2025-05-22 · unverdicted · novelty 6.0

LiloDriver uses LLMs and memory-augmented planning in a four-stage pipeline to outperform rule-based and learning-based methods on both common and rare scenarios in the nuPlan benchmark.

Enhancing End-to-End Autonomous Driving with Latent World Model

cs.CV · 2024-06-12 · accept · novelty 6.0

LAW introduces a self-supervised prediction task on latent scene features that boosts end-to-end driving performance on nuScenes, NAVSIM, and CARLA benchmarks.

citing papers explorer

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · none · ref 43 · internal anchor

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · none · ref 68 · internal anchor

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · none · ref 5 · internal anchor