NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang · 2021 · cs.CV · arXiv 2106.11810

Baseline reference. 75% of citing Pith papers use this work as a benchmark or comparison.

34 Pith papers citing it

Baseline 75% of classified citations


abstract

In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited the progress in this area. Existing benchmarks for autonomous vehicle motion prediction have focused on short-term motion forecasting, rather than long-term planning. This has led previous works to use open-loop evaluation with L2-based metrics, which are not suitable for fairly evaluating long-term planning. Our benchmark overcomes these limitations by introducing a large-scale driving dataset, lightweight closed-loop simulator, and motion-planning-specific metrics. We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas and Singapore). We will provide a closed-loop simulation framework with reactive agents and provide a large set of both general and scenario-specific planning metrics. We plan to release the dataset at NeurIPS 2021 and organize benchmark challenges starting in early 2022.
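The abstract argues that open-loop evaluation with L2-based metrics is not suitable for long-term planning. The toy sketch below illustrates that reasoning. It is not the nuPlan devkit or any of its metrics: it models only the ego position, with no reactive agents and no planning-specific scores. All names (LOG, drifting_planner, open_loop_ade, closed_loop_ade) are hypothetical. In open loop, every step restarts the planner from the logged human state. In closed loop, the planner's own output becomes its next input, so a small per-step drift compounds.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy types: a 2-D ego position and a planner that maps the
# current ego position to the next one (no other agents are modelled).
Point = Tuple[float, float]
Planner = Callable[[Point], Point]


def l2(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def open_loop_ade(planner: Planner, log: List[Point]) -> float:
    """Open loop: every step restarts the planner from the logged human state,
    so its own mistakes never feed back into later inputs."""
    errors = [l2(planner(log[t]), log[t + 1]) for t in range(len(log) - 1)]
    return sum(errors) / len(errors)


def closed_loop_ade(planner: Planner, log: List[Point]) -> float:
    """Closed loop: the planner's own output becomes its next input, so
    small per-step errors can compound over the horizon."""
    state = log[0]
    errors = []
    for t in range(len(log) - 1):
        state = planner(state)
        errors.append(l2(state, log[t + 1]))
    return sum(errors) / len(errors)


# Hypothetical logged human drive: straight along x at 1 m per step.
LOG: List[Point] = [(float(t), 0.0) for t in range(11)]


def drifting_planner(p: Point) -> Point:
    """Hypothetical planner: matches the logged speed but drifts 0.1 m sideways per step."""
    return (p[0] + 1.0, p[1] + 0.1)


if __name__ == "__main__":
    print(f"open-loop ADE:   {open_loop_ade(drifting_planner, LOG):.2f} m")    # 0.10 m
    print(f"closed-loop ADE: {closed_loop_ade(drifting_planner, LOG):.2f} m")  # 0.55 m
```

In this toy, the open-loop score stays at 0.10 m regardless of the horizon. The closed-loop average error grows with the number of steps, because each step starts from the planner's own drifted state rather than the logged one.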

citation-role summary

dataset 8 background 3 baseline 1

citation-polarity summary

use dataset 8 background 3 baseline 1

representative citing papers

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · novelty 7.0

C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics

cs.MA · 2026-05-08 · accept · novelty 6.0

SceneFactory delivers a batched GPU platform for physics-based multi-agent autonomous driving simulation that achieves 127x higher throughput than non-vectorized PhysX while supporting articulated dynamics and road-condition friction.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution

cs.RO · 2026-04-28 · unverdicted · novelty 6.0

ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Mosaic integrates rule-based and learned planners via arbitration graphs to set new state-of-the-art scores on nuPlan and interPlan benchmarks while cutting at-fault collisions by 30%.

BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.

Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

E² uses transport-regularized sparse control on learned reverse-time SDEs with topology-driven selection and Topological Anchoring to generate realistic adversarial scenarios, improving collision discovery by 9.01% on nuScenes and up to 21.43% on nuPlan while enabling closed-loop robustness gains.

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

cs.RO · 2026-02-26 · unverdicted · novelty 6.0

The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.

DriveLaW: Unifying Planning and Video Generation in a Latent Driving World

cs.CV · 2025-12-29 · unverdicted · novelty 6.0

DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.

Optimization-Guided Diffusion for Interactive Scene Generation

cs.CV · 2025-12-08 · unverdicted · novelty 6.0

OMEGA guides diffusion sampling with per-step constrained optimization and game-theoretic adversarial modeling to generate physically valid and interactive driving scenes, raising valid scene ratios from 32% to 72% and producing 5x more near-collisions.

Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving

cs.RO · 2025-05-31 · unverdicted · novelty 6.0

EnDfuser replaces point-estimate trajectory planning with ensemble diffusion in a single attention-pooling transformer module to model posterior trajectory uncertainty and improve safety in end-to-end autonomous driving.

LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios

cs.RO · 2025-05-22 · unverdicted · novelty 6.0

LiloDriver uses LLMs and memory-augmented planning in a four-stage pipeline to outperform rule-based and learning-based methods on both common and rare scenarios in the nuPlan benchmark.

Enhancing End-to-End Autonomous Driving with Latent World Model

cs.CV · 2024-06-12 · accept · novelty 6.0

LAW introduces a self-supervised prediction task on latent scene features that boosts end-to-end driving performance on nuScenes, NAVSIM, and CARLA benchmarks.

citing papers explorer

Showing 34 of 34 citing papers.

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations cs.RO · 2026-05-18 · unverdicted · none · ref 67
MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems cs.RO · 2026-05-11 · unverdicted · none · ref 25
A global dataset of continuous urban dashcam driving cs.CV · 2026-04-01 · accept · none · ref 43
C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving cs.AI · 2026-03-31 · unverdicted · none · ref 68
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset cs.CV · 2026-03-24 · unverdicted · none · ref 10
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving cs.CV · 2025-06-09 · unverdicted · none · ref 6
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives cs.RO · 2026-05-19 · unverdicted · none · ref 3
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving cs.CV · 2026-05-11 · unverdicted · none · ref 13 · 2 links
Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction cs.CV · 2026-05-11 · unverdicted · none · ref 5
DriveFuture: Future-Aware Latent World Models for Autonomous Driving cs.CV · 2026-05-10 · unverdicted · none · ref 52
SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics cs.MA · 2026-05-08 · accept · none · ref 12
Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 128
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving cs.RO · 2026-05-06 · unverdicted · none · ref 94 · 2 links
ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution cs.RO · 2026-04-28 · unverdicted · none · ref 1
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models cs.CV · 2026-04-20 · unverdicted · none · ref 3
Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners cs.RO · 2026-04-15 · unverdicted · none · ref 28
BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving cs.RO · 2026-04-12 · unverdicted · none · ref 12
Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles cs.RO · 2026-04-08 · unverdicted · none · ref 37
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving cs.RO · 2026-02-26 · unverdicted · none · ref 8
DriveLaW: Unifying Planning and Video Generation in a Latent Driving World cs.CV · 2025-12-29 · unverdicted · none · ref 10
Optimization-Guided Diffusion for Interactive Scene Generation cs.CV · 2025-12-08 · unverdicted · none · ref 2
Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving cs.RO · 2025-05-31 · unverdicted · none · ref 3
LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios cs.RO · 2025-05-22 · unverdicted · none · ref 3
Enhancing End-to-End Autonomous Driving with Latent World Model cs.CV · 2024-06-12 · accept · none · ref 1
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation cs.CV · 2024-06-11 · unverdicted · none · ref 3
Hydra-MDP uses multi-teacher distillation and a multi-head decoder to learn diverse, metric-specific trajectories in an end-to-end autonomous-driving planner, winning the NAVSIM challenge.
BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices cs.AI · 2026-05-28 · unverdicted · none · ref 5
BitTP applies weight-only 1.58-bit quantization to LLM trajectory predictors, claiming improved ADE/FDE over BF16 baseline with reduced resource demands on edge devices.
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving cs.CV · 2026-05-20 · unverdicted · none · ref 1 · 2 links
CoPhy is a new RL framework that distills VLM cognition into BEV encoders, adds an auto-regressive BEV world model for action-conditioned future prediction, and optimizes policies via GRPO with dual physical-cognitive rewards, claiming SOTA on NAVSIM v1/v2.
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models cs.RO · 2026-05-19 · unverdicted · none · ref 9
HEAT uses a trajectory-driven learning paradigm and a world model predicting future latent features from ego actions to enable a single unified end-to-end autonomous driving model to perform well across heterogeneous domains on nuScenes, NAVSIM, and Waymo benchmarks.
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning cs.RO · 2026-05-18 · unverdicted · none · ref 5
RLFTSim uses RL fine-tuning on a pre-trained model with a balanced reward to align traffic simulator rollouts to real data distributions and distill goal-conditioned controllability, reporting SOTA realism on the Waymo Open Motion Dataset.
DriveSafer: End-to-End Autonomous Driving with Safety Guidance cs.RO · 2026-05-16 · unverdicted · none · ref 2
DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling cs.RO · 2026-05-13 · unverdicted · none · ref 2 · 2 links
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.
Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic cs.AI · 2026-04-14 · unverdicted · none · ref 25
This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.
CHARMS: A Cognitive Hierarchical Agent for Reasoning and Motion Stylization in Autonomous Driving cs.RO · 2025-04-03 · unverdicted · none · ref 19
CHARMS applies Level-k game theory and Poisson cognitive hierarchy theory to autonomous driving agents via a two-stage RL-then-SFT pipeline for human-like decisions and realistic scenario generation.
DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving cs.CV · 2025-07-05 · unreviewed · ref 57