NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang · 2021 · cs.CV · arXiv 2106.11810

Baseline reference. 75% of citing Pith papers use this work as a benchmark or comparison.

52 Pith papers citing it

Baseline: 75% of classified citations

abstract

In this work, we propose the world's first closed-loop ML-based planning benchmark for autonomous driving. While there is a growing body of ML-based motion planners, the lack of established datasets and metrics has limited the progress in this area. Existing benchmarks for autonomous vehicle motion prediction have focused on short-term motion forecasting, rather than long-term planning. This has led previous works to use open-loop evaluation with L2-based metrics, which are not suitable for fairly evaluating long-term planning. Our benchmark overcomes these limitations by introducing a large-scale driving dataset, lightweight closed-loop simulator, and motion-planning-specific metrics. We provide a high-quality dataset with 1500h of human driving data from 4 cities across the US and Asia with widely varying traffic patterns (Boston, Pittsburgh, Las Vegas and Singapore). We will provide a closed-loop simulation framework with reactive agents and provide a large set of both general and scenario-specific planning metrics. We plan to release the dataset at NeurIPS 2021 and organize benchmark challenges starting in early 2022.

citation-role summary

dataset: 8 · background: 3 · baseline: 1

citation-polarity summary

use dataset: 8 · background: 3 · baseline: 1

representative citing papers

Generative Lane Topology Reasoning via Autoregressive Model with Geometry Prior

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

TopoGPT pre-trains an autoregressive transformer on serialized lane graphs from 3.3M scenes to learn geometry priors and uses a perception adapter to apply it to BEV features for improved lane graph prediction on OpenLane-V2.

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

cs.RO · 2026-05-18 · unverdicted · novelty 7.0

Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.

MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

cs.AI · 2026-03-31 · unverdicted · novelty 7.0

C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

cs.CV · 2026-03-24 · unverdicted · novelty 7.0

KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

cs.CV · 2025-06-09 · unverdicted · novelty 7.0

ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.

DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.

Long-term Traffic Simulation via Structured Autoregressive Modeling

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

RosettaSim adapts frozen LLMs via structured autoregressive modeling of scene topology and agent states to reach SOTA short- and long-term traffic simulation on WOSAC, paired with RTE evaluation that correlates better with human-like fidelity.

Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.

SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining

cs.CV · 2026-06-09 · unverdicted · novelty 6.0

SceneMiner shows that identity-preserving multi-task fine-tuning removes cross-task interference by zero-initializing new heads and freezing shared-stream parameters, enabling unified BEV scene mining with preserved original heads.

Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos

cs.CV · 2026-06-05 · unverdicted · novelty 6.0

Dash2Sim recovers metric geo-referenced 4D scenes from in-the-wild monocular dashcam videos to enable the ROADWork4D benchmark, revealing that current closed-loop planners fail on work zone lane changes.

nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

nuReasoning is a new real-world dataset and benchmark extending nuScenes/nuPlan with 20k clips and multi-type reasoning annotations to evaluate and improve reasoning in long-tail autonomous driving.

Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.

CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving

cs.CV · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.

Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics

cs.MA · 2026-05-08 · accept · novelty 6.0

SceneFactory delivers a batched GPU platform for physics-based multi-agent autonomous driving simulation that achieves 127x higher throughput than non-vectorized PhysX while supporting articulated dynamics and road-condition friction.

Response Time Enhances Alignment with Heterogeneous Preferences

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

cs.RO · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution

cs.RO · 2026-04-28 · unverdicted · novelty 6.0

ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.

OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.

Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Mosaic integrates rule-based and learned planners via arbitration graphs to set new state-of-the-art scores on nuPlan and interPlan benchmarks while cutting at-fault collisions by 30%.

BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

cs.RO · 2026-04-12 · unverdicted · novelty 6.0

The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.

citing papers explorer

Showing 50 of 52 citing papers.

Generative Lane Topology Reasoning via Autoregressive Model with Geometry Prior cs.CV · 2026-06-30 · unverdicted · none · ref 2 · internal anchor
TopoGPT pre-trains an autoregressive transformer on serialized lane graphs from 3.3M scenes to learn geometry priors and uses a perception adapter to apply it to BEV features for improved lane graph prediction on OpenLane-V2.
Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations cs.RO · 2026-05-18 · unverdicted · none · ref 67 · internal anchor
Bench2Drive-Robust is a new closed-loop benchmark that evaluates end-to-end autonomous driving models under deployment perturbations from camera failures, ego-state errors, and compute delays, showing substantial performance degradation beyond image-level tests.
MDrive: Benchmarking Closed-Loop Cooperative Driving for End-to-End Multi-agent Systems cs.RO · 2026-05-11 · unverdicted · none · ref 25 · internal anchor
MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.
A global dataset of continuous urban dashcam driving cs.CV · 2026-04-01 · accept · none · ref 43 · internal anchor
CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving cs.AI · 2026-03-31 · unverdicted · none · ref 68 · internal anchor
C-TRAIL combines LLM commonsense with a dual-trust mechanism and Dirichlet-weighted Monte Carlo Tree Search to improve trajectory planning accuracy and safety in autonomous driving.
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset cs.CV · 2026-03-24 · unverdicted · none · ref 10 · internal anchor
KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving cs.CV · 2025-06-09 · unverdicted · none · ref 6 · internal anchor
ReCogDrive unifies VLM scene understanding with a diffusion planner reinforced by DiffGRPO to reach state-of-the-art results on NAVSIM and Bench2Drive benchmarks.
DriveVer: Lightweight Trajectory Evaluator as Test-Time Verifier for Autonomous Driving cs.CV · 2026-07-01 · unverdicted · none · ref 29 · internal anchor
DriveVer is a lightweight dual-head test-time verifier that predicts safety confidence scores and geometric refinement vectors for candidate trajectories, improving base planners on the NAVSIM benchmark.
Long-term Traffic Simulation via Structured Autoregressive Modeling cs.AI · 2026-06-30 · unverdicted · none · ref 4 · internal anchor
RosettaSim adapts frozen LLMs via structured autoregressive modeling of scene topology and agent states to reach SOTA short- and long-term traffic simulation on WOSAC, paired with RTE evaluation that correlates better with human-like fidelity.
Off the Rails: Hijacking the Scoring Head in Generative End-to-End Driving Planners with Safety-Violating Adversarial Perturbations cs.RO · 2026-06-29 · unverdicted · none · ref 7 · internal anchor
Derail adversarial perturbations hijack the scoring head in generative E2E driving planners, flipping safe to unsafe trajectory selection with 39-80% score drops and up to 50% collision rates.
SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining cs.CV · 2026-06-09 · unverdicted · none · ref 5 · internal anchor
SceneMiner shows that identity-preserving multi-task fine-tuning removes cross-task interference by zero-initializing new heads and freezing shared-stream parameters, enabling unified BEV scene mining with preserved original heads.
Dash2Sim: Closed-Loop Driving Simulation from in-the-wild Dashcam Videos cs.CV · 2026-06-05 · unverdicted · none · ref 9 · internal anchor
Dash2Sim recovers metric geo-referenced 4D scenes from in-the-wild monocular dashcam videos to enable the ROADWork4D benchmark, revealing that current closed-loop planners fail on work zone lane changes.
nuReasoning: A Reasoning-Centric Dataset and Benchmark for Long-Tail Autonomous Driving cs.CV · 2026-05-29 · unverdicted · none · ref 47 · internal anchor
nuReasoning is a new real-world dataset and benchmark extending nuScenes/nuPlan with 20k clips and multi-type reasoning annotations to evaluate and improve reasoning in long-tail autonomous driving.
Beyond Imitation: Learning Safe End-to-End Autonomous Driving from Hard Negatives cs.RO · 2026-05-19 · unverdicted · none · ref 3 · internal anchor
BeyondDrive augments imitation learning with synthesized safety-critical negative trajectories and a repulsive loss to improve safety in autonomous driving, reporting 89.7 PDMS on NAVSIMv1 and generalization to other models.
CoWorld-VLA: Thinking in a Multi-Expert World Model for Autonomous Driving cs.CV · 2026-05-11 · unverdicted · none · ref 13 · 2 links · internal anchor
CoWorld-VLA extracts semantic, geometric, dynamic, and trajectory expert tokens from multi-source supervision and feeds them into a diffusion-based hierarchical planner, achieving competitive collision avoidance and trajectory accuracy on the NAVSIM v1 benchmark.
Temporal Sampling Frequency Matters: A Capacity-Aware Study of End-to-End Driving Trajectory Prediction cs.CV · 2026-05-11 · unverdicted · none · ref 5 · internal anchor
Smaller end-to-end autonomous driving models achieve optimal 3-second trajectory prediction accuracy at lower or intermediate temporal sampling frequencies, whereas larger VLA-style models perform best at the highest frequencies across Waymo, nuScenes, and PAVE datasets.
DriveFuture: Future-Aware Latent World Models for Autonomous Driving cs.CV · 2026-05-10 · unverdicted · none · ref 52 · internal anchor
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
SceneFactory: GPU-Accelerated Multi-Agent Driving Simulation with Physics-Based Vehicle Dynamics cs.MA · 2026-05-08 · accept · none · ref 12 · internal anchor
SceneFactory delivers a batched GPU platform for physics-based multi-agent autonomous driving simulation that achieves 127x higher throughput than non-vectorized PhysX while supporting articulated dynamics and road-condition friction.
Response Time Enhances Alignment with Heterogeneous Preferences cs.LG · 2026-05-07 · unverdicted · none · ref 128 · internal anchor
Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving cs.RO · 2026-05-06 · unverdicted · none · ref 94 · 2 links · internal anchor
ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution cs.RO · 2026-04-28 · unverdicted · none · ref 1 · internal anchor
ProDrive couples a query-centric planner with a BEV world model for end-to-end ego-environment co-evolution, enabling future-outcome assessment that improves safety and efficiency over reactive baselines on NAVSIM v1.
OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models cs.CV · 2026-04-20 · unverdicted · none · ref 3 · internal anchor
OneDrive unifies heterogeneous decoding in a single VLM transformer decoder for end-to-end driving, achieving 0.28 L2 error and 0.18 collision rate on nuScenes plus 86.8 PDMS on NAVSIM.
Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners cs.RO · 2026-04-15 · unverdicted · none · ref 28 · internal anchor
Mosaic integrates rule-based and learned planners via arbitration graphs to set new state-of-the-art scores on nuPlan and interPlan benchmarks while cutting at-fault collisions by 30%.
BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving cs.RO · 2026-04-12 · unverdicted · none · ref 12 · internal anchor
The primary OL-CL gap in end-to-end autonomous driving arises from objective mismatch creating structural inability to model reactive behaviors, which a test-time adaptation method can mitigate.
Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles cs.RO · 2026-04-08 · unverdicted · none · ref 37 · internal anchor
E² uses transport-regularized sparse control on learned reverse-time SDEs with topology-driven selection and Topological Anchoring to generate realistic adversarial scenarios, improving collision discovery by 9.01% on nuScenes and up to 21.43% on nuPlan while enabling closed-loop robustness gains.
Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving cs.RO · 2026-02-26 · unverdicted · none · ref 8 · internal anchor
The paper introduces Hyper Diffusion Planner (HDP), a diffusion-based E2E AD framework that identifies insights on loss space, trajectory representation and data scaling, adds RL post-training, and reports 10x performance gains over 200 km of real-world testing across 6 scenarios.
DriveLaW: Unifying Planning and Video Generation in a Latent Driving World cs.CV · 2025-12-29 · unverdicted · none · ref 10 · internal anchor
DriveLaW unifies video world modeling and trajectory planning by injecting video-generator latents into a diffusion planner, achieving SOTA video prediction and a new record on the NAVSIM planning benchmark.
Optimization-Guided Diffusion for Interactive Scene Generation cs.CV · 2025-12-08 · unverdicted · none · ref 2 · internal anchor
OMEGA guides diffusion sampling with per-step constrained optimization and game-theoretic adversarial modeling to generate physically valid and interactive driving scenes, raising valid scene ratios from 32% to 72% and producing 5x more near-collisions.
Using Ensemble Diffusion to Estimate Uncertainty for End-to-End Autonomous Driving cs.RO · 2025-05-31 · unverdicted · none · ref 3 · internal anchor
EnDfuser replaces point-estimate trajectory planning with ensemble diffusion in a single attention-pooling transformer module to model posterior trajectory uncertainty and improve safety in end-to-end autonomous driving.
LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios cs.RO · 2025-05-22 · unverdicted · none · ref 3 · internal anchor
LiloDriver uses LLMs and memory-augmented planning in a four-stage pipeline to outperform rule-based and learning-based methods on both common and rare scenarios in the nuPlan benchmark.
Enhancing End-to-End Autonomous Driving with Latent World Model cs.CV · 2024-06-12 · accept · none · ref 1 · internal anchor
LAW introduces a self-supervised prediction task on latent scene features that boosts end-to-end driving performance on nuScenes, NAVSIM, and CARLA benchmarks.
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation cs.CV · 2024-06-11 · unverdicted · none · ref 3 · internal anchor
Hydra-MDP uses multi-teacher distillation and a multi-head decoder to learn diverse, metric-specific trajectories in an end-to-end autonomous-driving planner, winning the NAVSIM challenge.
Bridging Local Observation and Global Simulation in Closed-Loop Traffic Modeling cs.RO · 2026-06-30 · unverdicted · none · ref 3 · internal anchor
CRAFT reduces collisions by 31.2% and traffic violations by 33.2% in closed-loop traffic simulation by discovering context-induced failures in what-if rollouts and using a contextual preference evaluator to reweight autoregressive decoding toward globally coherent behaviors.
ReWorld: Learning Better Representations for World Action Models cs.CV · 2026-06-25 · unverdicted · none · ref 5 · internal anchor
ReWorld applies future-predictive, cross-modal, and hard-negative supervision directly to intermediate representations in Video and Action DiTs for WAMs, reporting 23.9% FVD improvement and PDMS rise from 89.1 to 90.4 on nuScenes and NAVSIM.
Zero-Label Driving Scenario Complexity Detection via Joint Embedding Predictive Architecture cs.CV · 2026-06-21 · unverdicted · none · ref 4 · internal anchor
A self-supervised JEPA model on nuPlan data uses temporal prediction error to score driving scenario complexity without labels, assigning higher scores to turns and pedestrian interactions and achieving AP 0.512 in anomaly detection.
Diffusion Forcing Planner: History-Annealed Planning with Time-Dependent Guidance for Autonomous Driving cs.RO · 2026-06-09 · unverdicted · none · ref 6 · internal anchor
Diffusion Forcing Planner applies heterogeneous joint diffusion with time-dependent noise and classifier-free guidance on history segments to generate stable, controllable motion plans for autonomous driving on nuPlan.
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving cs.CV · 2026-06-07 · unverdicted · none · ref 23 · internal anchor
The paper creates the DriveReward dataset with counterfactual annotations and a 1B VLM reward model that outperforms larger VLMs on driving tasks and matches rule-based rewards in RL and trajectory scoring.
Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning cs.RO · 2026-06-04 · unverdicted · none · ref 6 · internal anchor
Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.
Unified Driving Tokens: Representation- and Geometry-Guided Discrete Tokenizer for Driving World Models and Planning cs.CV · 2026-06-01 · unverdicted · none · ref 2 · internal anchor
A representation- and geometry-guided discrete tokenizer for driving scenes improves token quality for world models and planning on NAVSIM.
What to Test Next: Interpretable Coverage Gap Discovery in Driving VLMs cs.CV · 2026-06-01 · unverdicted · none · ref 43 · internal anchor
SliceScorer combines an exposure-based coverage prior and a neighbor-failure prior into a simple deterministic score for recommending coverage gaps in driving VLMs, embedded in the LLM-orchestrated SliceNav pipeline.
BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices cs.AI · 2026-05-28 · unverdicted · none · ref 5 · internal anchor
BitTP applies weight-only 1.58-bit quantization to LLM trajectory predictors, claiming improved ADE/FDE over BF16 baseline with reduced resource demands on edge devices.
Decision-Making with Lightweight Confidence-Aware Language Model for Autonomous Driving cs.RO · 2026-05-25 · unverdicted · none · ref 38 · internal anchor
A lightweight confidence-aware LM distilled from multi-agent CoT demonstrations achieves SOTA success rates on the nuPlan benchmark for AD decision-making with low inference latency.
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving cs.CV · 2026-05-20 · unverdicted · none · ref 1 · 2 links · internal anchor
CoPhy is a new RL framework that distills VLM cognition into BEV encoders, adds an auto-regressive BEV world model for action-conditioned future prediction, and optimizes policies via GRPO with dual physical-cognitive rewards, claiming SOTA on NAVSIM v1/v2.
HEAT: Heterogeneous End-to-End Autonomous Driving via Trajectory-Guided World Models cs.RO · 2026-05-19 · unverdicted · none · ref 9 · internal anchor
HEAT uses a trajectory-driven learning paradigm and a world model predicting future latent features from ego actions to enable a single unified end-to-end autonomous driving model to perform well across heterogeneous domains on nuScenes, NAVSIM, and Waymo benchmarks.
RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning cs.RO · 2026-05-18 · unverdicted · none · ref 5 · internal anchor
RLFTSim uses RL fine-tuning on a pre-trained model with a balanced reward to align traffic simulator rollouts to real data distributions and distill goal-conditioned controllability, reporting SOTA realism on the Waymo Open Motion Dataset.
DriveSafer: End-to-End Autonomous Driving with Safety Guidance cs.RO · 2026-05-16 · unverdicted · none · ref 2 · internal anchor
DriveSafer reduces catastrophic failures (PDMS=0) by 48% and drivable-area compliance failures by over 65% versus DiffusionDrive on the NAVSIM benchmark by combining training-time safety constraints with inference-time guidance.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling cs.RO · 2026-05-13 · unverdicted · none · ref 2 · 2 links · internal anchor
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.
Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic cs.AI · 2026-04-14 · unverdicted · none · ref 25 · internal anchor
This survey synthesizes AI techniques for mixed autonomy traffic simulation and introduces a taxonomy spanning agent-level behavior models, environment-level methods, and cognitive/physics-informed approaches.
CHARMS: A Cognitive Hierarchical Agent for Reasoning and Motion Stylization in Autonomous Driving cs.RO · 2025-04-03 · unverdicted · none · ref 19 · internal anchor
CHARMS applies Level-k game theory and Poisson cognitive hierarchy theory to autonomous driving agents via a two-stage RL-then-SFT pipeline for human-like decisions and realistic scenario generation.
Language-Driven Cost Optimization for Autonomous Driving cs.RO · 2026-06-09 · unverdicted · none · ref 37 · internal anchor
An LLM interprets user language to set the parameters of a risk-aware MPPI controller, with human-in-the-loop validation for adaptive autonomous driving behavior.
