Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention across crash datasets.
DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
9 Pith papers cite this work. Polarity classification is still indexing.
abstract
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions.Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
citation-role summary
citation-polarity summary
years
2026 9roles
baseline 2polarities
baseline 2representative citing papers
LWDrive uses future-frame supervision on VLMs to create world-model features that a multi-layer Foresight Cascade Planner refines into final trajectories, reporting 92.0 on NAVSIM and 89.6 on NAVSIM-v2.
TOAD applies test-time Cross-Entropy Method optimization to refine trajectories using the planner's scorer as a reward function, improving end-to-end autonomous driving performance without retraining.
D³-MoE disentangles style and physical axes with diffusion and self-supervised MoE experts to produce style-controllable trajectories, reporting SOTA 88.2 PDMS on NAVSIM.
CLOVER is a closed-loop generator-scorer framework that expands proposal coverage with pseudo-expert trajectories and performs conservative self-distillation to achieve state-of-the-art planning scores on NAVSIM and nuScenes.
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
R²LPL converts recoverable policy mistakes identified in closed-loop rollouts into corrective supervised targets for lifelong policy improvement, reaching SOTA on nuPlan benchmarks with few cycles.
Discrete-WAM unifies world modeling and policy learning for autonomous driving by representing observations, states, decisions, and actions as tokens in one space and using hierarchical token editing for planning.
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.
citing papers explorer
-
DriveFuture: Future-Aware Latent World Models for Autonomous Driving
DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
-
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and PDMS 91.1 on Bench2Drive and NAVSIM.