DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving

Ziying Song , Lin Liu , Hongyu Pan , Bencheng Liao , Mingzhe Guo , Lei Yang , Yongchang Zhang , Shaoqing Xu

show 2 more authors

Caiyan Jia Yadan Luo

Authors on Pith no claims yet

classification 💻 cs.CV cs.RO

keywords diversitylearningdiverimitationdrivingend-to-endsingletrajectories

0 comments

read the original abstract

Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions.Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Driving risk emerges from the required two-dimensional joint evasive acceleration
cs.RO 2026-04 unverdicted novelty 7.0

Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention ac...
DriveFuture: Future-Aware Latent World Models for Autonomous Driving
cs.CV 2026-05 unverdicted novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
cs.RO 2026-05 unverdicted novelty 5.0

CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.