arxiv: 2507.04049 · v4 · pith:CZPRYVZN · submitted 2025-07-05 · 💻 cs.CV · cs.RO

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving

Ziying Song, Lin Liu, Hongyu Pan, Bencheng Liao, Mingzhe Guo, Lei Yang, Yongchang Zhang, Shaoqing Xu, Caiyan Jia, Yadan Luo


Pith reviewed 2026-05-19 06:01 UTC · model grok-4.3

classification 💻 cs.CV cs.RO

keywords end-to-end autonomous driving · diffusion models · reinforcement learning · imitation learning · trajectory generation · mode collapse · diversity evaluation · closed-loop benchmarks


The pith

Reinforcement learning steers a diffusion model to turn one expert driving trace into multiple safe and varied trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning in end-to-end driving copies single expert paths and therefore produces cautious, repetitive behavior that fails to generalize. DIVER instead runs a diffusion process that starts from one ground-truth trajectory and, conditioned on the map and nearby vehicles, produces several candidate paths. Reinforcement learning supplies rewards that push the diffusion steps toward both collision-free routes and greater spread among the options. If the method works, models can handle unseen traffic layouts without requiring new expert data for every variation. The paper also replaces simple L2 distance with a dedicated diversity score to judge whether the generated paths actually differ.

Core claim

The reinforced diffusion-based generation mechanism conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory. Reinforcement learning then guides the diffusion process by applying reward-based supervision that enforces safety and diversity constraints, improving practicality and generalization while addressing the mode collapse that arises when imitation learning relies on single demonstrations.

What carries the argument

The reinforced diffusion-based generation mechanism that expands one expert trajectory into several conditioned candidates and uses RL rewards to enforce safety plus diversity during denoising.
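One way to picture the "one trajectory in, several candidates out" step is the standard DDPM forward-noising formula, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied several times to the same ground-truth path. The sketch below is an illustration under assumptions, not DIVER's generator: the linear beta schedule is the common DDPM default, and `alpha_bar`, `noise_copies`, and the toy straight-line path are hypothetical names and data. Map and agent conditioning is left out entirely.

```python
import numpy as np

def alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_copies(x0, k, t, abar, rng):
    """Draw k independently noised copies of one trajectory x0 at diffusion step t."""
    eps = rng.standard_normal((k,) + x0.shape)          # (k, T, 2)
    return np.sqrt(abar[t]) * x0[None] + np.sqrt(1.0 - abar[t]) * eps

rng = np.random.default_rng(0)
abar = alpha_bar(1000)
s = np.linspace(0.0, 1.0, 6)
gt = np.stack([10.0 * s, np.zeros_like(s)], axis=-1)    # toy ground-truth path, shape (6, 2)

near = noise_copies(gt, k=4, t=0, abar=abar, rng=rng)    # almost identical to gt
far = noise_copies(gt, k=4, t=999, abar=abar, rng=rng)   # close to pure noise
```

A denoiser conditioned on scene context would then map each noised copy back to a plausible path; how strongly the copies differ depends on the step `t` at which noising stops.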

If this is right

End-to-end models can output several distinct responses to the same scene instead of always repeating the expert choice.
Closed-loop performance improves on benchmarks that test generalization because the generated paths include safer alternatives to the single demonstration.
A dedicated diversity metric replaces open-loop L2 scores and better reveals whether multi-mode predictions actually spread out.
The same conditioning on map and agent data can be reused across different driving scenes without collecting new expert traces for each variation.
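The point about L2 scores in the list above can be made concrete with a small NumPy sketch. It compares a collapsed set of modes with a spread-out set: both contain a mode that matches the ground truth, so the best-mode L2 cannot tell them apart, while a pairwise-distance score can. The mean pairwise distance used here is an assumed stand-in; this page does not give the definition of the paper's Diversity metric, and the function names and toy data are hypothetical.

```python
import numpy as np

def min_l2(modes, gt):
    """Best-mode open-loop L2: mean waypoint distance of the closest mode to gt."""
    per_mode = np.linalg.norm(modes - gt[None], axis=-1).mean(axis=-1)  # (K,)
    return float(per_mode.min())

def mean_pairwise_distance(modes):
    """Assumed diversity proxy: mean waypoint distance over all pairs of modes."""
    k = modes.shape[0]
    diff = modes[:, None] - modes[None, :]                  # (K, K, T, 2)
    dist = np.linalg.norm(diff, axis=-1).mean(axis=-1)      # (K, K)
    rows, cols = np.triu_indices(k, k=1)                    # each pair once
    return float(dist[rows, cols].mean())

s = np.linspace(0.0, 1.0, 6)
gt = np.stack([10.0 * s, np.zeros_like(s)], axis=-1)        # straight 10 m path
collapsed = np.repeat(gt[None], 3, axis=0)                  # three identical modes
spread = np.stack([np.stack([10.0 * s, off * s], axis=-1)   # lateral fan-out
                   for off in (-2.0, 0.0, 2.0)])

l2_collapsed, l2_spread = min_l2(collapsed, gt), min_l2(spread, gt)
div_collapsed = mean_pairwise_distance(collapsed)
div_spread = mean_pairwise_distance(spread)
```

Both sets score a best-mode L2 of zero, yet only the second has non-zero spread, which is the gap a dedicated diversity score is meant to expose.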

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could lower the volume of expert driving data needed to train capable systems by turning each existing trace into multiple useful examples.
Similar reward-guided diffusion steps might transfer to other imitation-learning settings such as robotic manipulation where single demonstrations are also limiting.
Explicit safety rewards during generation could let developers test constraint satisfaction in simulation before any real-world deployment.

Load-bearing premise

Reward signals added to the diffusion steps can reliably steer outputs toward safe and varied trajectories without creating invalid paths or destabilizing training.

What would settle it

Training runs that produce a high rate of colliding or off-road trajectories even after reward optimization, or that show no gain on the new diversity metric alongside worse closed-loop performance, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2507.04049 by Bencheng Liao, Caiyan Jia, Hongyu Pan, Lei Yang, Lin Liu, Mingzhe Guo, Shaoqing Xu, Yadan Luo, Yongchang Zhang, Ziying Song.

**Figure 1.** (a) Imitation-based Single-Mode Trajectory Planning [1, 2, 3, 4, 5, 6, 7] predicts deterministic trajectories but lacks action diversity, leading to potential safety risks. (b) Imitation-based Multi-Mode Trajectories Planning [3, 4, 8, 9] fails to address the diversity loss in imitation learning end-to-end autonomous driving, leading to mode collapse. The generated multi-mode trajectories overly depend…

**Figure 2.** Imitation learning-based multi-mode trajectories paradigm. Most IL-based multi-mode E2E-AD methods rely on L1 loss for training and L2 distance for evaluation, which emphasizes matching a single GT trajectory rather than modeling diversity. This misalignment limits the generation of truly diverse behaviors. Even with diffusion-based frameworks [4], such imitation-driven objectives constrain their capacity…

**Figure 3.** The overall architecture of DIVER. As a multi-mode trajectories E2E-AD framework, DIVER first encodes multiview images into feature maps to extract scene representations through a perception module. It then predicts the motion of surrounding agents and performs planning via a conditional diffusion model guided by reinforcement learning to generate diverse multi-intention trajectories. Our approach effecti…

**Figure 4.** The illustration of Policy-Aware Diffusion Generator. By incorporating the predicted trajectory, GT trajectory, and anchor trajectory as inputs, PADG reconstructs diverse multi-mode trajectories from noise through a conditional denoising process, guided by map and agent context.

**Figure 5.** Impact of the Number of Reference GTs on Closed-Loop Performance (Bench2Drive). A value of 0 indicates no…

**Figure 6.** Visualization results of DIVER compared with DiffusionDrive […

Original abstract

Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DIVER adds RL guidance to diffusion for multi-trajectory driving from single demos, but the integration step remains the main open question.


The main point is that this work tries to loosen the single-expert imitation bottleneck in end-to-end driving by letting a diffusion model produce several reference trajectories from one ground-truth path, then using reinforcement learning rewards to push for both safety and diversity. The conditioning on maps and agents is straightforward, and the new diversity metric is a reasonable attempt to move past L2 open-loop scores that ignore mode coverage. Experiments on NAVSIM, Bench2Drive, and nuScenes give a broad test bed that mixes closed- and open-loop settings, which is better than many driving papers that stick to one regime. That combination is the clearest incremental step here.

The paper does a fair job laying out why pure imitation often collapses to conservative behavior and why adding controlled diversity could help generalization in cluttered scenes. The reward-based supervision idea is presented cleanly enough that a reader can see the intended direction.

The soft spot is the precise way the RL signal is folded into the diffusion reverse process. The abstract and available description stay high-level on the reward formulation, weighting, and any auxiliary losses or guidance tricks. Without those details it is hard to judge whether the method sidesteps the usual problems of reward hacking, unstable training, or trajectories that satisfy the proxy rewards yet violate vehicle kinematics. If the full manuscript contains clear equations, ablations on reward components, and closed-loop failure cases, that would tighten the claim; right now the integration looks like the part that still needs the most checking.

This paper is mainly for people already working on imitation-learning limits in autonomous driving or on diffusion models for planning. A reader who wants to see how RL and generative models can be combined for multi-modal outputs will get something concrete to build on.

It is worth sending to peer review because the problem is real, the benchmarks are appropriate, and the core mechanism is a legitimate extension even if the reward integration needs more evidence and scrutiny.

Referee Report

2 major / 3 minor

Summary. The paper proposes DIVER, an end-to-end autonomous driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories from single expert demonstrations. It conditions on map elements and surrounding agents to generate multiple reference trajectories, employs RL to enforce safety and diversity constraints during the diffusion process, introduces a novel Diversity metric to better evaluate multi-mode predictions beyond L2 open-loop metrics, and reports improvements on closed-loop NAVSIM and Bench2Drive benchmarks plus open-loop nuScenes.

Significance. If the central claims hold, the work offers a concrete mechanism for alleviating mode collapse and conservatism in imitation-learned driving policies, with potential for improved generalization in complex scenarios. The combination of conditioning on map/agent data with reward-guided diffusion and the proposed diversity metric directly targets evaluation gaps in multi-modal trajectory prediction.

major comments (2)

[Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.
[Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.

minor comments (3)

[Diversity metric definition] The novel Diversity metric is introduced to address limitations of L2-based open-loop metrics; include its explicit mathematical definition and comparison to existing multi-modal metrics such as minADE or entropy-based measures.
[Model architecture] Clarify the exact conditioning inputs (map elements, surrounding agents) and how they are encoded in the diffusion model architecture.
[Reward formulation] The abstract states that RL enforces both safety and diversity; ensure the reward formulation is detailed enough to allow reproduction, including any weighting between safety and diversity terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important aspects of the reinforced diffusion mechanism and experimental validation. We address each major comment below and have prepared revisions to strengthen the manuscript.


Referee: [Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.

Authors: We agree that the precise integration of rewards into the diffusion reverse process requires explicit mathematical detail to support the safety and diversity claims. The original manuscript provided a high-level description of reward-based supervision. In the revised manuscript, we have expanded the Method section with the full set of equations for the RL-guided denoising process. This includes the modified reverse step that incorporates a scalar reward signal via an additive guidance term, the definition of the composite reward (safety via collision and dynamics penalties plus a diversity term based on pairwise trajectory distance), and the algorithm for sampling under the guided distribution. We also include a brief stability analysis and note that vehicle kinematics are enforced by projecting samples onto a feasible set after each denoising step, which prevents dynamics violations.
revision: yes

Referee: [Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.

Authors: We thank the referee for this suggestion, which helps clarify the source of the observed gains. The original experiments compared the full model against prior methods but did not isolate the RL guidance. In the revised manuscript we have added a new ablation study (Section 4.3) that directly compares (i) diffusion conditioning on map and agents alone versus (ii) the same conditioning plus RL reward guidance. The results quantify the incremental benefit of the RL component, reporting lower rates of safety violations (collisions and off-road events) and infeasible trajectories (measured by kinematic constraint violations) on both NAVSIM and Bench2Drive. These metrics are presented alongside the diversity score to show the trade-offs.
revision: yes
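The first response describes a reverse step with an additive reward-guidance term followed by a projection onto a feasible set. A minimal NumPy sketch of that pattern is given below, under heavy assumptions: the learned denoiser is replaced by a toy pull toward a reference path, the reward is a simple clearance term around one obstacle with a hand-written gradient, and "feasible" only means a cap on the distance between consecutive waypoints. None of the names, constants, or formulas come from the paper.

```python
import numpy as np

def clearance_reward_grad(traj, obstacle, radius):
    """Gradient of sum(min(dist, radius)): unit push away from the obstacle
    for waypoints closer than `radius`, zero elsewhere."""
    delta = traj - obstacle                                  # (T, 2)
    dist = np.linalg.norm(delta, axis=-1, keepdims=True)     # (T, 1)
    return np.where(dist < radius, delta / np.maximum(dist, 1e-6), 0.0)

def project_step_length(traj, start, max_step):
    """Toy feasibility projection: cap the distance between consecutive waypoints."""
    out = np.empty_like(traj)
    prev = start
    for i, point in enumerate(traj):
        step = point - prev
        norm = np.linalg.norm(step)
        if norm > max_step:
            step = step * (max_step / norm)
        prev = prev + step
        out[i] = prev
    return out

def guided_reverse_step(mean, sigma, scale, obstacle, radius, start, max_step, rng):
    """One reverse step: add reward guidance to the mean, sample, then project."""
    guided = mean + scale * clearance_reward_grad(mean, obstacle, radius)
    sample = guided + sigma * rng.standard_normal(mean.shape)
    return project_step_length(sample, start, max_step)

rng = np.random.default_rng(0)
T = 8
start = np.zeros(2)
ref = np.stack([np.arange(1, T + 1, dtype=float), np.zeros(T)], axis=-1)
obstacle = np.array([4.0, 0.1])
x = 3.0 * rng.standard_normal((T, 2))                        # start from noise
for sigma in (1.0, 0.5, 0.2, 0.0):                           # decreasing noise levels
    mean = 0.5 * (x + ref)                                   # stand-in for a learned denoiser
    x = guided_reverse_step(mean, sigma, 0.5, obstacle, 1.0, start, 1.5, rng)

steps = np.linalg.norm(np.diff(np.vstack([start, x]), axis=0), axis=-1)
```

Because the projection runs after every step, the step-length cap holds for the final sample regardless of how large the guidance scale is. Whether such a projection interacts well with training stability is exactly the kind of question the referee raises.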

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper proposes a new end-to-end framework DIVER that combines diffusion-based trajectory generation conditioned on map and agent data with RL-based reward supervision for safety and diversity. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or prior self-citation chain. The generation of multiple trajectories from one ground-truth and the subsequent RL guidance are presented as architectural choices with external benchmarks (NAVSIM, Bench2Drive, nuScenes) for validation. The proposed Diversity metric addresses a stated limitation of L2 metrics but does not rename or tautologically reuse prior results. This matches the expected honest non-finding for a method-proposal paper whose central claims rest on independent mechanisms rather than internal redefinitions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Limited information from abstract; relies on standard assumptions in generative modeling and RL for trajectory planning, with likely free parameters in reward design and conditioning.

free parameters (1)

reward weights for safety and diversity
Weights balancing safety and diversity constraints in RL guidance of diffusion process, typical in such hybrid setups.
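To show why these weights count as free parameters, the sketch below scores three candidate paths with a weighted sum of a safety term and a diversity term. It is an assumed form only: the names `w_safe` and `w_div`, the waypoint-count collision penalty, and the toy scene are all hypothetical, and the paper's actual reward is not specified on this page.

```python
import numpy as np

def safety_term(traj, obstacles, radius):
    """Negative count of waypoints closer than `radius` to any obstacle."""
    d = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)  # (T, N)
    return -float((d.min(axis=1) < radius).sum())

def diversity_term(traj, others):
    """Mean waypoint distance from `traj` to the other candidates."""
    return float(np.linalg.norm(others - traj[None], axis=-1).mean())

def composite_reward(trajs, obstacles, radius, w_safe, w_div):
    """Weighted sum of safety and diversity, one value per candidate."""
    rewards = []
    for i, traj in enumerate(trajs):
        others = np.delete(trajs, i, axis=0)
        rewards.append(w_safe * safety_term(traj, obstacles, radius)
                       + w_div * diversity_term(traj, others))
    return np.array(rewards)

s = np.linspace(0.0, 1.0, 6)
trajs = np.stack([np.stack([10.0 * s, off * s], axis=-1)
                  for off in (-2.0, 0.0, 2.0)])              # (3, 6, 2)
obstacles = np.array([[10.0, 2.0]])                          # sits on the end of path 3

balanced = composite_reward(trajs, obstacles, 1.0, w_safe=1.0, w_div=1.0)
diversity_only = composite_reward(trajs, obstacles, 1.0, w_safe=0.0, w_div=1.0)
```

With both weights at 1 the colliding third path ranks last; with `w_safe` set to 0 it ties for first, so the ranking of candidates depends directly on how the two terms are balanced.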

axioms (1)

domain assumption: Diffusion models conditioned on map and agent states can generate multiple feasible trajectories from a single expert path
Core generative assumption invoked for the multi-reference trajectory mechanism.

pith-pipeline@v0.9.0 · 5762 in / 1171 out tokens · 57788 ms · 2026-05-19T06:01:26.478589+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

We treat the diffusion process as a stochastic policy and employ Group Relative Policy Optimization (GRPO) objectives to guide the diffusion process. By optimizing trajectory-level rewards for both diversity and safety...

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
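The first passage quoted above mentions Group Relative Policy Optimization. Its defining step is to score each sample against the other samples drawn for the same input, by normalizing rewards with the group mean and standard deviation, and then to use those advantages in a PPO-style clipped objective. The sketch below shows only that generic recipe with NumPy. The reward values are made up, the KL regularizer usually paired with GRPO is omitted, and nothing here is taken from DIVER's implementation.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def clipped_objective(logp_new, logp_old, adv, clip=0.2):
    """PPO-style clipped surrogate averaged over the group (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * adv
    return float(np.minimum(unclipped, clipped).mean())

# Hypothetical trajectory-level rewards for three candidates of one scene.
rewards = [1.5, 1.0, 0.5]
adv = group_relative_advantages(rewards)

# With an unchanged policy the ratio is 1 and the objective is the mean advantage.
logp_old = np.array([-1.0, -1.2, -0.8])
objective_at_start = clipped_objective(logp_old, logp_old, adv)
```

Because the advantages are centered within the group, candidates are rewarded only for being better than their siblings, which removes the need for a separate learned value baseline.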

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Driving risk emerges from the required two-dimensional joint evasive acceleration
cs.RO · 2026-04 · unverdicted · novelty 7.0

Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention ac...

DriveFuture: Future-Aware Latent World Models for Autonomous Driving
cs.CV · 2026-05 · unverdicted · novelty 6.0

DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
cs.RO · 2026-05 · unverdicted · novelty 5.0

CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · cited by 3 Pith papers · 8 internal anchors

[1] Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 17853–17862.
[2] B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Vad: Vectorized scene representation for efficient autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8340–8350.
[3] W. Sun, X. Lin, Y. Shi, C. Zhang, H. Wu, and S. Zheng, “Sparsedrive: End-to-end autonomous driving via sparse scene representation,” arXiv preprint arXiv:2405.19620, 2024.
[4] B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y. Zhang, Q. Zhang et al., “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,” arXiv preprint arXiv:2411.15139, 2024.
[5] K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “Transfuser: Imitation with transformer-based sensor fusion for autonomous driving,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12878–12895, 2023.
[6] D. Xu, H. Li, Q. Wang, Z. Song, L. Chen, and H. Deng, “M2da: Multi-modal fusion transformer incorporating driver attention for autonomous driving,” arXiv preprint arXiv:2403.12552, 2024.
[7] S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, “St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning,” in European Conference on Computer Vision. Springer, 2022, pp. 533–549.
[8] S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Vadv2: End-to-end vectorized autonomous driving via probabilistic planning,” arXiv preprint arXiv:2402.13243, 2024.
[9] Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y. Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu et al., “Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation,” arXiv preprint arXiv:2406.06978, 2024.
[10] L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[11] Z. Song, L. Liu, F. Jia, Y. Luo, C. Jia, G. Zhang, L. Yang, and L. Wang, “Robustness-aware 3d object detection in autonomous driving: A review and outlook,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–30, 2024.
[12] Y. Choi, R. C. Mercurius, S. M. A. Shabestary, and A. Rasouli, “Dice: Diverse diffusion model with scoring for trajectory prediction,” in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 3023–3029.
[13] X. Chen, J. Yan, W. Liao, T. He, and P. Peng, “Int2planner: An intention-based multi-modal motion planner for integrated prediction and planning,” arXiv preprint arXiv:2501.12799, 2025.
[14] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
[15] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, p. 02783649241273668, 2023.
[16] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang, “Diffusion models: A comprehensive survey of methods and applications,” ACM Computing Surveys, vol. 56, no. 4, pp. 1–39, 2023.
[17] D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi et al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025.
[18] B. Jiang, S. Chen, Q. Zhang, W. Liu, and X. Wang, “Alphadrive: Unleashing the power of vlms in autonomous driving via reinforcement learning and reasoning,” arXiv preprint arXiv:2503.07608, 2025.
[19] H. Gao, S. Chen, B. Jiang, B. Liao, Y. Shi, X. Guo, Y. Pu, H. Yin, X. Li, X. Zhang et al., “Rad: Training an end-to-end driving policy via large-scale 3dgs-based reinforcement learning,” arXiv preprint arXiv:2502.13144, 2025.
[20] P. Wu, X. Jia, L. Chen, J. Yan, H. Li, and Y. Qiao, “Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 6119–6132. [Online]. Availab...
[21] W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2019. [Online]. Available: http://dx.doi.org/10.1109/cvpr.2019.00886
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
[23] Z. Song, L. Yang, S. Xu, L. Liu, D. Xu, C. Jia, F. Jia, and L. Wang, “Graphbev: Towards robust bev feature alignment for multi-modal 3d object detection,” arXiv preprint arXiv:2403.11848, 2024.
[24] Z. Song, H. Wei, L. Bai, L. Yang, and C. Jia, “Graphalign: Enhancing accurate feature alignment by graph matching for multi-modal 3d object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3358–3369.
[25] Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, and J. Dai, “Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers,” in European conference on computer vision. Springer, 2022, pp. 1–18.
[26] T. Meinhardt, A. Kirillov, L. Leal-Taixé, and C. Feichtenhofer, “Trackformer: Multi-object tracking with transformers,” Cornell University - arXiv, Jan 2021.
[27] B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “Maptr: Structured modeling and learning for online vectorized hd map construction,” arXiv preprint arXiv:2208.14437, 2022.
[28] S. Shi, L. Jiang, D. Dai, and B. Schiele, “Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
[29] P. Tang, Z. Wang, G. Wang, J. Zheng, X. Ren, B. Feng, and C. Ma, “Sparseocc: Rethinking sparse latent representation for vision-based semantic occupancy prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15035–15044.
[30] P. Cai and D. Hsu, “Closing the planning–learning loop with application to autonomous driving,” IEEE Transactions on Robotics, vol. 39, no. 2, pp. 998–1011, 2022.
[31]

Dualad: Disentangling the dynamic and static world for end-to-end driving,

S. Doll, N. Hanselmann, L. Schneider, R. Schulz, M. Cordts, M. Enzweiler, and H. Lensch, “Dualad: Disentangling the dynamic and static world for end-to-end driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 728–14 737

work page 2024
[32]

Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,

Z. Chen, M. Ye, S. Xu, T. Cao, and Q. Chen, “Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,” in European Conference on Computer Vision. Springer, 2025, pp. 239–256

work page 2025
[33]

Don’t shake the wheel: Momentum-aware planning in end- to-end autonomous driving,

Z. Song, C. Jia, L. Liu, H. Pan, Y. Zhang, J. Wang, X. Zhang, S. Xu, L. Yang, and Y. Luo, “Don’t shake the wheel: Momentum-aware planning in end- to-end autonomous driving,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03125

work page arXiv 2025
[34]

Drivedreamer: Towards real-world-drive world models for autonomous driving,

X. Wang, Z. Zhu, G. Huang, X. Chen, J. Zhu, and J. Lu, “Drivedreamer: Towards real-world-drive world models for autonomous driving,” in European Conference on Com- puter Vision. Springer, 2024, pp. 55–72

work page 2024
[35]

C. Xu, A. Petiushko, D. Zhao, and B. Li, “Diffscene: Diffusion-based safety-critical scenario generation for autonomous vehicles,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 8797–8805
[36]

J. Zou, K. Tian, Z. Zhu, Y. Ye, and X. Wang, “Diffbev: Conditional diffusion model for bird’s eye view perception,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 7, 2024, pp. 7846–7854
[37]

C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9644–9653
[38]

Z. Guo, K. Gubernatorov, S. Asfaw, Z. Yagudin, and D. Tsetserukou, “Vdt-auto: End-to-end autonomous driving with vlm-guided diffusion transformers,” arXiv preprint arXiv:2502.20108, 2025
[39]

T. Wang, C. Zhang, X. Qu, K. Li, W. Liu, and C. Huang, “Diffad: A unified diffusion modeling approach for autonomous driving,” arXiv preprint arXiv:2503.12170, 2025
[40]

H. Su, W. Wu, and J. Yan, “Difsd: Ego-centric fully sparse paradigm with uncertainty denoising and iterative refinement for efficient end-to-end autonomous driving,” arXiv preprint arXiv:2409.09777, 2024
[41]

X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, B. Dai, and Q. Miao, “Deep reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 4, pp. 5064–5078, 2022
[42]

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016
[43]

D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017
[44]

J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko et al., “Highly accurate protein structure prediction with alphafold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021
[45]

D. Chen, V. Koltun, and P. Krähenbühl, “Learning to drive from a world on rails,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15 590–15 599
[46]

Y. Hu, S. Chai, Z. Yang, J. Qian, K. Li, W. Shao, H. Zhang, W. Xu, and Q. Liu, “Solving motion planning tasks with a scalable generative model,” in European Conference on Computer Vision. Springer, 2024, pp. 386–404
[47]

Y. Lu, J. Fu, G. Tucker, X. Pan, E. Bronstein, R. Roelofs, B. Sapp, B. White, A. Faust, S. Whiteson et al., “Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 7553–7560
[48]

M. Toromanoff, E. Wirbel, and F. Moutarde, “End-to-end model-free reinforcement learning for urban driving using implicit affordances,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7153–7162
[49]

Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “End-to-end urban driving by imitating a reinforcement learning coach,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15 222–15 232
[50]

X. Jia, Z. Yang, Q. Li, Z. Zhang, and J. Yan, “Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving,” arXiv preprint arXiv:2406.03877, 2024
[51]

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11 621–11 631
[52]

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
[53]

Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. Li, Y. Wu et al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024
[54]

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 1–16. [Online]. Available: https://proceedings.mlr.press/...
[55]

D. Dauner, M. Hallgarten, T. Li, X. Weng, Z. Huang, Z. Yang, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone et al., “Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking,” Advances in Neural Information Processing Systems, vol. 37, pp. 28 706–28 719, 2024
[56]

O. Contributors, “Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving,” 2023
[57]

H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021
[58]

Z. Song, C. Jia, L. Liu, H. Pan, Y. Zhang, J. Wang, X. Zhang, S. Xu, L. Yang, and Y. Luo, “Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving,” 2025
[59]

Z. Xu, B. Li, H.-a. Gao, M. Gao, Y. Chen, M. Liu, C. Yan, H. Zhao, S. Feng, and H. Zhao, “Challenger: Affordable adversarial driving video generation,” arXiv preprint arXiv:2505.15880, 2025
[60]

Y. Dong, C. Kang, J. Zhang, Z. Zhu, Y. Wang, X. Yang, H. Su, X. Wei, and J. Zhu, “Benchmarking robustness of 3d object detection to common corruptions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 1022–1032
[61]

X. Jia, P. Wu, L. Chen, J. Xie, C. He, J. Yan, and H. Li, “Think twice before driving: Towards scalable decoders for end-to-end autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 21 983–21 994
[62]

X. Jia, Y. Gao, L. Chen, J. Yan, P. L. Liu, and H. Li, “Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7953–7963
[63]

X. Jia, J. You, Z. Zhang, and J. Yan, “Drivetransformer: Unified transformer for scalable end-to-end autonomous driving,” arXiv preprint arXiv:2503.07656, 2025
[64]

Y. Li, Y. Wang, Y. Liu, J. He, L. Fan, and Z. Zhang, “End-to-end driving with online trajectory evaluation via bev world model,” arXiv preprint arXiv:2504.01941, 2025
[65]

J.-T. Zhai, Z. Feng, J. Du, Y. Mao, J.-J. Liu, Z. Tan, Y. Zhang, X. Ye, and J. Wang, “Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes,” arXiv preprint arXiv:2305.10430, 2023
[66]

W. Zheng, R. Song, X. Guo, and L. Chen, “Genad: Generative end-to-end autonomous driving,” arXiv preprint arXiv:2402.11502, 2024
[67]

X. Weng, B. Ivanovic, Y. Wang, Y. Wang, and M. Pavone, “Para-drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 449–15 458
[68]

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

Ziying Song was born in Xingtai, Hebei Province, China in 1997. He received the B.S. degree from Hebei Normal University of Science and Technology (China) in 2019. He rec...