arxiv: 2507.04049 · v4 · pith:CZPRYVZN · submitted 2025-07-05 · 💻 cs.CV · cs.RO

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving

Pith reviewed 2026-05-19 06:01 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords end-to-end autonomous driving · diffusion models · reinforcement learning · imitation learning · trajectory generation · mode collapse · diversity evaluation · closed-loop benchmarks
The pith

Reinforcement learning steers a diffusion model to turn one expert driving trace into multiple safe and varied trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning in end-to-end driving copies single expert paths and therefore produces cautious, repetitive behavior that fails to generalize. DIVER instead runs a diffusion process that starts from one ground-truth trajectory and, conditioned on the map and nearby vehicles, produces several candidate paths. Reinforcement learning supplies rewards that push the diffusion steps toward both collision-free routes and greater spread among the options. If the method works, models can handle unseen traffic layouts without requiring new expert data for every variation. The paper also replaces simple L2 distance with a dedicated diversity score to judge whether the generated paths actually differ.

Core claim

The reinforced diffusion-based generation mechanism conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory. Reinforcement learning then guides the diffusion process by applying reward-based supervision that enforces safety and diversity constraints, improving practicality and generalization while addressing the mode collapse that arises when imitation learning relies on single demonstrations.

What carries the argument

The reinforced diffusion-based generation mechanism that expands one expert trajectory into several conditioned candidates and uses RL rewards to enforce safety plus diversity during denoising.

If this is right

  • End-to-end models can output several distinct responses to the same scene instead of always repeating the expert choice.
  • Closed-loop performance improves on benchmarks that test generalization because the generated paths include safer alternatives to the single demonstration.
  • A dedicated diversity metric replaces open-loop L2 scores and better reveals whether multi-mode predictions actually spread out.
  • The same conditioning on map and agent data can be reused across different driving scenes without collecting new expert traces for each variation.
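
The contrast between an L2 score and a diversity score can be illustrated with a small sketch. The paper's actual Diversity metric is not defined on this page, so `pairwise_diversity` below is only an assumed stand-in (mean pairwise displacement between modes), and every name and the toy data are hypothetical.

```python
import math
from itertools import combinations

Trajectory = list[tuple[float, float]]  # waypoints (x, y), assumed to be in metres


def ade(a: Trajectory, b: Trajectory) -> float:
    """Mean point-wise L2 distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mean_l2_to_gt(modes: list[Trajectory], gt: Trajectory) -> float:
    """Open-loop style score: average distance of every mode to the one expert path."""
    return sum(ade(m, gt) for m in modes) / len(modes)


def pairwise_diversity(modes: list[Trajectory]) -> float:
    """Assumed diversity score: mean distance over all unordered pairs of modes."""
    pairs = list(combinations(modes, 2))
    if not pairs:
        return 0.0
    return sum(ade(a, b) for a, b in pairs) / len(pairs)


# Toy scene: the expert drives straight along the x-axis.
gt = [(float(t), 0.0) for t in range(1, 5)]
collapsed = [gt, gt, gt]  # three modes that all copy the expert
spread = [
    [(float(t), -0.5 * t) for t in range(1, 5)],  # drift right
    gt,                                            # follow the expert
    [(float(t), 0.5 * t) for t in range(1, 5)],   # drift left
]

for name, modes in (("collapsed", collapsed), ("spread", spread)):
    print(name, mean_l2_to_gt(modes, gt), pairwise_diversity(modes))
```

In this toy case the collapsed set gets the best possible L2 score while scoring zero on diversity, which is the mismatch the third point describes.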

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could lower the volume of expert driving data needed to train capable systems by turning each existing trace into multiple useful examples.
  • Similar reward-guided diffusion steps might transfer to other imitation-learning settings such as robotic manipulation where single demonstrations are also limiting.
  • Explicit safety rewards during generation could let developers test constraint satisfaction in simulation before any real-world deployment.

Load-bearing premise

Reward signals added to the diffusion steps can reliably steer outputs toward safe and varied trajectories without creating invalid paths or destabilizing training.

What would settle it

Training runs that produce a high rate of colliding or off-road trajectories even after reward optimization, or that show no gain on the new diversity metric alongside worse closed-loop performance, would falsify the central claim.
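
A check of this kind could be scripted roughly as follows. This is a minimal sketch under simplifying assumptions that are not taken from the paper: obstacles are static points with a fixed safety radius, and the drivable area is a straight corridor around the x-axis. All function names and thresholds are hypothetical.

```python
import math

Trajectory = list[tuple[float, float]]
Point = tuple[float, float]


def collides(traj: Trajectory, obstacles: list[Point], radius: float) -> bool:
    """True if any waypoint comes closer than `radius` to any obstacle."""
    return any(math.dist(p, o) < radius for p in traj for o in obstacles)


def off_road(traj: Trajectory, half_width: float) -> bool:
    """True if any waypoint leaves the assumed corridor |y| <= half_width."""
    return any(abs(y) > half_width for _, y in traj)


def violation_rates(trajs, obstacles, radius=1.0, half_width=3.5):
    """Return (collision rate, off-road rate) over a set of generated trajectories."""
    n = len(trajs)
    n_collide = sum(collides(t, obstacles, radius) for t in trajs)
    n_off = sum(off_road(t, half_width) for t in trajs)
    return n_collide / n, n_off / n


trajs = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],  # drives through the obstacle
    [(1.0, 1.5), (2.0, 2.0), (3.0, 1.5)],  # swerves and stays on the road
    [(1.0, 2.0), (2.0, 4.0), (3.0, 2.0)],  # swerves too far and leaves the road
]
obstacles = [(2.0, 0.0)]
collision_rate, off_road_rate = violation_rates(trajs, obstacles)
print(collision_rate, off_road_rate)
```

Rates that stay high after reward optimization, tracked alongside the diversity score and closed-loop results, would be the kind of evidence this section has in mind.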

Figures

Figures reproduced from arXiv: 2507.04049 by Bencheng Liao, Caiyan Jia, Hongyu Pan, Lei Yang, Lin Liu, Mingzhe Guo, Shaoqing Xu, Yadan Luo, Yongchang Zhang, Ziying Song.

Figure 1: (a) Imitation-based Single-Mode Trajectory Planning [1, 2, 3, 4, 5, 6, 7] predicts deterministic trajectories but lacks action diversity, leading to potential safety risks. (b) Imitation-based Multi-Mode Trajectories Planning [3, 4, 8, 9] fails to address the diversity loss in imitation learning end-to-end autonomous driving, leading to mode collapse. The generated multi-mode trajectories overly depend…
Figure 2: Imitation learning-based multi-mode trajectories paradigm. Most IL-based multi-mode E2E-AD methods rely on L1 loss for training and L2 distance for evaluation, which emphasizes matching a single GT trajectory rather than modeling diversity. This misalignment limits the generation of truly diverse behaviors. Even with diffusion-based frameworks [4], such imitation-driven objectives constrain their capacity…
Figure 3: The overall architecture of DIVER. As a multi-mode trajectories E2E-AD framework, DIVER first encodes multi-view images into feature maps to extract scene representations through a perception module. It then predicts the motion of surrounding agents and performs planning via a conditional diffusion model guided by reinforcement learning to generate diverse multi-intention trajectories. Our approach effecti…
Figure 4: The illustration of Policy-Aware Diffusion Generator. By incorporating the predicted trajectory, GT trajectory, and anchor trajectory as inputs, PADG reconstructs diverse multi-mode trajectories from noise through a conditional denoising process, guided by map and agent context. τ ref(m) to extract their spatial-temporal semantic features. The embedding process is defined as: F (m) τ = PE(ϕ(τ t ˜ (m) …
Figure 5: Impact of the Number of Reference GTs on Closed-Loop Performance (Bench2Drive). A value of 0 indicates no …
Figure 6: Visualization results of DIVER compared with DiffusionDrive […
Original abstract

Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes DIVER, an end-to-end autonomous driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories from single expert demonstrations. It conditions on map elements and surrounding agents to generate multiple reference trajectories, employs RL to enforce safety and diversity constraints during the diffusion process, introduces a novel Diversity metric to better evaluate multi-mode predictions beyond L2 open-loop metrics, and reports improvements on closed-loop NAVSIM and Bench2Drive benchmarks plus open-loop nuScenes.

Significance. If the central claims hold, the work offers a concrete mechanism for alleviating mode collapse and conservatism in imitation-learned driving policies, with potential for improved generalization in complex scenarios. The combination of conditioning on map/agent data with reward-guided diffusion and the proposed diversity metric directly targets evaluation gaps in multi-modal trajectory prediction.

major comments (2)
  1. [Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.
  2. [Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.
minor comments (3)
  1. [Diversity metric definition] The novel Diversity metric is introduced to address limitations of L2-based open-loop metrics; include its explicit mathematical definition and comparison to existing multi-modal metrics such as minADE or entropy-based measures.
  2. [Model architecture] Clarify the exact conditioning inputs (map elements, surrounding agents) and how they are encoded in the diffusion model architecture.
  3. [Reward formulation] The abstract states that RL enforces both safety and diversity; ensure the reward formulation is detailed enough to allow reproduction, including any weighting between safety and diversity terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important aspects of the reinforced diffusion mechanism and experimental validation. We address each major comment below and have prepared revisions to strengthen the manuscript.

  1. Referee: [Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.

    Authors: We agree that the precise integration of rewards into the diffusion reverse process requires explicit mathematical detail to support the safety and diversity claims. The original manuscript provided a high-level description of reward-based supervision. In the revised manuscript, we have expanded the Method section with the full set of equations for the RL-guided denoising process. This includes the modified reverse step that incorporates a scalar reward signal via an additive guidance term, the definition of the composite reward (safety via collision and dynamics penalties plus a diversity term based on pairwise trajectory distance), and the algorithm for sampling under the guided distribution. We also include a brief stability analysis and note that vehicle kinematics are enforced by projecting samples onto a feasible set after each denoising step, which prevents dynamics violations. revision: yes

  2. Referee: [Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.

    Authors: We thank the referee for this suggestion, which helps clarify the source of the observed gains. The original experiments compared the full model against prior methods but did not isolate the RL guidance. In the revised manuscript we have added a new ablation study (Section 4.3) that directly compares (i) diffusion conditioning on map and agents alone versus (ii) the same conditioning plus RL reward guidance. The results quantify the incremental benefit of the RL component, reporting lower rates of safety violations (collisions and off-road events) and infeasible trajectories (measured by kinematic constraint violations) on both NAVSIM and Bench2Drive. These metrics are presented alongside the diversity score to show the trade-offs. revision: yes
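
The first response mentions an additive guidance term and a projection onto a feasible set, but the equations themselves are not reproduced on this page. The sketch below is therefore only one hypothetical reading, in the style of classifier guidance: the denoised mean is nudged along the gradient of a hinge safety reward, then each segment is clamped to a maximum length as a crude stand-in for kinematic feasibility. The reward, `scale`, `sigma`, `max_step`, and all function names are assumptions, not the authors' formulation.

```python
import math
import random

Point = tuple[float, float]


def reward_grad(p: Point, obstacle: Point, radius: float) -> Point:
    """Gradient of the hinge reward r(p) = -max(0, radius - ||p - obstacle||)."""
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= radius or d == 0.0:
        return (0.0, 0.0)  # outside the safety radius, or undefined at the centre
    return (dx / d, dy / d)  # unit vector pointing away from the obstacle


def guided_step(mean, obstacle, radius, scale, sigma, rng):
    """One reverse step: denoised mean + scale * reward gradient + Gaussian noise."""
    out = []
    for p in mean:
        gx, gy = reward_grad(p, obstacle, radius)
        out.append((p[0] + scale * gx + sigma * rng.gauss(0.0, 1.0),
                    p[1] + scale * gy + sigma * rng.gauss(0.0, 1.0)))
    return out


def project_max_step(traj, start: Point, max_step: float):
    """Clamp every segment to at most `max_step`, walking forward from `start`."""
    out, prev = [], start
    for p in traj:
        dx, dy = p[0] - prev[0], p[1] - prev[1]
        d = math.hypot(dx, dy)
        if d > max_step:
            p = (prev[0] + dx * max_step / d, prev[1] + dy * max_step / d)
        out.append(p)
        prev = p
    return out


rng = random.Random(0)
mean = [(1.0, 0.1), (2.0, 0.1), (3.0, 0.1)]  # model's denoised mean (toy values)
obstacle = (2.0, 0.0)

# sigma=0.0 keeps the example deterministic; a real sampler would add noise.
guided = guided_step(mean, obstacle, radius=1.0, scale=0.5, sigma=0.0, rng=rng)
feasible = project_max_step(guided, start=(0.0, 0.0), max_step=1.1)
print(guided)
print(feasible)
```

Only the middle waypoint lies inside the safety radius, so only it is pushed away from the obstacle; the projection then shortens the segments that the push made too long. Whether such a scheme stays stable over many denoising steps is exactly what the referee asks the authors to demonstrate.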

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper proposes a new end-to-end framework DIVER that combines diffusion-based trajectory generation conditioned on map and agent data with RL-based reward supervision for safety and diversity. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or prior self-citation chain. The generation of multiple trajectories from one ground-truth and the subsequent RL guidance are presented as architectural choices with external benchmarks (NAVSIM, Bench2Drive, nuScenes) for validation. The proposed Diversity metric addresses a stated limitation of L2 metrics but does not rename or tautologically reuse prior results. This matches the expected honest non-finding for a method-proposal paper whose central claims rest on independent mechanisms rather than internal redefinitions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Limited information from abstract; relies on standard assumptions in generative modeling and RL for trajectory planning, with likely free parameters in reward design and conditioning.

free parameters (1)
  • reward weights for safety and diversity
    Weights balancing safety and diversity constraints in RL guidance of diffusion process, typical in such hybrid setups.
axioms (1)
  • domain assumption: Diffusion models conditioned on map and agent states can generate multiple feasible trajectories from a single expert path
    Core generative assumption invoked for the multi-reference trajectory mechanism.
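
To make the free parameter concrete, the sketch below shows how two weights could trade safety against diversity in a per-mode reward. The functional form is an assumption for illustration (a hinge penetration penalty and a nearest-neighbour distance bonus); `w_safe`, `w_div`, and all function names are hypothetical and are not the paper's reward.

```python
import math

Trajectory = list[tuple[float, float]]
Point = tuple[float, float]


def safety_penalty(traj: Trajectory, obstacles: list[Point], radius: float) -> float:
    """Sum of hinge penetrations max(0, radius - distance) over waypoints and obstacles."""
    return sum(max(0.0, radius - math.dist(p, o)) for p in traj for o in obstacles)


def diversity_bonus(traj: Trajectory, others: list[Trajectory]) -> float:
    """Mean point-wise distance to the nearest other mode (0 if there is none)."""
    if not others:
        return 0.0
    return min(
        sum(math.dist(p, q) for p, q in zip(traj, o)) / len(traj) for o in others
    )


def composite_reward(i, modes, obstacles, w_safe, w_div, radius=1.0):
    """Reward for mode i: weighted diversity bonus minus weighted safety penalty."""
    others = modes[:i] + modes[i + 1:]
    return (w_div * diversity_bonus(modes[i], others)
            - w_safe * safety_penalty(modes[i], obstacles, radius))


modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],  # straight through the obstacle
    [(1.0, 1.5), (2.0, 2.0), (3.0, 1.5)],  # swerves around it
]
obstacles = [(2.0, 0.0)]

for w_safe in (0.0, 1.0, 5.0):
    rewards = [composite_reward(i, modes, obstacles, w_safe, 1.0) for i in range(2)]
    print(w_safe, rewards)
```

With `w_safe = 0` the two modes receive identical rewards, so the colliding path is not discouraged at all; raising `w_safe` separates them. That sensitivity is why the weighting is listed here as a free parameter.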

pith-pipeline@v0.9.0 · 5762 in / 1171 out tokens · 57788 ms · 2026-05-19T06:01:26.478589+00:00 · methodology

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Driving risk emerges from the required two-dimensional joint evasive acceleration

    cs.RO 2026-04 unverdicted novelty 7.0

    Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention ac...

  2. DriveFuture: Future-Aware Latent World Models for Autonomous Driving

    cs.CV 2026-05 unverdicted novelty 6.0

    DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.

  3. Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling

    cs.RO 2026-05 unverdicted novelty 5.0

    CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · cited by 3 Pith papers · 8 internal anchors

  1. [1]

    Planning-oriented autonomous driving,

    Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 17 853–17 862

  2. [2]

    Vad: Vectorized scene representation for efficient autonomous driving,

    B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Vad: Vectorized scene representation for efficient autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8340–8350

  3. [3]

    Sparsedrive: End-to-end autonomous driving via sparse scene representation,

    W. Sun, X. Lin, Y. Shi, C. Zhang, H. Wu, and S. Zheng, “Sparsedrive: End-to-end autonomous driving via sparse scene representation,” arXiv preprint arXiv:2405.19620, 2024

  4. [4]

    Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,

    B. Liao, S. Chen, H. Yin, B. Jiang, C. Wang, S. Yan, X. Zhang, X. Li, Y. Zhang, Q. Zhang et al., “Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving,” arXiv preprint arXiv:2411.15139, 2024

  5. [5]

    Transfuser: Imitation with transformer-based sensor fusion for autonomous driving,

    K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “Transfuser: Imitation with transformer-based sensor fusion for autonomous driving,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 12 878–12 895, 2023

  6. [6]

    M2da: Multi-modal fusion transformer incorporating driver attention for autonomous driving,

    D. Xu, H. Li, Q. Wang, Z. Song, L. Chen, and H. Deng, “M2da: Multi-modal fusion transformer incorporating driver attention for autonomous driving,” arXiv preprint arXiv:2403.12552, 2024

  7. [7]

    St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning,

    S. Hu, L. Chen, P. Wu, H. Li, J. Yan, and D. Tao, “St-p3: End-to-end vision-based autonomous driving via spatial-temporal feature learning,” in European Conference on Computer Vision. Springer, 2022, pp. 533–549

  8. [8]

    VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

    S. Chen, B. Jiang, H. Gao, B. Liao, Q. Xu, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Vadv2: End-to-end vectorized autonomous driving via probabilistic planning,” arXiv preprint arXiv:2402.13243, 2024

  9. [9]

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y. Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu et al., “Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation,” arXiv preprint arXiv:2406.06978, 2024

  10. [10]

    End-to-end autonomous driving: Challenges and frontiers,

    L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  11. [11]

    Robustness-aware 3d object detection in autonomous driving: A review and outlook,

    Z. Song, L. Liu, F. Jia, Y. Luo, C. Jia, G. Zhang, L. Yang, and L. Wang, “Robustness-aware 3d object detection in autonomous driving: A review and outlook,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–30, 2024

  12. [12]

    Dice: Diverse diffusion model with scoring for trajectory prediction,

    Y. Choi, R. C. Mercurius, S. M. A. Shabestary, and A. Rasouli, “Dice: Diverse diffusion model with scoring for trajectory prediction,” in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 3023–3029

  13. [13]

    Int2planner: An intention-based multi-modal motion planner for integrated prediction and planning,

    X. Chen, J. Yan, W. Liao, T. He, and P. Peng, “Int2planner: An intention-based multi-modal motion planner for integrated prediction and planning,” arXiv preprint arXiv:2501.12799, 2025

  14. [14]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020

  15. [15]

    Diffusion policy: Visuomotor policy learning via action diffusion,

    C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, p. 02783649241273668, 2023

  16. [16]

    Diffusion models: A comprehensive survey of methods and applications,

    L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang, “Diffusion models: A comprehensive survey of methods and applications,” ACM Computing Surveys, vol. 56, no. 4, pp. 1–39, 2023

  17. [17]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi et al., “Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

  18. [18]

    AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

    B. Jiang, S. Chen, Q. Zhang, W. Liu, and X. Wang, “Alphadrive: Unleashing the power of vlms in autonomous driving via reinforcement learning and reasoning,” arXiv preprint arXiv:2503.07608, 2025

  19. [19]

    Rad: Training an end-to-end driving policy via large-scale 3dgs-based reinforcement learning,

    H. Gao, S. Chen, B. Jiang, B. Liao, Y. Shi, X. Guo, Y. Pu, H. Yin, X. Li, X. Zhang et al., “Rad: Training an end-to-end driving policy via large-scale 3dgs-based reinforcement learning,” arXiv preprint arXiv:2502.13144, 2025

  20. [20]

    Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,

    P. Wu, X. Jia, L. Chen, J. Yan, H. Li, and Y. Qiao, “Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 6119–6132. [Online]. Availab...

  21. [21]

    End-to-end interpretable neural motion planner,

    W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , Jun 2019. [Online]. Available: http://dx.doi.org/10.1109/cvpr.2019.00886

  22. [22]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

  23. [23]

    Graphbev: Towards robust bev feature alignment for multi-modal 3d object detection,

    Z. Song, L. Yang, S. Xu, L. Liu, D. Xu, C. Jia, F. Jia, and L. Wang, “Graphbev: Towards robust bev feature alignment for multi-modal 3d object detection,” arXiv preprint arXiv:2403.11848, 2024

  24. [24]

    Graphalign: Enhancing accurate feature alignment by graph matching for multi-modal 3d object detection,

    Z. Song, H. Wei, L. Bai, L. Yang, and C. Jia, “Graphalign: Enhancing accurate feature alignment by graph matching for multi-modal 3d object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3358–3369

  25. [25]

    Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers,

    Z. Li, W. Wang, H. Li, E. Xie, C. Sima, T. Lu, Y. Qiao, and J. Dai, “Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers,” in European conference on computer vision. Springer, 2022, pp. 1–18

  26. [26]

    Trackformer: Multi-object tracking with transformers,

    T. Meinhardt, A. Kirillov, L. Leal-Taixé, and C. Feichtenhofer, “Trackformer: Multi-object tracking with transformers,” Cornell University - arXiv, Jan 2021

  27. [27]

    Maptr: Structured modeling and learning for online vectorized hd map construction,

    B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “Maptr: Structured modeling and learning for online vectorized hd map construction,” arXiv preprint arXiv:2208.14437, 2022

  28. [28]

    Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,

    S. Shi, L. Jiang, D. Dai, and B. Schiele, “Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  29. [29]

    Sparseocc: Rethinking sparse latent representation for vision-based semantic occupancy prediction,

    P. Tang, Z. Wang, G. Wang, J. Zheng, X. Ren, B. Feng, and C. Ma, “Sparseocc: Rethinking sparse latent representation for vision-based semantic occupancy prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 035–15 044

  30. [30]

    Closing the planning–learning loop with application to autonomous driving,

    P. Cai and D. Hsu, “Closing the planning–learning loop with application to autonomous driving,” IEEE Transactions on Robotics, vol. 39, no. 2, pp. 998–1011, 2022

  31. [31]

    Dualad: Disentangling the dynamic and static world for end-to-end driving,

    S. Doll, N. Hanselmann, L. Schneider, R. Schulz, M. Cordts, M. Enzweiler, and H. Lensch, “Dualad: Disentangling the dynamic and static world for end-to-end driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 728–14 737

  32. [32]

    Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,

    Z. Chen, M. Ye, S. Xu, T. Cao, and Q. Chen, “Ppad: Iterative interactions of prediction and planning for end-to-end autonomous driving,” in European Conference on Computer Vision. Springer, 2025, pp. 239–256

  33. [33]

    Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving,

    Z. Song, C. Jia, L. Liu, H. Pan, Y. Zhang, J. Wang, X. Zhang, S. Xu, L. Yang, and Y. Luo, “Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03125

  34. [34]

    Drivedreamer: Towards real-world-drive world models for autonomous driving,

    X. Wang, Z. Zhu, G. Huang, X. Chen, J. Zhu, and J. Lu, “Drivedreamer: Towards real-world-drive world models for autonomous driving,” in European Conference on Computer Vision. Springer, 2024, pp. 55–72

  35. [35]

    Diffscene: Diffusion-based safety-critical scenario generation for autonomous vehicles,

    C. Xu, A. Petiushko, D. Zhao, and B. Li, “Diffscene: Diffusion-based safety-critical scenario generation for autonomous vehicles,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 8, 2025, pp. 8797–8805

  36. [36]

    Diffbev: Conditional diffusion model for bird’s eye view perception,

    J. Zou, K. Tian, Z. Zhu, Y. Ye, and X. Wang, “Diffbev: Conditional diffusion model for bird’s eye view perception,” in Proceedings of the AAAI conference on artificial intelligence, vol. 38, no. 7, 2024, pp. 7846–7854

  37. [37]

    Motiondiffuser: Controllable multi-agent motion prediction using diffusion,

    C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 9644–9653

  38. [38]

    Vdt-auto: End-to-end autonomous driving with vlm-guided diffusion transformers,

    Z. Guo, K. Gubernatorov, S. Asfaw, Z. Yagudin, and D. Tsetserukou, “Vdt-auto: End-to-end autonomous driving with vlm-guided diffusion transformers,” arXiv preprint arXiv:2502.20108, 2025

  39. [39]

    Diffad: A unified diffusion modeling approach for autonomous driving,

    T. Wang, C. Zhang, X. Qu, K. Li, W. Liu, and C. Huang, “Diffad: A unified diffusion modeling approach for autonomous driving,” arXiv preprint arXiv:2503.12170, 2025

  40. [40]

    Difsd: Ego-centric fully sparse paradigm with uncertainty denoising and iterative refinement for efficient end-to-end autonomous driving,

    H. Su, W. Wu, and J. Yan, “Difsd: Ego-centric fully sparse paradigm with uncertainty denoising and iterative refinement for efficient end-to-end autonomous driving,” arXiv preprint arXiv:2409.09777, 2024

  41. [41]

    Deep reinforcement learning: A survey,

    X. Wang, S. Wang, X. Liang, D. Zhao, J. Huang, X. Xu, B. Dai, and Q. Miao, “Deep reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 4, pp. 5064–5078, 2022

  42. [42]

    Mastering the game of go with deep neural networks and tree search,

    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, pp. 484–489, 2016

  43. [43]

    Mastering the game of go without human knowledge,

    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” nature, vol. 550, no. 7676, pp. 354–359, 2017

  44. [44]

    Highly accurate protein structure prediction with alphafold,

    J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko et al., “Highly accurate protein structure prediction with alphafold,” nature, vol. 596, no. 7873, pp. 583–589, 2021

  45. [45]

    Learning to drive from a world on rails,

    D. Chen, V. Koltun, and P. Krähenbühl, “Learning to drive from a world on rails,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15 590–15 599

  46. [46]

    Solving motion planning tasks with a scalable generative model,

    Y. Hu, S. Chai, Z. Yang, J. Qian, K. Li, W. Shao, H. Zhang, W. Xu, and Q. Liu, “Solving motion planning tasks with a scalable generative model,” in European Conference on Computer Vision. Springer, 2024, pp. 386–404

  47. [47]

    Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,

    Y. Lu, J. Fu, G. Tucker, X. Pan, E. Bronstein, R. Roelofs, B. Sapp, B. White, A. Faust, S. Whiteson et al., “Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023, pp. 7553–7560

  48. [48]

    End-to-end model-free reinforcement learning for urban driving using implicit affordances,

    M. Toromanoff, E. Wirbel, and F. Moutarde, “End-to-end model-free reinforcement learning for urban driving using implicit affordances,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 7153–7162

  49. [49]

    End-to-end urban driving by imitating a reinforcement learning coach,

    Z. Zhang, A. Liniger, D. Dai, F. Yu, and L. Van Gool, “End-to-end urban driving by imitating a reinforcement learning coach,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 15 222–15 232

  50. [50]

    Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving,

    X. Jia, Z. Yang, Q. Li, Z. Zhang, and J. Yan, “Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving,” arXiv preprint arXiv:2406.03877, 2024

  51. [51]

    nuscenes: A multimodal dataset for autonomous driving,

    H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631

  52. [52]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017

  53. [53]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. Li, Y. Wu et al., “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024

  54. [54]

    CARLA: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 1–16. [Online]. Available: https://proceedings.mlr.press/...

  55. [55]

    Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking,

    D. Dauner, M. Hallgarten, T. Li, X. Weng, Z. Huang, Z. Yang, H. Li, I. Gilitschenski, B. Ivanovic, M. Pavone et al., “Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking,” Advances in Neural Information Processing Systems, vol. 37, pp. 28 706–28 719, 2024

  56. [56]

    Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving,

    O. Contributors, “Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving,” 2023

  57. [57]

    NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

    H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021

  58. [58]

    Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving,

    Z. Song, C. Jia, L. Liu, H. Pan, Y. Zhang, J. Wang, X. Zhang, S. Xu, L. Yang, and Y. Luo, “Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving,” 2025

  59. [59]

    Challenger: Affordable adversarial driving video generation,

    Z. Xu, B. Li, H.-a. Gao, M. Gao, Y. Chen, M. Liu, C. Yan, H. Zhao, S. Feng, and H. Zhao, “Challenger: Affordable adversarial driving video generation,” arXiv preprint arXiv:2505.15880, 2025

  60. [60]

    Benchmarking robustness of 3d object detection to common corruptions,

    Y. Dong, C. Kang, J. Zhang, Z. Zhu, Y. Wang, X. Yang, H. Su, X. Wei, and J. Zhu, “Benchmarking robustness of 3d object detection to common corruptions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 1022–1032

  61. [61]

    Think twice before driving: Towards scalable decoders for end-to-end autonomous driving,

    X. Jia, P. Wu, L. Chen, J. Xie, C. He, J. Yan, and H. Li, “Think twice before driving: Towards scalable decoders for end-to-end autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 21 983–21 994

  62. [62]

    Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving,

    X. Jia, Y. Gao, L. Chen, J. Yan, P. L. Liu, and H. Li, “Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7953–7963

  63. [63]

    Drivetransformer: Unified transformer for scalable end-to-end autonomous driving,

    X. Jia, J. You, Z. Zhang, and J. Yan, “Drivetransformer: Unified transformer for scalable end-to-end autonomous driving,” arXiv preprint arXiv:2503.07656, 2025

  64. [64]

    End-to-end driving with online trajectory evaluation via bev world model,

    Y. Li, Y. Wang, Y. Liu, J. He, L. Fan, and Z. Zhang, “End-to-end driving with online trajectory evaluation via bev world model,” arXiv preprint arXiv:2504.01941, 2025

  65. [65]

    Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

    J.-T. Zhai, Z. Feng, J. Du, Y. Mao, J.-J. Liu, Z. Tan, Y. Zhang, X. Ye, and J. Wang, “Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes,” arXiv preprint arXiv:2305.10430, 2023

  66. [66]

    Genad: Generative end-to-end autonomous driving,

    W. Zheng, R. Song, X. Guo, and L. Chen, “Genad: Generative end-to-end autonomous driving,” arXiv preprint arXiv:2402.11502, 2024

  67. [67]

    Para-drive: Parallelized architecture for real-time autonomous driving,

    X. Weng, B. Ivanovic, Y. Wang, Y. Wang, and M. Pavone, “Para-drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 449–15 458

  68. [68]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

    Ziying Song was born in Xingtai, Hebei Province, China in 1997. He received the B.S. degree from Hebei Normal University of Science and Technology (China) in 2019. He rec...