DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
Pith reviewed 2026-05-19 06:01 UTC · model grok-4.3
The pith
Reinforcement learning steers a diffusion model to turn one expert driving trace into multiple safe and varied trajectories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The reinforced diffusion-based generation mechanism conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory. Reinforcement learning then guides the diffusion process by applying reward-based supervision that enforces safety and diversity constraints, improving practicality and generalization while addressing the mode collapse that arises when imitation learning relies on single demonstrations.
What carries the argument
The reinforced diffusion-based generation mechanism that expands one expert trajectory into several conditioned candidates and uses RL rewards to enforce safety plus diversity during denoising.
If this is right
- End-to-end models can output several distinct responses to the same scene instead of always repeating the expert choice.
- Closed-loop performance improves on benchmarks that test generalization because the generated paths include safer alternatives to the single demonstration.
- A dedicated diversity metric replaces open-loop L2 scores and better reveals whether multi-mode predictions actually spread out.
- The same conditioning on map and agent data can be reused across different driving scenes without collecting new expert traces for each variation.
Where Pith is reading between the lines
- The approach could lower the volume of expert driving data needed to train capable systems by turning each existing trace into multiple useful examples.
- Similar reward-guided diffusion steps might transfer to other imitation-learning settings such as robotic manipulation where single demonstrations are also limiting.
- Explicit safety rewards during generation could let developers test constraint satisfaction in simulation before any real-world deployment.
Load-bearing premise
Reward signals added to the diffusion steps can reliably steer outputs toward safe and varied trajectories without creating invalid paths or destabilizing training.
What would settle it
Training runs that produce a high rate of colliding or off-road trajectories even after reward optimization, or that show no gain on the new diversity metric alongside worse closed-loop performance, would falsify the central claim.
Original abstract
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DIVER, an end-to-end autonomous driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories from single expert demonstrations. It conditions on map elements and surrounding agents to generate multiple reference trajectories, employs RL to enforce safety and diversity constraints during the diffusion process, introduces a novel Diversity metric to better evaluate multi-mode predictions beyond L2 open-loop metrics, and reports improvements on closed-loop NAVSIM and Bench2Drive benchmarks plus open-loop nuScenes.
Significance. If the central claims hold, the work offers a concrete mechanism for alleviating mode collapse and conservatism in imitation-learned driving policies, with potential for improved generalization in complex scenarios. The combination of conditioning on map/agent data with reward-guided diffusion and the proposed diversity metric directly targets evaluation gaps in multi-modal trajectory prediction.
major comments (2)
- [Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.
- [Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.
minor comments (3)
- [Diversity metric definition] The novel Diversity metric is introduced to address limitations of L2-based open-loop metrics; include its explicit mathematical definition and comparison to existing multi-modal metrics such as minADE or entropy-based measures.
- [Model architecture] Clarify the exact conditioning inputs (map elements, surrounding agents) and how they are encoded in the diffusion model architecture.
- [Reward formulation] The abstract states that RL enforces both safety and diversity; ensure the reward formulation is detailed enough to allow reproduction, including any weighting between safety and diversity terms.
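To make the contrast raised in the diversity-metric comment concrete, the sketch below compares minADE with a simple spread measure. The spread measure used here (mean pairwise displacement between modes) is an illustrative assumption, not the paper's Diversity metric, whose definition is not given in this review. All function names are hypothetical.

```python
import math
from itertools import combinations

# A trajectory is a list of (x, y) waypoints sampled at matching time steps.
Trajectory = list[tuple[float, float]]


def ade(pred: Trajectory, target: Trajectory) -> float:
    """Average displacement error: mean L2 distance between matched waypoints."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(target)


def min_ade(modes: list[Trajectory], gt: Trajectory) -> float:
    """minADE: error of the single best mode against the ground truth."""
    return min(ade(m, gt) for m in modes)


def mean_pairwise_distance(modes: list[Trajectory]) -> float:
    """Illustrative spread measure (an assumption, not the paper's metric):
    mean ADE over all unordered pairs of predicted modes."""
    pairs = list(combinations(modes, 2))
    if not pairs:
        return 0.0
    return sum(ade(a, b) for a, b in pairs) / len(pairs)


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
left = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
right = [(0.0, 0.0), (1.0, -1.0), (2.0, -2.0)]

collapsed = [gt, gt, gt]    # three identical modes (mode collapse)
diverse = [gt, left, right]  # three distinct modes

# Both sets contain the ground truth, so minADE cannot tell them apart,
# while the pairwise spread is zero only for the collapsed set.
print(min_ade(collapsed, gt), mean_pairwise_distance(collapsed))
print(min_ade(diverse, gt), mean_pairwise_distance(diverse))
```

The example shows why a best-mode error alone cannot reveal mode collapse: a set of identical predictions and a set of spread-out predictions can share the same minADE.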
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. The comments highlight important aspects of the reinforced diffusion mechanism and experimental validation. We address each major comment below and have prepared revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Method section on reinforced diffusion] The core integration of RL into the diffusion reverse process (described in the reinforced diffusion-based generation mechanism) is load-bearing for the claim of reliable safety and diversity enforcement. The manuscript must provide the precise equations or algorithm (e.g., reward-weighted sampling, classifier guidance, or auxiliary loss) showing how rewards are injected without introducing instability, reward hacking, or trajectories that violate vehicle dynamics.
  Authors: We agree that the precise integration of rewards into the diffusion reverse process requires explicit mathematical detail to support the safety and diversity claims. The original manuscript provided a high-level description of reward-based supervision. In the revised manuscript, we have expanded the Method section with the full set of equations for the RL-guided denoising process. This includes the modified reverse step that incorporates a scalar reward signal via an additive guidance term, the definition of the composite reward (safety via collision and dynamics penalties plus a diversity term based on pairwise trajectory distance), and the algorithm for sampling under the guided distribution. We also include a brief stability analysis and note that vehicle kinematics are enforced by projecting samples onto a feasible set after each denoising step, which prevents dynamics violations.
  Revision: yes
- Referee: [Experiments] Experiments on NAVSIM and Bench2Drive report significant improvements in trajectory diversity and closed-loop performance, but the manuscript should include ablation results isolating the contribution of the RL component versus the diffusion conditioning alone, with quantitative metrics on safety violations and infeasible trajectory rates.
  Authors: We thank the referee for this suggestion, which helps clarify the source of the observed gains. The original experiments compared the full model against prior methods but did not isolate the RL guidance. In the revised manuscript we have added a new ablation study (Section 4.3) that directly compares (i) diffusion conditioning on map and agents alone versus (ii) the same conditioning plus RL reward guidance. The results quantify the incremental benefit of the RL component, reporting lower rates of safety violations (collisions and off-road events) and infeasible trajectories (measured by kinematic constraint violations) on both NAVSIM and Bench2Drive. These metrics are presented alongside the diversity score to show the trade-offs.
  Revision: yes
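The composite reward described in the first response (collision and dynamics penalties plus a pairwise-distance diversity term) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formulation: the function names, the circular safety radius, the finite-difference acceleration limit, and the weights `w_safe` and `w_div` are all hypothetical.

```python
import math

Point = tuple[float, float]
Trajectory = list[Point]


def collision_penalty(traj: Trajectory, obstacles: list[Point],
                      safe_radius: float = 2.0) -> float:
    """Number of waypoints closer than safe_radius to any obstacle centre."""
    return float(sum(
        1 for p in traj
        if any(math.dist(p, o) < safe_radius for o in obstacles)
    ))


def dynamics_penalty(traj: Trajectory, dt: float = 0.5,
                     max_accel: float = 4.0) -> float:
    """Total acceleration in excess of max_accel, via finite differences."""
    speeds = [math.dist(a, b) / dt for a, b in zip(traj, traj[1:])]
    accels = [abs(v2 - v1) / dt for v1, v2 in zip(speeds, speeds[1:])]
    return sum(max(0.0, a - max_accel) for a in accels)


def diversity_bonus(traj: Trajectory, others: list[Trajectory]) -> float:
    """Mean per-waypoint distance from traj to each other candidate."""
    if not others:
        return 0.0
    per_other = [
        sum(math.dist(p, q) for p, q in zip(traj, o)) / len(traj)
        for o in others
    ]
    return sum(per_other) / len(per_other)


def composite_reward(index: int, candidates: list[Trajectory],
                     obstacles: list[Point],
                     w_safe: float = 1.0, w_div: float = 0.1) -> float:
    """Weighted sum of a (negative) safety term and a diversity term.
    The weights are the free parameters a reproduction would need."""
    traj = candidates[index]
    others = candidates[:index] + candidates[index + 1:]
    safety = -(collision_penalty(traj, obstacles) + dynamics_penalty(traj))
    return w_safe * safety + w_div * diversity_bonus(traj, others)


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
swerve = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
obstacles = [(3.0, 0.0)]  # a stopped agent on the straight path

rewards = [composite_reward(i, [straight, swerve], obstacles) for i in range(2)]
print(rewards)  # the swerving candidate scores higher than the straight one
```

In this toy scene the straight candidate is penalized for two waypoints inside the safety radius, while both candidates receive the same diversity bonus, so the ranking is driven by the safety term. The relative size of `w_safe` and `w_div` decides how far diversity can trade against safety, which is the weighting the referee asks the authors to report.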
Circularity Check
No significant circularity; derivation self-contained
The paper proposes a new end-to-end framework DIVER that combines diffusion-based trajectory generation conditioned on map and agent data with RL-based reward supervision for safety and diversity. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or prior self-citation chain. The generation of multiple trajectories from one ground-truth and the subsequent RL guidance are presented as architectural choices with external benchmarks (NAVSIM, Bench2Drive, nuScenes) for validation. The proposed Diversity metric addresses a stated limitation of L2 metrics but does not rename or tautologically reuse prior results. This matches the expected honest non-finding for a method-proposal paper whose central claims rest on independent mechanisms rather than internal redefinitions.
Axiom & Free-Parameter Ledger
free parameters (1)
- reward weights for safety and diversity
axioms (1)
- domain assumption: Diffusion models conditioned on map and agent states can generate multiple feasible trajectories from a single expert path
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "We treat the diffusion process as a stochastic policy and employ Group Relative Policy Optimization (GRPO) objectives to guide the diffusion process. By optimizing trajectory-level rewards for both diversity and safety..."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
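The first quoted paper passage in this section refers to Group Relative Policy Optimization (GRPO) with trajectory-level rewards. A core step of GRPO, as introduced in DeepSeekMath, is to score each sample relative to the other samples drawn for the same input instead of using a learned value function. The sketch below shows only that normalization step. How DIVER applies it inside the denoising process is not specified on this page, and the function name and the choice of population standard deviation are assumptions.

```python
import statistics


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """Normalize the rewards of one group of sampled trajectories.

    Each advantage is (reward - group mean) / (group std + eps), so a
    trajectory is rewarded for being better than its siblings generated
    for the same scene, not for its absolute score.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for three candidate trajectories sampled for a single scene.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
print(advantages)  # negative, zero, positive; sums to approximately zero
```

When every candidate in a group receives the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason a diversity term in the reward can matter for this style of training.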
Forward citations
Cited by 3 Pith papers
- Driving risk emerges from the required two-dimensional joint evasive acceleration
  Evasive acceleration quantifies driving risk as the minimum 2D constant relative acceleration needed to avoid collision and outperforms time-to-collision on warning timing, discrimination, and information retention ac...
- DriveFuture: Future-Aware Latent World Models for Autonomous Driving
  DriveFuture achieves SOTA results on NAVSIM by conditioning latent world model states on future predictions to directly inform trajectory planning.
- Causality-Aware End-to-End Autonomous Driving via Ego-Centric Joint Scene Modeling
  CaAD adds ego-centric joint-causal modeling and causality-aware policy alignment to end-to-end driving, reporting Driving Score 87.53 and Success Rate 71.81 on Bench2Drive plus PDMS 91.1 on NAVSIM.