
arXiv: 2602.22801 · v2 · pith:F6NAMQ7Wnew · submitted 2026-02-26 · 💻 cs.RO · cs.AI · cs.LG

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

Pith reviewed 2026-05-21 12:48 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords diffusion models · autonomous driving · end-to-end planning · trajectory generation · reinforcement learning · real-world evaluation · urban scenarios

The pith

Diffusion models trained on real-vehicle data can plan end-to-end autonomous driving trajectories effectively in urban settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper explores using diffusion models as planners for end-to-end autonomous driving. The authors run large-scale studies on real-vehicle data to find what matters in the diffusion loss, how best to represent trajectories, and how performance scales with more data. They add a reinforcement learning post-training step to make the planner safer. The final planner is tested on a real car in six urban driving scenarios over 200 kilometers and shows a tenfold improvement over the base model. Readers might care because this points to a way for self-driving cars to handle real-world uncertainty without splitting the system into separate modules.

Core claim

The paper shows that diffusion models can serve as practical planners for end-to-end autonomous driving once the loss space, trajectory representation, and data scaling are studied systematically and reinforcement learning post-training is added to boost safety. The resulting Hyper Diffusion Planner achieves a tenfold performance gain over its base model in real-world tests.

What carries the argument

Hyper Diffusion Planner (HDP), a diffusion model framework for generating driving trajectories that incorporates insights from loss design, representation choices, data scaling, and post-training with reinforcement learning.
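The page describes HDP only at a high level, so the following is a generic sketch, not the authors' implementation, of how a diffusion planner turns Gaussian noise into a trajectory at inference time. It assumes a deterministic DDIM-style sampler, a cosine noise schedule, and a denoiser that predicts the clean trajectory (x0-prediction); none of these choices is confirmed here. `denoise_x0` is a hypothetical stand-in for the scene-conditioned network, and the flattened-waypoint layout is likewise an assumption.

```python
import math
import random
from typing import Callable, List

Trajectory = List[float]  # hypothetical layout: flattened (x, y) waypoints


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar(t) for t in [0, 1] (cosine schedule)."""
    def f(u: float) -> float:
        return math.cos((u + s) / (1.0 + s) * math.pi / 2.0) ** 2
    return f(t) / f(0.0)


def ddim_sample(
    denoise_x0: Callable[[Trajectory, float], Trajectory],
    dim: int,
    num_steps: int = 10,
    seed: int = 0,
) -> Trajectory:
    """Deterministic DDIM sampling with an x0-predicting denoiser.

    `denoise_x0(x_t, t)` is hypothetical; scene conditioning would live inside it.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    times = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 down to 0.0
    for t, t_prev in zip(times[:-1], times[1:]):
        ab_t, ab_prev = cosine_alpha_bar(t), cosine_alpha_bar(t_prev)
        x0_hat = denoise_x0(x, t)
        # Noise implied by the current sample and the predicted clean trajectory.
        eps_hat = [
            (xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t)
            for xi, x0i in zip(x, x0_hat)
        ]
        # Move to the next (lower) noise level without injecting fresh noise.
        x = [
            math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
            for x0i, ei in zip(x0_hat, eps_hat)
        ]
    return x


# Usage with a "perfect" denoiser that always returns a fixed, hypothetical target.
target = [0.0, 0.0, 1.0, 0.1, 2.0, 0.3]
plan = ddim_sample(lambda x_t, t: list(target), dim=len(target))
```

With a perfect x0 predictor the final step has alpha_bar = 1, so `plan` equals `target`; with a learned network, the quality of `x0_hat` at each noise level is exactly what the loss and representation choices studied in the paper would affect.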

If this is right

  • Insights into the diffusion loss space improve planning accuracy in driving tasks.
  • Specific trajectory representations help manage the complexity of urban environments.
  • Increasing the amount of training data leads to substantial performance gains.
  • Reinforcement learning after initial training strengthens safety and robustness.
  • The overall framework supports deployment in varied real-world driving conditions.
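The page says only that reinforcement learning after initial training strengthens safety and robustness; it does not name the algorithm. One common way to post-train a diffusion policy is advantage-weighted regression (AWR, Peng et al., 2019): each sampled trajectory's denoising loss is reweighted by how well that trajectory scored. The sketch below shows AWR-style weights with the group-mean reward as a baseline in place of a learned value function. That simplification, the reward values, `beta`, and `w_max` are all assumptions for illustration, not HDP's method.

```python
import math
from typing import List, Sequence


def advantage_weights(
    rewards: Sequence[float], beta: float = 1.0, w_max: float = 20.0
) -> List[float]:
    """AWR-style weights exp(A / beta), clipped at w_max.

    The advantage A uses the group's mean reward (e.g. several trajectories
    sampled for the same scene) as a baseline instead of a learned value function.
    """
    baseline = sum(rewards) / len(rewards)
    log_cap = math.log(w_max)
    # Clip in log space so very large advantages cannot overflow math.exp.
    return [math.exp(min((r - baseline) / beta, log_cap)) for r in rewards]


def weighted_diffusion_loss(
    per_sample_losses: Sequence[float], weights: Sequence[float]
) -> float:
    """Batch mean of per-sample denoising losses, each scaled by its weight."""
    total = sum(w * l for w, l in zip(weights, per_sample_losses))
    return total / len(per_sample_losses)


# Hypothetical example: three candidate trajectories for one scene, scored by a
# made-up safety reward (1.0 = comfortable, 0.0 = hard braking, -1.0 = collision).
rewards = [1.0, 0.0, -1.0]
weights = advantage_weights(rewards)
loss = weighted_diffusion_loss([0.5, 0.5, 0.5], weights)
```

Safer samples get larger weights, so gradient steps pull the planner toward them while still training on the ordinary denoising loss; the clip keeps one lucky sample from dominating a batch.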

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Diffusion approaches might extend to other decision-making problems in robotics where uncertainty is high.
  • This work suggests end-to-end methods could eventually replace traditional modular autonomous driving systems.
  • Testing the planner in more diverse weather or traffic conditions would reveal its limits.
  • Combining this with other AI techniques like vision models could further improve results.

Load-bearing premise

The collected real-vehicle data and the specific road-testing conditions represent the full range of complex urban driving situations that would be encountered in wider use.

What would settle it

Showing poor performance or safety issues when the system encounters conditions outside the collected data distribution, such as rare events or new locations, would undercut the claim that the planner is effective for general real-world use.

Figures

Figures reproduced from arXiv: 2602.22801 by Bin Huang, Enguang Liu, Guang Chen, Hangjun Ye, Jianlin Zhang, Jianwei Cui, Jingjing Liu, Kun Ma, Long Chen, Ruiming Liang, Tianyi Tan, Xianyuan Zhan, Ya-Qin Zhang, Yinan Zheng.

Figure 1: Overview of Hyper Diffusion Planner (HDP).
Figure 2: Model architecture. (Accompanying text, Section III, Investigation Roadmap: "In this section, we first introduce the base model and evaluation metrics for assessing model performance. Subsequently, we will briefly outline our investigation roadmap aimed at fully unleashing the potential of diffusion models for E2E AD. A. Base Model. Scene Encoder. We consider an E2E AD system that can directly process multi-modal inputs, including came…")
Figure 4: The open-loop visualization of planning trajecto…
Figure 5: The v-t curve of generated trajectories using different representations. Waypoint representation suffers severe jitter…
Figure 8: The divergence score of HDP trained on different sizes of data (legend: ~100K frames (NAVSIM), ~100K frames, ~20M frames).
Figure 11: (a–b) Detail the success rate and stability score under different volumes. (c) Compares the success…
Figure 12: Closed-loop real-vehicle testing results. Two representative frames from the scenario are captured for illustration.
Figure 13: Visualization of open-loop data replay for bad cases…
Figure 14: Closed-loop real-vehicle testing results. Each row contains representative frames from the scenario.
Figure 15: Real-vehicle test route.
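Figure 5's caption reports that the waypoint representation produces severe jitter in the v-t curve. One plausible mechanism, sketched below under simplifying assumptions, is that speeds recovered from waypoints by finite differences divide position errors by the time step, so small per-waypoint errors become large speed errors. The "velocity representation" here is an assumed alternative for contrast (the extracted caption does not say which representations were compared), and the speed, time step, and noise level are made up. Note the noise has the same magnitude in both cases but different units (m vs. m/s); the sketch illustrates the 1/dt amplification and does not reproduce the figure.

```python
import math
import random
from typing import List, Tuple


def velocities_from_waypoints(xs: List[float], dt: float) -> List[float]:
    """Finite-difference speed along one axis: v_k = (x_{k+1} - x_k) / dt."""
    return [(b - a) / dt for a, b in zip(xs[:-1], xs[1:])]


def waypoints_from_velocities(vs: List[float], dt: float, x0: float = 0.0) -> List[float]:
    """Integrate speeds into positions with a forward Euler step."""
    xs = [x0]
    for v in vs:
        xs.append(xs[-1] + v * dt)
    return xs


def std(values: List[float]) -> float:
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def compare_jitter(
    speed: float = 10.0, dt: float = 0.1, horizon: int = 40,
    noise: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Std of the v-t curve when the same noise level hits waypoints vs. speeds."""
    rng = random.Random(seed)
    clean_vs = [speed] * horizon
    clean_xs = waypoints_from_velocities(clean_vs, dt)
    # (a) Waypoint representation: errors land on positions, then get differenced.
    noisy_xs = [x + rng.gauss(0.0, noise) for x in clean_xs]
    jitter_waypoints = std(velocities_from_waypoints(noisy_xs, dt))
    # (b) Velocity representation: errors land directly on the speeds.
    noisy_vs = [v + rng.gauss(0.0, noise) for v in clean_vs]
    jitter_velocities = std(noisy_vs)
    return jitter_waypoints, jitter_velocities


wp_jitter, vel_jitter = compare_jitter()
```

For independent position errors of standard deviation sigma, the differenced speed error has standard deviation sigma * sqrt(2) / dt, which at dt = 0.1 s is roughly 14 times sigma; the velocity-space errors are not amplified at all.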
Original abstract

Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of the diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety and robustness of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Hyper Diffusion Planner (HDP), a diffusion-based end-to-end autonomous driving framework. It reports systematic controlled studies identifying key factors in diffusion loss space, trajectory representation, and data scaling from large-scale real-vehicle data, proposes an RL post-training strategy to improve safety and robustness, and describes real-vehicle deployment evaluated across 6 urban scenarios and 200 km of road testing with a claimed 10x performance gain over the base model.

Significance. If the empirical claims hold after clarification, the work would be significant for robotics and autonomous systems by providing large-scale real-world evidence for diffusion planners in E2E AD, along with actionable insights on training and representation choices. The shift from simulation-only settings and the inclusion of RL post-training are positive elements that could influence subsequent research.

major comments (2)
  1. [Abstract] The central claim of a 'notable 10x performance improvement' over the base model is presented without defining the performance metric, specifying the base model architecture or training details, reporting error bars, or describing exactly how the 10x factor is computed; this is load-bearing for the empirical contribution and prevents verification of the result.
  2. [Evaluation] Evaluation section describing the 200 km road test: coverage of only 6 urban scenarios is reported without quantitative metrics on long-tail event frequency, weather/traffic diversity, or failure-mode statistics. In safety-critical planning, performance is dominated by rare events, so the generalization claim stays open to the possibility that the observed gains reflect easier test conditions rather than a robust property of HDP.
minor comments (2)
  1. Define all acronyms (E2E AD, HDP) on first use and ensure consistent terminology between the abstract and main text.
  2. [Methods] The description of trajectory representation in the diffusion model would benefit from an explicit equation or diagram showing the conditioning and denoising process.
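Minor comment 2 asks for an explicit equation of the conditioning and denoising process. For reference, the standard DDPM-style formulation for a trajectory $x_0$ conditioned on scene features $c$ is below; which parameterization and loss weighting HDP actually uses is not stated on this page, so treat it as generic background rather than the paper's equation. The last line shows why the choice of loss space matters: when the network predicts the noise $\epsilon_\theta$, the implied clean-trajectory estimate $\hat{x}_\theta$ has an error that equals the noise-prediction error times a time-dependent factor, so the two objectives weight noise levels very differently.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I),\; t \sim \mathcal{U}\{1,\dots,T\} \\
\mathcal{L}_{\epsilon}(\theta) &= \mathbb{E}_{x_0, c, t, \epsilon}
  \big\lVert \epsilon_\theta(x_t, t, c) - \epsilon \big\rVert^2 \\
\mathcal{L}_{x_0}(\theta) &= \mathbb{E}_{x_0, c, t, \epsilon}
  \big\lVert \hat{x}_\theta(x_t, t, c) - x_0 \big\rVert^2 \\
\hat{x}_\theta &= \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta}{\sqrt{\bar\alpha_t}}
  \;\Rightarrow\;
  \big\lVert \hat{x}_\theta - x_0 \big\rVert^2
  = \frac{1-\bar\alpha_t}{\bar\alpha_t}\,\big\lVert \epsilon_\theta - \epsilon \big\rVert^2
\end{aligned}
```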

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity in our empirical claims and evaluation methodology. We address each major comment below and indicate revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim of a 'notable 10x performance improvement' over the base model is presented without defining the performance metric, specifying the base model architecture or training details, reporting error bars, or describing exactly how the 10x factor is computed; this is load-bearing for the empirical contribution and prevents verification of the result.

    Authors: We agree that the abstract requires additional context to make the central claim verifiable. In the revised manuscript, we have updated the abstract to define the performance metric as the human intervention rate (interventions per 100 km), specify the base model as the diffusion planner trained solely with the supervised loss without RL post-training, and describe the 10x factor as the ratio of intervention-free autonomous driving distance achieved by HDP relative to the base model over identical test routes. Error bars derived from three independent test runs are now referenced, with full details provided in the Evaluation section. These changes preserve abstract length while enabling verification. revision: yes

  2. Referee: [Evaluation] Evaluation section describing the 200 km road test: coverage of only 6 urban scenarios is reported without quantitative metrics on long-tail event frequency, weather/traffic diversity, or failure-mode statistics. In safety-critical planning, performance is dominated by rare events, so the generalization claim stays open to the possibility that the observed gains reflect easier test conditions rather than a robust property of HDP.

    Authors: We concur that more granular characterization of test conditions would better support generalization claims. The revised Evaluation section now includes additional qualitative and semi-quantitative descriptions of the six scenarios, encompassing variations in traffic density, time-of-day conditions, and inclusion of challenging elements such as unprotected left turns and pedestrian-dense areas. The 200 km routes were chosen to reflect representative urban driving distributions based on prior fleet data. However, we did not collect exhaustive pre-specified logs for long-tail event frequencies or detailed failure-mode breakdowns during the original deployment. revision: partial
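Under the metric definitions given in the first simulated response (interventions per 100 km, with the 10x factor expressed as a ratio of intervention-free distance on identical routes), the two views are reciprocal: on the same routes, a 10x longer mean distance between interventions is the same as a 10x lower intervention rate. The sketch below makes that arithmetic explicit. The counts are invented, and treating a zero-intervention run as a lower bound on the intervention-free distance is an assumed convention.

```python
def interventions_per_100km(interventions: int, distance_km: float) -> float:
    """Human takeovers normalized to 100 km of driving."""
    return 100.0 * interventions / distance_km


def mean_km_between_interventions(interventions: int, distance_km: float) -> float:
    """Average intervention-free distance.

    With zero interventions the whole route was intervention-free, so the
    route distance itself is returned; it is only a lower bound.
    """
    return distance_km / max(interventions, 1)


def improvement_factor(base_interventions: int, new_interventions: int,
                       distance_km: float) -> float:
    """Ratio of intervention-free distance, new planner vs. base, on the same routes."""
    return (mean_km_between_interventions(new_interventions, distance_km)
            / mean_km_between_interventions(base_interventions, distance_km))


# Hypothetical counts over the same 200 km of routes (not the paper's data).
base_rate = interventions_per_100km(30, 200.0)
new_rate = interventions_per_100km(3, 200.0)
factor = improvement_factor(30, 3, 200.0)
```

Because the routes are identical, `factor` equals `base_rate / new_rate`; the two definitions only diverge when the planners are driven over different distances or one of them records no interventions at all.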

Standing simulated objections (not resolved)
  • Quantitative metrics on long-tail event frequency, weather/traffic diversity, and failure-mode statistics from the 200 km real-world testing, as these were not systematically recorded in the original experiments.
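The unresolved objection matters partly because a few hundred kilometers of driving yields only a handful of interventions, and a rate estimated from a handful of events is very uncertain. As an illustration, not an analysis of the paper's data, the sketch below computes an exact (Garwood) Poisson confidence interval for interventions per 100 km. It assumes interventions arrive as a Poisson process, which is itself a simplification because rare events tend to cluster by scenario. The bisection solver uses only the standard library and is intended for small counts.

```python
import math
from typing import Callable, Tuple


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); adequate for small k and moderate lam."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def _root_decreasing(g: Callable[[float], float], lo: float, hi: float,
                     iters: int = 200) -> float:
    """Bisection for a decreasing g with g(lo) >= 0 >= g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_rate_ci(events: int, distance_km: float, alpha: float = 0.05,
                    per_km: float = 100.0) -> Tuple[float, float]:
    """Exact two-sided CI for events per `per_km` km, given `events` in `distance_km`."""
    hi = events + 10.0 * math.sqrt(events + 1.0) + 10.0  # generous bracket on the mean
    upper = _root_decreasing(lambda lam: poisson_cdf(events, lam) - alpha / 2, 0.0, hi)
    if events == 0:
        lower = 0.0
    else:
        lower = _root_decreasing(
            lambda lam: poisson_cdf(events - 1, lam) - (1.0 - alpha / 2), 0.0, hi)
    scale = per_km / distance_km
    return lower * scale, upper * scale


# Hypothetical: 2 takeovers observed over 200 km.
low, high = poisson_rate_ci(2, 200.0)
```

For two events the 95% interval on the expected count spans roughly 0.24 to 7.2, a factor of about 30, so two planners whose point estimates differ by 10x can still have overlapping intervals over a 200 km test.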

Circularity Check

0 steps flagged

No circularity: empirical claims rest on real-vehicle deployment and testing

Full rationale

The paper presents its contributions as the outcome of large-scale empirical studies on real-vehicle data, controlled experiments identifying insights on diffusion loss space, trajectory representation and data scaling, plus an RL post-training strategy. The 10x performance gain is reported directly from deployment on a real-vehicle platform across 6 scenarios and 200 km of road testing. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations are described that would reduce any claimed result to its own inputs by construction. The work is therefore self-contained as an empirical investigation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that real-vehicle data collected under the authors' testing protocol is representative of complex urban driving and that the identified insights on diffusion loss space and trajectory representation transfer to production use. No free parameters or invented entities are described in the abstract.

axioms (1)
  • Domain assumption: real-vehicle data and road-testing conditions are representative of the full distribution of complex urban driving scenarios.
    The scaling claims and 10x improvement depend on this premise being true.

pith-pipeline@v0.9.0 · 5811 in / 1428 out tokens · 107287 ms · 2026-05-21T12:48:48.910627+00:00 · methodology


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

    cs.RO 2026-05 unverdicted novelty 7.0

    ReflectDrive-2 achieves 91.0 PDMS on NAVSIM with camera input by training a discrete diffusion model to self-edit trajectories via RL-aligned AutoEdit.

  2. ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving

    cs.RO 2026-05 unverdicted novelty 6.0

    ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.

  3. CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

    cs.LG 2026-05 unverdicted novelty 5.0

    CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop perf...

  4. RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

    cs.CV 2026-04 unverdicted novelty 5.0

    RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 3 Pith papers · 18 internal anchors

  1. [1]

    Cosmos World Foundation Model Platform for Physical AI

    Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Bal- aji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, et al. Cosmos world foundation model platform for physical ai.arXiv preprint arXiv:2501.03575, 2025

  2. [2]

    Interpretable goal-based predic- tion and planning for autonomous driving

    Stefano V Albrecht, Cillian Brewitt, John Wilhelm, Balint Gyevnar, Francisco Eiras, Mihai Dobre, and Sub- ramanian Ramamoorthy. Interpretable goal-based predic- tion and planning for autonomous driving. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1043–1049. IEEE, 2021

  3. [3]

    Improving image generation with better captions.Computer Science

    James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jian- feng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, et al. Improving image generation with better captions.Computer Science. https://cdn. openai. com/papers/dall-e-3. pdf, 2(3):8, 2023

  4. [4]

    Kevin Black, Noah Brown, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Lucy Xiaoyang Shi, James Tanner, Quan Vuong, Anna Walling, Haohuan Wang, and Ury Zhilinsky.π 0: A vi...

  5. [5]

    Training diffusion models with reinforcement learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. InInternational Conference on Learning Representations, 2024

  6. [6]

    End to End Learning for Self-Driving Cars

    Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars.arXiv preprint arXiv:1604.07316, 2016

  7. [7]

    nuScenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving.arXiv preprint arXiv:1903.11027, 2019

  8. [8]

    NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

    Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021

  9. [9]

    End-to-end autonomous driving: Challenges and frontiers,

    Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end au- tonomous driving: Challenges and frontiers.arXiv preprint arXiv:2306.16927, 2023

  10. [10]

    Dif- fusion policy: Visuomotor policy learning via action diffusion

    Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Dif- fusion policy: Visuomotor policy learning via action diffusion. InProceedings of Robotics: Science and Systems (RSS), 2023

  11. [11]

    Directly fine-tuning diffusion models on differen- tiable rewards

    Kevin Clark, Paul Vicol, Kevin Swersky, and David J Fleet. Directly fine-tuning diffusion models on differen- tiable rewards. InThe Twelfth International Conference on Learning Representations, 2023

  12. [12]

    Diffusion models in vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 45(9):10850–10869, 2023

    Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, and Mubarak Shah. Diffusion models in vision: A survey.IEEE transactions on pattern analysis and machine intelligence, 45(9):10850–10869, 2023

  13. [13]

    Parting with misconceptions about learning-based vehicle motion planning

    Daniel Dauner, Marcel Hallgarten, Andreas Geiger, and Kashyap Chitta. Parting with misconceptions about learning-based vehicle motion planning. InConference on Robot Learning (CoRL), 2023

  14. [14]

    Navsim: Data-driven non- reactive autonomous vehicle simulation and benchmark- ing

    Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, and Kashyap Chitta. Navsim: Data-driven non- reactive autonomous vehicle simulation and benchmark- ing. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

  15. [15]

    Baidu apollo em motion planner, 2018

    Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, and Qi Kong. Baidu apollo em motion planner, 2018

  16. [16]

    Densetnt: End- to-end trajectory prediction from dense goal sets

    Junru Gu, Chen Sun, and Hang Zhao. Densetnt: End- to-end trajectory prediction from dense goal sets. In Proceedings of the IEEE/CVF international conference on computer vision, pages 15303–15312, 2021

  17. [17]

    Denoising diffusion probabilistic models.Advances in neural infor- mation processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural infor- mation processing systems, 33:6840–6851, 2020

  18. [18]

    Imagen Video: High Definition Video Generation with Diffusion Models

    Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models.arXiv preprint arXiv:2210.02303, 2022

  19. [19]

    GAIA-1: A Generative World Model for Autonomous Driving

    Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving.arXiv preprint arXiv:2309.17080, 2023

  20. [20]

    Planning-oriented autonomous driving

    Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17853– 17862, 2023

  21. [21]

    $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

    Physical Intelligence, Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Es- mail, Michael Equi, Chelsea Finn, Niccolo Fusai, et al. π0.5: a vision-language-action model with open-world generalization.arXiv preprint arXiv:2504.16054, 2025

  22. [22]

    Vad: Vectorized scene representation for efficient autonomous driving

    Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8350, 2023

  23. [23]

    Efficient diffusion policies for offline reinforcement learning.Advances in Neural Information Processing Systems, 36:67195–67212, 2023

    Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, and Shuicheng Yan. Efficient diffusion policies for offline reinforcement learning.Advances in Neural Information Processing Systems, 36:67195–67212, 2023

  24. [24]

    Learning to drive in a day

    Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, and Amar Shah. Learning to drive in a day. In2019 international conference on robotics and automation (ICRA), pages 8248–8254. IEEE, 2019

  25. [25]

    Aligning Text-to-Image Models using Human Feedback

    Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, and Shixiang Shane Gu. Aligning text- to-image models using human feedback.arXiv preprint arXiv:2302.12192, 2023

  26. [26]

    Discrete diffusion for reflective vision- language-action models in autonomous driving.arXiv preprint arXiv:2509.20109, 2025

    Pengxiang Li, Yinan Zheng, Yue Wang, Huimin Wang, Hang Zhao, Jingjing Liu, Xianyuan Zhan, Kun Zhan, and Xianpeng Lang. Discrete diffusion for reflective vision- language-action models in autonomous driving.arXiv preprint arXiv:2509.20109, 2025

  27. [27]

    Back to Basics: Let Denoising Generative Models Denoise

    Tianhong Li and Kaiming He. Back to basics: Let denoising generative models denoise.arXiv preprint arXiv:2511.13720, 2025

  28. [28]

    ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

    Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu, Lijun Zhou, Long Chen, Haiyang Sun, Bing Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving.arXiv preprint arXiv:2506.08052, 2025

  29. [29]

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal plan- ning with multi-target hydra-distillation.arXiv preprint arXiv:2406.06978, 2024

  30. [30]

    Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 2024

    Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers.IEEE Transactions on Pattern Analysis and Machine Intelli- gence, 2024

  31. [31]

    Dichotomous diffusion policy optimization

    Ruiming Liang, Yinan Zheng, Kexin Zheng, Tianyi Tan, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang, and Xianyuan Zhan. Dichotomous diffusion policy optimization. In The Fourteenth International Conference on Learning Representations, 2026

  32. [32]

    Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving

    Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recog- nition Conference, pages 12037–12047, 2025

  33. [33]

    Rdt-1b: a diffusion foundation model for bimanual manipulation

    Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, and Jun Zhu. Rdt-1b: a diffusion foundation model for bimanual manipulation. InThe Thirteenth International Conference on Learning Representations, 2025

  34. [34]

    Skill expansion and composition in parameter space

    Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, and Xianyuan Zhan. Skill expansion and composition in parameter space. InThe Thirteenth International Conference on Learning Representations, 2025

  35. [35]

    Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

    Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, et al. Sora: A review on background, technology, limitations, and opportunities of large vision models.arXiv preprint arXiv:2402.17177, 2024

  36. [36]

    Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in Neural Information Processing Systems, 35:5775–5787, 2022

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongx- uan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps.Advances in Neural Information Processing Systems, 35:5775–5787, 2022

  37. [37]

    Contrastive energy prediction for exact energy-guided diffusion sampling in offline rein- forcement learning.arXiv preprint arXiv:2304.12824, 2023

    Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongx- uan Li, and Jun Zhu. Contrastive energy prediction for exact energy-guided diffusion sampling in offline rein- forcement learning.arXiv preprint arXiv:2304.12824, 2023

  38. [38]

    AWAC: Accelerating Online Reinforcement Learning with Offline Datasets

    Ashvin Nair, Abhishek Gupta, Murtaza Dalal, and Sergey Levine. Awac: Accelerating online reinforce- ment learning with offline datasets.arXiv preprint arXiv:2006.09359, 2020

  39. [39]

    Dctdiff: Intriguing properties of image gener- ative modeling in the dct space

    Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lan- miao Liu, Martin Bene ˇs, Albert Ali Salah, and Itir Onal Ertugrul. Dctdiff: Intriguing properties of image gener- ative modeling in the dct space. InThe Forty-Second International Conference on Machine Learning (ICML 2025), 2025

  40. [40]

    Scalable diffu- sion models with transformers

    William Peebles and Saining Xie. Scalable diffu- sion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 4195–4205, 2023

  41. [41]

    Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

    Xue Bin Peng, Aviral Kumar, Grace Zhang, and Sergey Levine. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning.arXiv preprint arXiv:1910.00177, 2019

  42. [42]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents.arXiv preprint arXiv:2204.06125, 1(2):3, 2022

  43. [43]

    Diffusion policy policy optimization

    Allen Z Ren, Justin Lidard, Lars Lien Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Ben- jamin Burchfiel, Hongkai Dai, and Max Simchowitz. Diffusion policy policy optimization. InInternational Conference on Learning Representations, 2025

  44. [44]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

  45. [45]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

  46. [46]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

  47. [47]

    Weiss, Niru Mah- eswaranathan, and Surya Ganguli

    Jascha Sohl-Dickstein, Eric A. Weiss, Niru Mah- eswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics, 2015

  48. [48]

    Score- based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations. InInternational Conference on Learning Representations, 2021

  49. [49]

    Flow matching-based au- tonomous driving planning with advanced interactive be- havior modeling

    Tianyi Tan, Yinan Zheng, Ruiming Liang, Zexu Wang, Kexin Zheng, Jinliang Zheng, Jianxiong Li, Xianyuan Zhan, and Jingjing Liu. Flow matching-based au- tonomous driving planning with advanced interactive be- havior modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  50. [50]

    Tesla ai day 2022

    Tesla. Tesla ai day 2022. https://www.youtube.com/ watch?v=ODSJsviD SU, 2022

  51. [51]

    Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

    Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Tong Che, Ke Chen, Yuxiao Chen, Jenna Diamond, Yifan Ding, Wenhao Ding, et al. Alpamayo-R1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail. arXiv preprint arXiv:2511.00088, 2025

  52. [52]

    GoalFlow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving

    Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. GoalFlow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. arXiv preprint arXiv:2503.05689, 2025

  53. [53]

    ImageReward: Learning and evaluating human preferences for text-to-image generation

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023

  54. [54]

    The emergence of reproducibility and generalizability in diffusion models

    Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, and Qing Qu. The emergence of reproducibility and generalizability in diffusion models. arXiv preprint arXiv:2310.05264, 2023

  55. [55]

    Towards robust zero-shot reinforcement learning

    Kexin Zheng, Lauriane Teyssier, Yinan Zheng, Yu Luo, and Xianyuan Zhan. Towards robust zero-shot reinforcement learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  56. [56]

    Safe offline reinforcement learning with feasibility-guided diffusion model

    Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, and Jingjing Liu. Safe offline reinforcement learning with feasibility-guided diffusion model. In The Twelfth International Conference on Learning Representations, 2024

  57. [57]

    Diffusion-based planning for autonomous driving with flexible guidance

    Yinan Zheng, Ruiming Liang, Kexin Zheng, Jinliang Zheng, Liyuan Mao, Jianxiong Li, Weihao Gu, Rui Ai, Shengbo Eben Li, Xianyuan Zhan, and Jingjing Liu. Diffusion-based planning for autonomous driving with flexible guidance. In The Thirteenth International Conference on Learning Representations, 2025.

    APPENDIX A. Visualization of Real-Vehicle Testing Result...

  58. [58]

    Given the diffusion timestep $t$, the predefined noise schedule $\alpha_t, \sigma_t$, and noised sample $\tau_t$, these quantities are mutually convertible

    Diffusion Loss Space: The diffusion models are trained to predict one of the following quantities: $\tau_0$, $v_t$, or $\epsilon$. Given the diffusion timestep $t$, the predefined noise schedule $\alpha_t, \sigma_t$, and noised sample $\tau_t$, these quantities are mutually convertible. This allows model predictions (parameterization) and loss functions to be combined freely, as in TA...
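The claim that $\tau_0$, $v_t$, and $\epsilon$ are mutually convertible can be checked with a small sketch. The snippet does not define the schedule or $v_t$, so the block makes three assumptions. It uses a variance-preserving schedule ($\alpha_t^2 + \sigma_t^2 = 1$) and the forward noising $\tau_t = \alpha_t \tau_0 + \sigma_t \epsilon$. It also takes the "v-prediction" target to be $v_t = \alpha_t \epsilon - \sigma_t \tau_0$. The function names are hypothetical, and the cosine schedule in the toy check is only an example.

```python
import numpy as np

# Assumptions (not taken from the paper): variance-preserving schedule with
# alpha_t**2 + sigma_t**2 == 1, forward noising tau_t = alpha_t*tau_0 + sigma_t*eps,
# and v-prediction target v_t = alpha_t*eps - sigma_t*tau_0.


def from_tau0(tau_0, tau_t, alpha_t, sigma_t):
    """Recover (tau_0, eps, v) from a clean-trajectory prediction (needs sigma_t > 0)."""
    eps = (tau_t - alpha_t * tau_0) / sigma_t
    v = alpha_t * eps - sigma_t * tau_0
    return tau_0, eps, v


def from_eps(eps, tau_t, alpha_t, sigma_t):
    """Recover (tau_0, eps, v) from a noise prediction (needs alpha_t > 0)."""
    tau_0 = (tau_t - sigma_t * eps) / alpha_t
    v = alpha_t * eps - sigma_t * tau_0
    return tau_0, eps, v


def from_v(v, tau_t, alpha_t, sigma_t):
    """Recover (tau_0, eps, v) from a v prediction; relies on alpha_t**2 + sigma_t**2 == 1."""
    tau_0 = alpha_t * tau_t - sigma_t * v
    eps = sigma_t * tau_t + alpha_t * v
    return tau_0, eps, v


# Toy check: an 8-waypoint 2-D trajectory noised at t = 0.3 under a cosine schedule.
rng = np.random.default_rng(0)
tau_0 = rng.normal(size=(8, 2))
eps = rng.normal(size=(8, 2))
t = 0.3
alpha_t, sigma_t = np.cos(np.pi * t / 2), np.sin(np.pi * t / 2)
tau_t = alpha_t * tau_0 + sigma_t * eps
v = alpha_t * eps - sigma_t * tau_0

# Each parameterization should reconstruct the same (tau_0, eps, v) triple.
recovered = [
    from_tau0(tau_0, tau_t, alpha_t, sigma_t),
    from_eps(eps, tau_t, alpha_t, sigma_t),
    from_v(v, tau_t, alpha_t, sigma_t),
]
```

The conversions differ in their numerics. Under these assumptions, recovering from $\tau_0$ divides by $\sigma_t$, which vanishes at $t = 0$. Recovering from $\epsilon$ divides by $\alpha_t$, which vanishes at $t = 1$. The $v$ conversion needs no division.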

  59. [59]

    Theorem IV.1. The hybrid loss in Eq

    Proofs of the Theorems: We provide the proofs of the theorems for the hybrid loss and the RL objectives. Theorem IV.1. The hybrid loss in Eq. (5) is equivalent to a diffusion score matching loss under the $P$-norm: $\mathcal{L}_{\text{hybrid}} = \mathbb{E}_{\tau^v_0, \epsilon, t}\big[\|\tau^v_\theta - \tau^v_0\|_P^2\big]$, where $P = I + \Delta t^2 \cdot \omega M^\top M$ is positive-definite. The minimizer of the loss is the marginal score function in Eq....
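The form of $P$ suggests an algebraic reading of Theorem IV.1, which the sketch below checks numerically. Eq. (5) is not in this excerpt, so its form here is an assumption. The sketch takes it to combine a squared error on the velocity representation $\tau^v$ with an $\omega$-weighted squared error on waypoints. Those waypoints are obtained by integrating velocities, $x = \Delta t \, M \tau^v$, where $M$ is taken to be a lower-triangular matrix of ones (a cumulative sum). Under that assumption, $\|\Delta v\|^2 + \omega \|\Delta t \, M \Delta v\|^2 = \Delta v^\top (I + \Delta t^2 \omega M^\top M) \Delta v$. All function names are hypothetical.

```python
import numpy as np

# Hypothetical setup: a trajectory is a (T, 2) array of per-step velocities, and
# waypoints are recovered by Euler integration x = dt * M @ v from the origin.


def integration_matrix(T):
    """Assumed M: lower-triangular ones, so (M @ v)[k] = v[0] + ... + v[k]."""
    return np.tril(np.ones((T, T)))


def hybrid_loss(v_pred, v_true, dt, omega):
    """Assumed hybrid loss: velocity squared error plus omega-weighted waypoint squared error."""
    M = integration_matrix(v_pred.shape[0])
    dv = v_pred - v_true
    dx = dt * (M @ dv)
    return float(np.sum(dv ** 2) + omega * np.sum(dx ** 2))


def p_norm_loss(v_pred, v_true, dt, omega):
    """Squared P-norm of the velocity error, P = I + dt^2 * omega * M^T M, summed over coordinates."""
    T = v_pred.shape[0]
    M = integration_matrix(T)
    P = np.eye(T) + dt ** 2 * omega * (M.T @ M)
    dv = v_pred - v_true
    return float(np.sum(dv * (P @ dv)))


rng = np.random.default_rng(1)
v_true = rng.normal(size=(16, 2))
v_pred = rng.normal(size=(16, 2))
dt, omega = 0.1, 2.0

loss_hybrid = hybrid_loss(v_pred, v_true, dt, omega)
loss_p = p_norm_loss(v_pred, v_true, dt, omega)

# P = I + (positive semidefinite matrix), so every eigenvalue is >= 1 and P is positive-definite.
M = integration_matrix(16)
min_eig = np.linalg.eigvalsh(np.eye(16) + dt ** 2 * omega * (M.T @ M)).min()
```

The two losses agree for any prediction. Because every eigenvalue of $P$ is at least 1, the $P$-norm is a valid norm, which is the positive-definiteness the theorem states.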

  60. [60]

    • Open-Loop Metrics

    Evaluation Metric Design: We consider two types of evaluation metrics: open-loop metrics for assessing trajectory quality, and closed-loop metrics for evaluating performance during real-vehicle testing.

    • Open-Loop Metrics. To perform a comparable open-loop evaluation, we consider widely adopted open-loop measures and compute a final score as the aggregate...
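This excerpt does not name the open-loop measures or say how they are aggregated. The sketch below therefore shows only two widely used displacement measures as an illustration: average displacement error (ADE) and final displacement error (FDE). It is not the paper's scoring formula.

```python
import numpy as np


def ade(pred, gt):
    """Average displacement error: mean L2 distance over all waypoints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def fde(pred, gt):
    """Final displacement error: L2 distance at the last waypoint."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))


# Toy trajectories of shape (T, 2); the prediction drifts sideways by 0, 1, and 2 m.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(ade(pred, gt), fde(pred, gt))  # 1.0 2.0
```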

  61. [61]

    • Hybrid Loss Implementations

    Implementation Details: We provide the pseudocode for the hybrid loss, as well as the experimental setup details for imitation learning and reinforcement learning training.

    • Hybrid Loss Implementations. The pseudocode for the hybrid loss with detach is shown in Algorithm 1, implemented in PyTorch.

    Algorithm 1: Hybrid Loss with Detach

    def detached_integral(v, W, d...