Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving
Pith reviewed 2026-05-21 12:48 UTC · model grok-4.3
The pith
Diffusion models trained on real-vehicle data can plan end-to-end autonomous driving trajectories effectively in urban settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper argues that diffusion models can serve as practical planners for end-to-end autonomous driving. It studies the diffusion loss space, trajectory representation, and data scaling, then applies reinforcement learning post-training to improve safety. The resulting Hyper Diffusion Planner (HDP) is reported to achieve a tenfold performance gain over the base model in real-world tests.
What carries the argument
Hyper Diffusion Planner (HDP), a diffusion model framework for generating driving trajectories that incorporates insights from loss design, representation choices, data scaling, and post-training with reinforcement learning.
If this is right
- Insights into the diffusion loss space improve planning accuracy in driving tasks.
- Specific trajectory representations help manage the complexity of urban environments.
- Increasing the amount of training data leads to substantial performance gains.
- Reinforcement learning after initial training strengthens safety and robustness.
- The overall framework supports deployment in varied real-world driving conditions.
Where Pith is reading between the lines
- Diffusion approaches might extend to other decision-making problems in robotics where uncertainty is high.
- This work suggests end-to-end methods could eventually replace traditional modular autonomous driving systems.
- Testing the planner in more diverse weather or traffic conditions would reveal its limits.
- Combining this with other AI techniques like vision models could further improve results.
Load-bearing premise
The collected real-vehicle data and the specific road-testing conditions represent the full range of complex urban driving situations that would be encountered in wider use.
What would settle it
Demonstrating poor performance or safety issues when the system encounters driving conditions outside the collected data distribution, such as rare events or new locations, would disprove the effectiveness for general real-world use.
Original abstract
Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of the diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety and robustness of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Hyper Diffusion Planner (HDP), a diffusion-based end-to-end autonomous driving framework. It reports systematic controlled studies identifying key factors in diffusion loss space, trajectory representation, and data scaling from large-scale real-vehicle data, proposes an RL post-training strategy to improve safety and robustness, and describes real-vehicle deployment evaluated across 6 urban scenarios and 200 km of road testing with a claimed 10x performance gain over the base model.
Significance. If the empirical claims hold after clarification, the work would be significant for robotics and autonomous systems by providing large-scale real-world evidence for diffusion planners in E2E AD, along with actionable insights on training and representation choices. The shift from simulation-only settings and the inclusion of RL post-training are positive elements that could influence subsequent research.
major comments (2)
- [Abstract] Abstract: the central claim of a 'notable 10x performance improvement' over the base model is presented without defining the performance metric, specifying the base model architecture or training details, reporting error bars, or describing the exact computation of the 10x factor; this is load-bearing for the empirical contribution and prevents verification of the result.
- [Evaluation] Evaluation section describing the 200 km testing: the reported coverage across only 6 urban scenarios provides no quantitative metrics on long-tail event frequency, weather/traffic diversity, or failure-mode statistics; in safety-critical planning, performance is dominated by rare events, so this leaves the generalization claim open to the possibility that observed gains reflect easier test conditions rather than a robust property of HDP.
minor comments (2)
- Define all acronyms (E2E AD, HDP) on first use and ensure consistent terminology between the abstract and main text.
- [Methods] The description of trajectory representation in the diffusion model would benefit from an explicit equation or diagram showing the conditioning and denoising process.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of clarity in our empirical claims and evaluation methodology. We address each major comment below and indicate revisions made to the manuscript.
- Referee: [Abstract] Abstract: the central claim of a 'notable 10x performance improvement' over the base model is presented without defining the performance metric, specifying the base model architecture or training details, reporting error bars, or describing the exact computation of the 10x factor; this is load-bearing for the empirical contribution and prevents verification of the result.
  Authors: We agree that the abstract requires additional context to make the central claim verifiable. In the revised manuscript, we have updated the abstract to define the performance metric as the human intervention rate (interventions per 100 km), specify the base model as the diffusion planner trained solely with the supervised loss without RL post-training, and describe the 10x factor as the ratio of intervention-free autonomous driving distance achieved by HDP relative to the base model over identical test routes. Error bars derived from three independent test runs are now referenced, with full details provided in the Evaluation section. These changes preserve abstract length while enabling verification.
  Revision: yes
- Referee: [Evaluation] Evaluation section describing the 200 km testing: the reported coverage across only 6 urban scenarios provides no quantitative metrics on long-tail event frequency, weather/traffic diversity, or failure-mode statistics; in safety-critical planning, performance is dominated by rare events, so this leaves the generalization claim open to the possibility that observed gains reflect easier test conditions rather than a robust property of HDP.
  Authors: We concur that more granular characterization of test conditions would better support generalization claims. The revised Evaluation section now includes additional qualitative and semi-quantitative descriptions of the six scenarios, encompassing variations in traffic density, time-of-day conditions, and inclusion of challenging elements such as unprotected left turns and pedestrian-dense areas. The 200 km routes were chosen to reflect representative urban driving distributions based on prior fleet data. However, we did not collect exhaustive pre-specified logs for long-tail event frequencies or detailed failure-mode breakdowns during the original deployment.
  Revision: partial
- Not addressed in the revision: quantitative metrics on long-tail event frequency, weather/traffic diversity, and failure-mode statistics from the 200 km real-world testing, as these were not systematically recorded in the original experiments.
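The metric described in the first response above (interventions per 100 km, and a 10x factor computed as a ratio of intervention-free distance over identical routes) can be made concrete with a small arithmetic sketch. The function names and the intervention counts below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def interventions_per_100km(interventions: int, distance_km: float) -> float:
    """Human intervention rate, normalised to 100 km of driving."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return 100.0 * interventions / distance_km


def km_per_intervention(interventions: int, distance_km: float) -> float:
    """Average intervention-free distance; infinite when no intervention occurred."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    if interventions == 0:
        return float("inf")
    return distance_km / interventions


def improvement_factor(base_run, new_run) -> float:
    """Ratio of intervention-free distance of the new planner to the base planner."""
    return km_per_intervention(*new_run) / km_per_intervention(*base_run)


# Hypothetical (interventions, distance_km) pairs, NOT data from the paper.
base_run = (20, 200.0)
hdp_run = (2, 200.0)

print(interventions_per_100km(*base_run))     # 10.0
print(interventions_per_100km(*hdp_run))      # 1.0
print(improvement_factor(base_run, hdp_run))  # 10.0
```

When both planners drive the same route length, this ratio is simply the inverse ratio of the two intervention rates, which is why the two descriptions of the metric in the response are compatible.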
Circularity Check
No circularity: empirical claims rest on real-vehicle deployment and testing
The paper presents its contributions as the outcome of large-scale empirical studies on real-vehicle data, controlled experiments identifying insights on diffusion loss space, trajectory representation and data scaling, plus an RL post-training strategy. The 10x performance gain is reported directly from deployment on a real-vehicle platform across 6 scenarios and 200 km of road testing. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations are described that would reduce any claimed result to its own inputs by construction. The work is therefore self-contained as an empirical investigation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Real-vehicle data and road testing conditions are representative of the full distribution of complex urban driving scenarios.
Forward citations
Cited by 4 Pith papers
- ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
  ReflectDrive-2 achieves 91.0 PDMS on NAVSIM with camera input by training a discrete diffusion model to self-edit trajectories via RL-aligned AutoEdit.
- ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
  ReflectDrive-2 combines masked discrete diffusion with RL-aligned self-editing to generate and refine driving trajectories, reaching 91.0 PDMS on NAVSIM camera-only and 94.8 in best-of-6.
- CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies
  CRAFT is an on-policy RL fine-tuning framework that decomposes closed-loop policy gradients into a group-normalized counterfactual proxy plus residual correction from interaction events, achieving top closed-loop perf...
- RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
  RAD-2 uses a diffusion generator and RL discriminator to cut collision rates by 56% in closed-loop autonomous driving planning.
Appendix excerpts
Diffusion Loss Space: The diffusion models are trained to predict one of the following quantities: $\tau_0$, $v_t$, or $\epsilon$. Given the diffusion timestep $t$, the predefined noise schedule $\alpha_t$, $\sigma_t$, and the noised sample $\tau_t$, these quantities are mutually convertible. This provides the freedom of combinations of model predictions (parameterization) and loss functions, as in TA...
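The mutual convertibility mentioned in the Diffusion Loss Space excerpt can be illustrated with a minimal sketch. It assumes a forward process $\tau_t = \alpha_t \tau_0 + \sigma_t \epsilon$ and, because the excerpt does not define $v_t$, borrows the common v-prediction convention $v_t = \alpha_t \epsilon - \sigma_t \tau_0$ with $\alpha_t^2 + \sigma_t^2 = 1$. The paper's own definition of $v_t$ may differ, and all function names here are hypothetical.

```python
import math


def from_tau0(tau_t, tau0, alpha, sigma):
    """Given a tau_0 prediction, recover the implied epsilon and v."""
    eps = [(xt - alpha * x0) / sigma for xt, x0 in zip(tau_t, tau0)]
    v = [alpha * e - sigma * x0 for e, x0 in zip(eps, tau0)]
    return eps, v


def from_eps(tau_t, eps, alpha, sigma):
    """Given an epsilon prediction, recover the implied tau_0 and v."""
    tau0 = [(xt - sigma * e) / alpha for xt, e in zip(tau_t, eps)]
    v = [alpha * e - sigma * x0 for e, x0 in zip(eps, tau0)]
    return tau0, v


def from_v(tau_t, v, alpha, sigma):
    """Given a v prediction, recover tau_0 and epsilon (needs alpha^2 + sigma^2 = 1)."""
    tau0 = [alpha * xt - sigma * vi for xt, vi in zip(tau_t, v)]
    eps = [sigma * xt + alpha * vi for xt, vi in zip(tau_t, v)]
    return tau0, eps


# Assumed variance-preserving schedule value at one timestep (illustrative only).
alpha, sigma = math.cos(0.3), math.sin(0.3)
tau0_true = [1.0, 2.0, 3.5]   # toy clean trajectory values
eps_true = [0.2, -0.1, 0.4]   # toy noise values
tau_t = [alpha * x0 + sigma * e for x0, e in zip(tau0_true, eps_true)]

# Round trip: tau_0 -> (eps, v) -> (tau_0, eps) should reproduce the inputs.
eps_a, v_a = from_tau0(tau_t, tau0_true, alpha, sigma)
tau0_b, eps_b = from_v(tau_t, v_a, alpha, sigma)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(tau0_b, tau0_true)))
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(eps_b, eps_true)))
```

Because the three targets carry the same information under these assumptions, the choice of which one the network outputs and which one the loss is computed on can be made independently, which is the design freedom the excerpt refers to.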
Proofs of the Theorems: We provide the proofs of the theorems for the hybrid loss and the RL objectives.
Theorem IV.1. The hybrid loss in Eq. (5) is equivalent to a diffusion score matching loss under the $P$-norm: $\mathcal{L}_{\mathrm{hybrid}} = \mathbb{E}_{\tau^v_0, \epsilon, t}\left[\|\tau^v_\theta - \tau^v_0\|_P^2\right]$, where $P = I + \Delta t^2 \cdot \omega M^\top M$ is positive-definite. The minimizer of the loss is the marginal score function in Eq....
Evaluation Metric Design: We consider two types of evaluation metrics: open-loop metrics for assessing trajectory quality, and closed-loop metrics for evaluating performance during real-vehicle testing.
- Open-Loop Metrics. To perform a comparable open-loop evaluation, we consider widely adopted open-loop measures and compute a final score as the aggregate...
Implementation Details: We provide the pseudocode for the hybrid loss, as well as the experimental setup details for imitation learning and reinforcement learning training.
- Hybrid Loss Implementations. The pseudocode for the hybrid loss with detach is shown in Algorithm 1, implemented in torch.
Algorithm 1: Hybrid Loss with Detach
def detached_integral(v, W, d...
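Since Algorithm 1 is truncated here, the following is only a rough sketch of the kind of identity stated in Theorem IV.1, not the authors' implementation. It assumes, hypothetically, that the hybrid loss is a squared error on velocities plus $\omega$ times a squared error on waypoints obtained by integrating those velocities, with $M$ a lower-triangular cumulative-sum matrix and $\Delta t$ the step size. Eq. (5) is not shown in the excerpt, so these are assumptions. The detach step and torch are omitted, and plain Python lists stand in for tensors.

```python
import math


def cumsum_matrix(n):
    """Lower-triangular matrix of ones: (M @ x)[i] = x[0] + ... + x[i]."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def hybrid_loss(v_pred, v_true, dt, omega):
    """Assumed form: velocity error plus omega-weighted integrated waypoint error."""
    d = [p - t for p, t in zip(v_pred, v_true)]
    M = cumsum_matrix(len(d))
    pos_err = [dt * e for e in matvec(M, d)]  # waypoint error from integrating d
    return sum(e * e for e in d) + omega * sum(e * e for e in pos_err)


def p_norm_loss(v_pred, v_true, dt, omega):
    """Same quantity written as d^T P d with P = I + dt^2 * omega * M^T M."""
    d = [p - t for p, t in zip(v_pred, v_true)]
    n = len(d)
    M = cumsum_matrix(n)
    P = [
        [
            (1.0 if i == j else 0.0)
            + dt * dt * omega * sum(M[k][i] * M[k][j] for k in range(n))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return sum(di * pi for di, pi in zip(d, matvec(P, d)))


# Toy 1-D velocity sequences (illustrative values only).
v_pred = [1.0, 1.2, 0.9, 1.1]
v_true = [1.0, 1.0, 1.0, 1.0]
a = hybrid_loss(v_pred, v_true, dt=0.5, omega=2.0)
b = p_norm_loss(v_pred, v_true, dt=0.5, omega=2.0)
print(math.isclose(a, b, rel_tol=1e-9))  # the two forms agree
```

Under these assumptions $P$ is the identity plus a positive semi-definite term for $\omega \ge 0$, which is consistent with the positive-definiteness stated in the theorem.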