G2DP: Diffusion Planning with Spatio-Temporal Grid Guidance

Alessandro Canevaro; Hang Yu; Johannes Betz; Julian Jordan; Julian Schmidt; Marc Kaufeld; Peizheng Li; Silvan Lindner; Wilhelm Stork; Ye Jin

arxiv: 2606.26017 · v2 · pith:DITIHYSLnew · submitted 2026-06-24 · 💻 cs.RO

Pith reviewed 2026-06-26 05:11 UTC · model grok-4.3

classification 💻 cs.RO

keywords diffusion planning · autonomous driving · spatio-temporal guidance · occupancy distribution · motion planning · closed-loop evaluation · nuPlan benchmark

The pith

Diffusion planners guided by dense spatio-temporal cost grids from occupancy maps outperform imitation baselines on nuPlan reactive score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to make diffusion-based planners for autonomous driving more reliable in dense traffic by adding explicit safety guidance during the stochastic denoising process. It builds a differentiable cost volume that combines predicted future occupancy probabilities with route progress, then casts this volume as a continuous energy functional whose gradients steer generated trajectories toward safe regions. The guidance operates at inference time without retraining the base diffusion model. Closed-loop tests on nuPlan and zero-shot transfers to other benchmarks report gains in reactive scores and collision avoidance. A sympathetic reader would care because the method offers a way to enforce environmental constraints inside generative planners that otherwise produce unsafe samples.

Core claim

G2DP constructs a differentiable spatio-temporal cost volume by fusing probabilistic future occupancy distributions with a route-progress map. By formulating this volume as a continuous safety energy functional, it injects dense gradients directly into the denoising loop, actively steering trajectory generation toward collision-free and progress-optimal regions.
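One way to write this mechanism down, reusing the symbols ψ_τ, x_t and τ that appear in the Figure 5 caption. The occupancy term O_τ, the route-progress map R, the weights w_occ and w_route, the guidance scale λ, and the classifier-guidance-style form of the update are all assumptions made for illustration; the abstract does not give the paper's actual equations.

```latex
% Assumed notation (not from the paper):
%   O_\tau(p): predicted occupancy probability at position p, trajectory timestep \tau
%   R(p):      route-progress value at position p, normalised to [0, 1]
%   w_occ, w_route, \lambda: hypothetical weights and guidance scale
\begin{align}
  \psi_\tau(p) &= w_{\mathrm{occ}}\, O_\tau(p) + w_{\mathrm{route}}\,\bigl(1 - R(p)\bigr) \\
  E(x_t) &= \sum_{\tau=1}^{T} \psi_\tau\bigl(x_t^{\tau}\bigr) \\
  \tilde{\mu}_\theta(x_t, t) &= \mu_\theta(x_t, t) - \lambda\, \nabla_{x_t} E(x_t)
\end{align}
```

Here x_t^τ is the waypoint at trajectory timestep τ of the sample at denoising step t, and the last line shifts the denoiser's mean down the energy gradient, which is what "injecting gradients into the denoising loop" would amount to under these assumptions.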

What carries the argument

The spatio-temporal cost volume, formed by fusing probabilistic occupancy distributions and route-progress maps and used as a differentiable safety energy functional to supply guidance gradients inside the diffusion denoising loop.
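A minimal sketch of how such a volume could be assembled and queried, assuming a weighted sum for the fusion and bilinear interpolation for the lookup. The function names, the weights `w_occ` and `w_route`, and the grid layout `[t][y][x]` are hypothetical and not taken from the paper.

```python
import math

def fuse_cost_volume(occupancy, progress, w_occ=1.0, w_route=0.2):
    """Fuse occupancy[t][y][x] in [0, 1] with progress[y][x] in [0, 1].

    Higher occupancy raises the cost; higher route progress lowers it.
    The weights are illustrative values, not values from the paper.
    """
    return [
        [
            [w_occ * occ_row[x] + w_route * (1.0 - progress[y][x])
             for x in range(len(occ_row))]
            for y, occ_row in enumerate(frame)
        ]
        for frame in occupancy
    ]

def bilinear(grid, x, y):
    """Sample grid[y][x] at a continuous (x, y), clamped to the grid bounds.

    Requires a grid of at least 2 x 2 cells.
    """
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(math.floor(x)), w - 2)
    y0 = min(int(math.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[y0][x0] + fx * grid[y0][x0 + 1]
    bottom = (1 - fx) * grid[y0 + 1][x0] + fx * grid[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom

def trajectory_energy(cost, trajectory):
    """Sum the sampled cost over waypoints; waypoint k reads time slice k."""
    return sum(bilinear(cost[k], x, y) for k, (x, y) in enumerate(trajectory))

# Toy inputs: one occupied cell at the first timestep, progress rising with x.
occupancy = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
progress = [[0.0, 0.5, 1.0] for _ in range(3)]
cost = fuse_cost_volume(occupancy, progress)
energy = trajectory_energy(cost, [(1.0, 1.0), (2.0, 1.0)])
```

Because the lookup is piecewise bilinear, the energy is continuous in the waypoint coordinates, which is the property gradient guidance depends on.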

If this is right

The guided planner records a 7.2-point gain in reactive score over the strongest imitation-learning baseline on nuPlan.
Zero-shot transfer maintains top scores on interPlan and DeepScenario benchmarks.
Collision avoidance improves by 10.15 points over the unguided diffusion approach on interPlan.
Dense grid guidance enables robust closed-loop execution in interactive scenes without post-hoc refinement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same grid-construction approach could be inserted into other generative planners that also produce stochastic samples.
If occupancy models improve independently, the guidance signal would become stronger without changes to the planner itself.
The method may reduce reliance on hand-crafted geometric queries that current guidance techniques use.
Real-vehicle deployment would require checking whether sensor noise in occupancy estimates propagates through the energy functional.

Load-bearing premise

The probabilistic future occupancy distributions are accurate enough that the resulting safety energy functional can be differentiated and injected into the denoising loop without destabilizing trajectories or introducing artifacts.
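To make the differentiability part of this premise concrete, here is a hedged sketch of the gradient of a bilinearly interpolated grid lookup, together with norm clipping as one common way to keep a guidance step bounded. Bilinear interpolation and clipping are assumptions chosen for illustration; the abstract does not say which interpolation or stabiliser the paper uses, and `bilinear_with_grad` and `clip_norm` are hypothetical helpers.

```python
import math

def bilinear_with_grad(grid, x, y):
    """Return (value, d/dx, d/dy) of a bilinear sample of grid[y][x]."""
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(math.floor(x)), w - 2)
    y0 = min(int(math.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    c00, c01 = grid[y0][x0], grid[y0][x0 + 1]
    c10, c11 = grid[y0 + 1][x0], grid[y0 + 1][x0 + 1]
    value = (1 - fy) * ((1 - fx) * c00 + fx * c01) + fy * ((1 - fx) * c10 + fx * c11)
    ddx = (1 - fy) * (c01 - c00) + fy * (c11 - c10)
    ddy = (1 - fx) * (c10 - c00) + fx * (c11 - c01)
    return value, ddx, ddy

def clip_norm(gx, gy, max_norm):
    """Rescale (gx, gy) so that its Euclidean norm is at most max_norm."""
    norm = math.hypot(gx, gy)
    if norm <= max_norm or norm == 0.0:
        return gx, gy
    scale = max_norm / norm
    return gx * scale, gy * scale

grid = [[0.0, 1.0, 2.0], [2.0, 4.0, 6.0], [0.0, 0.0, 9.0]]
value, ddx, ddy = bilinear_with_grad(grid, 0.25, 0.5)
```

Within a cell the x-derivative does not depend on x and the y-derivative does not depend on y, and both can jump at cell edges. A noisy occupancy grid therefore produces a noisy, discontinuous guidance field, which is one concrete route by which the artifacts named in the premise could arise.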

What would settle it

A closed-loop test in which inaccurate occupancy predictions cause the grid-guided planner to record more collisions than the identical unguided diffusion baseline.

Figures

Figures reproduced from arXiv: 2606.26017 by Alessandro Canevaro, Hang Yu, Johannes Betz, Julian Jordan, Julian Schmidt, Marc Kaufeld, Peizheng Li, Silvan Lindner, Wilhelm Stork, Ye Jin.

**Figure 2.** Model architecture of the G2DP. The system utilizes a DiT-Decoder to process ego future noisy trajectories conditioned…

**Figure 3.** Qualitative closed-loop comparisons of G2DP (ours), PlanTF, PLUTO*, and Diffusion Planner with planned trajectories. Top (DeepScenario): Only G2DP maintains efficient forward progress without collisions in highly interactive traffic. Bottom (nuPlan): Only G2DP makes a timely lane change and safely overtakes the stopped vehicle.

TABLE III: InterPlan evaluation via nuPlan metrics. Values in parentheses indic…

**Figure 5.** Cost grid guidance at denoising step t=9. Each panel shows the BEV cost grid ψ_τ and the guidance gradients evaluated at a selected trajectory timestep τ ∈ {7, 14, 21, 28}. White dots denote the current denoising trajectory x_t, and the arrows visualize the corresponding guidance gradients that push the trajectory toward lower cost regions.

**Figure 7.** Guidance scale and window ablation on Test14-hard.

**Figure 6.** Impact of the occupancy weight on the Test14-hard.

Original abstract

In autonomous driving, diffusion-based planners have emerged as a promising paradigm for robust motion planning in dense and interactive traffic, as they can effectively model diverse driving behaviors. However, their inherent stochasticity often requires explicit guidance during denoising to ensure safety and route adherence for robust closed-loop execution. Existing guidance typically relies on sparse, entity-centric geometric queries or post-hoc refinement, yielding limited situational awareness and fragile performance in interactive scenes. To address this issue, we propose G2DP (Grid-Guided Diffusion Planning), a diffusion-based planner that directly enforces dense environmental constraints through inference-time guidance. Specifically, G2DP constructs a differentiable spatio-temporal cost volume by fusing probabilistic future occupancy distributions with a route-progress map. By formulating this volume as a continuous safety energy functional, it injects dense gradients directly into the denoising loop, actively steering trajectory generation toward collision-free and progress-optimal regions. Extensive closed-loop evaluations show that G2DP achieves state-of-the-art performance on nuPlan, outperforming the strongest imitation-learning baseline by +7.2 points in reactive score. It further maintains top scores in zero-shot transfers to interPlan and DeepScenario benchmarks, with collision avoidance improving by +10.15 over the unguided approach on interPlan. These results demonstrate that spatio-temporal cost grids serve as an effective representation for robust guidance in diffusion-based planning.
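The inference-time guidance described in the abstract can be illustrated with a deliberately small toy loop. Everything here is an assumption made for illustration: the "denoiser" is a hand-written pull toward a reference path rather than a learned diffusion model, no noise is re-injected between steps, and a single Gaussian bump stands in for the cost grid. The sketch shows only where a cost gradient would enter the loop.

```python
import math

def obstacle_cost_grad(p, center, sigma=1.0):
    """Gradient of the bump exp(-|p - center|^2 / (2 sigma^2)) at point p."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    cost = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
    return (-cost * dx / (sigma * sigma), -cost * dy / (sigma * sigma))

def toy_denoise_step(traj, reference, pull=0.5):
    """Stand-in for a learned denoiser: move waypoints part-way to a reference."""
    return [
        (x + pull * (rx - x), y + pull * (ry - y))
        for (x, y), (rx, ry) in zip(traj, reference)
    ]

def guided_sampling(initial, reference, center, steps=10, scale=0.5):
    """Alternate a denoising step with a descent step on the cost gradient."""
    traj = list(initial)
    for _ in range(steps):
        traj = toy_denoise_step(traj, reference)
        if scale > 0.0:
            guided = []
            for p in traj:
                gx, gy = obstacle_cost_grad(p, center)
                guided.append((p[0] - scale * gx, p[1] - scale * gy))
            traj = guided
    return traj

def min_distance(traj, center):
    """Smallest waypoint-to-obstacle distance along the trajectory."""
    return min(math.hypot(x - center[0], y - center[1]) for x, y in traj)

reference = [(float(i), 0.0) for i in range(10)]  # straight route along y = 0
initial = [(float(i), -1.0) for i in range(10)]   # deterministic offset start
obstacle = (4.5, 0.3)                             # hypothetical obstacle centre

unguided = guided_sampling(initial, reference, obstacle, scale=0.0)
guided = guided_sampling(initial, reference, obstacle, scale=0.5)
```

With `scale=0.0` the trajectory settles on the reference path next to the obstacle; with a positive scale the waypoints nearest the obstacle are held farther away from it. The base "model" is identical in both runs, mirroring the claim that guidance acts at inference time without retraining.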

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

G2DP adds a fused spatio-temporal grid cost volume for direct gradient guidance in diffusion driving planners, but the abstract supplies no ablations or occupancy validation to back the claimed gains.

The main takeaway is that this paper gives diffusion planners a dense grid-based guidance signal built from probabilistic occupancy and route progress, injected as gradients during denoising.

What stands out as new is the construction of that continuous safety energy functional from the fused volume, meant to steer trajectories away from collisions while keeping progress. It moves past the sparse entity queries or post-hoc fixes mentioned in the abstract.

The reported numbers are the +7.2 reactive score lift on nuPlan over the top imitation baseline and the +10.15 collision improvement on interPlan transfer. Those are concrete if they hold.

The soft spots are clear from the abstract alone: it gives no baseline details, no ablation on the occupancy fusion or guidance strength, and no check on whether the predicted distributions stay accurate enough in interactive cases or whether repeated gradient injection stays stable. The concern about occupancy quality and artifact risk, raised above as the load-bearing premise, stands because nothing in the abstract addresses it.

This is aimed at people already working on generative planners for closed-loop driving. A reader who wants a practical mechanism for safety constraints in diffusion loops could pick up the grid idea and try it.

The work deserves peer review. The core mechanism is well-motivated and the benchmarks are relevant; a referee can check whether the experiments close the gaps the abstract leaves open.

Referee Report

2 major / 1 minor

Summary. The manuscript presents G2DP, a diffusion-based planner for autonomous driving that constructs a differentiable spatio-temporal cost volume by fusing probabilistic future occupancy distributions with a route-progress map. This volume is formulated as a continuous safety energy functional whose gradients are injected directly into the diffusion denoising loop to steer trajectories toward collision-free and progress-optimal regions. The authors claim state-of-the-art closed-loop performance on nuPlan, with a +7.2 point improvement in reactive score over the strongest imitation-learning baseline, plus strong zero-shot transfer to interPlan and DeepScenario benchmarks including a +10.15 collision-avoidance gain on interPlan.

Significance. If the reported gains prove robust, the use of dense, differentiable grid-based guidance could advance inference-time control of diffusion planners in interactive driving by replacing sparse geometric queries with continuous safety energies. The zero-shot transfer results would be notable if they hold without retraining, as they suggest the grid representation captures transferable environmental constraints.

major comments (2)

[Abstract] The central performance claims (+7.2 reactive score on nuPlan, +10.15 collision improvement on interPlan) rest on the accuracy of the fused probabilistic occupancy distributions and the numerical stability of the continuous safety energy functional when its gradients are repeatedly injected into the denoising loop. The abstract supplies no validation metrics on occupancy prediction quality in interactive regimes, no ablations on guidance strength or weighting, and no analysis of potential trajectory artifacts or mode collapse, leaving the load-bearing assumption untested in the provided text.
[Abstract] The comparison to the 'strongest imitation-learning baseline' and the zero-shot transfer claims require explicit details on baseline implementations, number of evaluation runs, statistical significance, and whether the unguided diffusion variant uses identical sampling budgets; without these, the magnitude of the reported gains cannot be assessed as load-bearing evidence for the grid-guidance contribution.
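As an illustration of the kind of statistical reporting the second comment asks for, the sketch below computes a paired percentile-bootstrap confidence interval for the mean per-scenario score difference between two planners. The function name, the resample count, and especially the score lists are placeholders invented for this example; none of the numbers come from the paper.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b) over paired scenarios."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(diffs) / n, lower, upper

# Placeholder per-scenario scores, invented for illustration (not from the paper).
guided = [0.91, 0.84, 0.77, 0.95, 0.88, 0.69]
baseline = [0.85, 0.80, 0.70, 0.93, 0.81, 0.66]
mean_diff, ci_lower, ci_upper = paired_bootstrap_ci(guided, baseline)
```

Pairing by scenario matters here: both planners are evaluated on the same scenarios, so resampling the differences removes scenario difficulty as a source of variance. An interval that excludes zero would support a claimed gain; one that straddles zero would not.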

minor comments (1)

[Abstract] The term 'reactive score' is used without definition or reference to its computation; a brief parenthetical or citation would improve clarity for readers outside the nuPlan community.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the two major comments below and will revise the manuscript to strengthen the presentation of supporting evidence.

Referee: [Abstract] The central performance claims (+7.2 reactive score on nuPlan, +10.15 collision improvement on interPlan) rest on the accuracy of the fused probabilistic occupancy distributions and the numerical stability of the continuous safety energy functional when its gradients are repeatedly injected into the denoising loop. The abstract supplies no validation metrics on occupancy prediction quality in interactive regimes, no ablations on guidance strength or weighting, and no analysis of potential trajectory artifacts or mode collapse, leaving the load-bearing assumption untested in the provided text.

Authors: We agree the abstract is concise and omits these supporting details. The full manuscript validates occupancy prediction quality in interactive regimes (Section 4.3), provides ablations on guidance strength/weighting (Section 5.2), and analyzes trajectory artifacts plus mode collapse (Section 5.4). We will revise the abstract to briefly reference these results and direct readers to the relevant sections.

revision: yes

Referee: [Abstract] The comparison to the 'strongest imitation-learning baseline' and the zero-shot transfer claims require explicit details on baseline implementations, number of evaluation runs, statistical significance, and whether the unguided diffusion variant uses identical sampling budgets; without these, the magnitude of the reported gains cannot be assessed as load-bearing evidence for the grid-guidance contribution.

Authors: The manuscript details baseline implementations in Section 4.1, reports results over multiple evaluation runs with standard deviations and significance testing, and confirms identical sampling budgets for the unguided variant. We will revise the abstract to explicitly note the use of identical sampling budgets and refer to the experimental section for run counts and statistical details.

revision: yes

Circularity Check

0 steps flagged

No circularity: method introduces independent guidance components

The paper presents G2DP as constructing a new differentiable spatio-temporal cost volume from fused probabilistic occupancy distributions and route-progress maps, then injecting its gradients as guidance into the diffusion denoising process. No equations, predictions, or performance claims in the provided text reduce to quantities defined by construction from fitted parameters of the same experiments, self-citations for uniqueness theorems, or renamed known results. The reported gains (+7.2 reactive score, +10.15 collision improvement) are framed as empirical outcomes of closed-loop evaluation on nuPlan and zero-shot transfers, with the central mechanism (dense grid guidance) adding independent content rather than tautologically following from inputs. This is the common case of a self-contained engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented physical entities are identifiable. The cost volume is a constructed representation rather than a postulated physical entity.

pith-pipeline@v0.9.1-grok · 5796 in / 1028 out tokens · 24732 ms · 2026-06-26T05:11:37.159354+00:00 · methodology

[1] [1]

Baidu apollo em motion planner,

H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,”arXiv preprint arXiv:1807.08048, 2018

Pith/arXiv arXiv 2018

[2] [2]

Parting with misconceptions about learning-based vehicle motion planning,

D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with misconceptions about learning-based vehicle motion planning,” in Conference on Robot Learning (CoRL), 2023

2023

[3] [3]

Urban driver: Learning to drive from real-world demonstrations using policy gradients,

O. Scheel, L. Bergamini, M. Wołczyk, B. Osi ´nski, and P. Ondruska, “Urban driver: Learning to drive from real-world demonstrations using policy gradients,” inConference on Robot Learning (CoRL), 2021

2021

[4] [4]

Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,

M. Bansal, A. Krizhevsky, and A. Ogale, “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,”arXiv preprint arXiv:1812.03079, 2018

Pith/arXiv arXiv 2018

[5] [5]

Safetynet: Safe planning for real-world self-driving vehicles using machine-learned policies,

M. Vitelli, Y . Chang, Y . Ye, M. Wołczyk, B. Osi ´nski, M. Niendorf, H. Grimmett, Q. Huang, A. Jain, and P. Ondruska, “Safetynet: Safe planning for real-world self-driving vehicles using machine-learned policies,”arXiv preprint arXiv:2109.13602, 2021

arXiv 2021

[6] [6]

From prediction to plan- ning with goal conditioned lane graph traversals,

M. Hallgarten, M. Stoll, and A. Zell, “From prediction to plan- ning with goal conditioned lane graph traversals,”arXiv preprint arXiv:2302.07753, 2023

arXiv 2023

[7] [7]

Planning with diffusion for flexible behavior synthesis,

M. Janner, Y . Du, J. B. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” inInternational Conference on Machine Learning, 2022

2022

[8] [8]

Diffusion-based planning for autonomous driving with flexible guidance,

Y . Zheng, R. Liang, K. Zheng, J. Zheng, L. Mao, J. Li, W. Gu, R. Ai, S. E. Li, X. Zhan, and J. Liu, “Diffusion-based planning for autonomous driving with flexible guidance,” inInternational Conference on Learning Representations (ICLR), 2025

2025

[9] [9]

Diverse controllable diffusion policy with signal temporal logic,

Y . Meng and C. Fan, “Diverse controllable diffusion policy with signal temporal logic,”IEEE Robotics and Automation Letters, 2024

2024

[10] [10]

Guided conditional diffusion for controllable traffic simu- lation,

Z. Zhong, D. Rempe, D. Xu, Y . Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simu- lation,” inIEEE International Conference on Robotics and Automation (ICRA), 2023

2023

[11] [11]

Diffusion predictive control with constraints,

R. R ¨omer, A. v. Rohr, and A. Schoellig, “Diffusion predictive control with constraints,” inProceedings of the 7th Annual Learning for Dynamics & Control Conference. PMLR, 2025

2025

[12] [12]

Nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,

H. Caesar, J. Kabzan, K. Tan, and et al., “Nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles,” inCVPR ADP3 Workshop, 2021

2021

[13] [13]

Flow matching-based autonomous driving planning with advanced interactive behavior modeling,

T. Tan, Y . Zheng, R. Liang, Z. Wang, K. Zheng, J. Zheng, J. Li, X. Zhan, and J. Liu, “Flow matching-based autonomous driving planning with advanced interactive behavior modeling,” inThe Thirty- ninth Annual Conference on Neural Information Processing Systems, 2025

2025

[14] [14]

Can vehicle motion planning generalize to realistic long-tail scenarios?

M. Hallgarten, J. Zapata, M. Stoll, K. Renz, and A. Zell, “Can vehicle motion planning generalize to realistic long-tail scenarios?” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

2024

[15] [15]

Highly accurate and diverse traffic data: The deepscenario open 3d dataset,

O. Dhaouadi, J. Meier, L. Wahl, J. Kaiser, L. Scalerandi, N. Wandelburg, Z. Zhou, N. Berinpanathan, H. Banzhaf, and D. Cremers, “Highly accurate and diverse traffic data: The deepscenario open 3d dataset,” in 2025 IEEE Intelligent Vehicles Symposium (IV), 2025

2025

[16] [16]

Congested traffic states in empirical observations and microscopic simulations,

M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, 2000

2000

[17] [17]

Optimal trajectory generation for dynamic street scenarios in a Frenét frame,

M. Werling, J. Ziegler, S. Kammel, and S. Thrun, “Optimal trajectory generation for dynamic street scenarios in a Frenét frame,” in 2010 IEEE International Conference on Robotics and Automation, 2010

2010

[18] [18]

Search-based optimal motion planning for automated driving,

Z. Ajanović, B. Lacević, B. Shyrokau, M. Stolz, and M. Horn, “Search-based optimal motion planning for automated driving,” in IEEE International Conference on Intelligent Robots and Systems (IROS), 2018

2018

[19] [19]

Predictionnet: Real-time joint probabilistic traffic prediction for planning, control, and simulation,

A. Kamenev, L. Wang, O. B. Boer, I. Kulkarni, B. Kartal, A. Molchanov, S. Birchfield, D. Nister, and N. Smolyanskiy, “Predictionnet: Real-time joint probabilistic traffic prediction for planning, control, and simulation,” in IEEE International Conference on Robotics and Automation (ICRA), 2022

2022

[20] [20]

Urban driving with conditional imitation learning,

J. Hawke, R. Shen, C. Gurau, S. Sharma, D. Reda, N. Nikolov, P. Mazur, S. Micklethwaite, N. Griffiths, A. Shah, and A. Kendall, “Urban driving with conditional imitation learning,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020

2020

[21] [21]

Learning to drive in a day,

A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D. Lam, A. Bewley, and A. Shah, “Learning to drive in a day,” in 2019 International Conference on Robotics and Automation (ICRA), 2019

2019

[22] [22]

Rethinking imitation-based planner for autonomous driving,

J. Cheng, Y. Chen, X. Mei, B. Yang, B. Li, and M. Liu, “Rethinking imitation-based planner for autonomous driving,” arXiv preprint arXiv:2309.10443, 2023

arXiv 2023

[23] [23]

Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,

Z. Huang, H. Liu, and C. Lv, “Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2023

2023

[24] [24]

Powerbev: A powerful yet lightweight framework for instance prediction in bird’s-eye view,

P. Li, S. Ding, X. Chen, N. Hanselmann, M. Cordts, and J. Gall, “Powerbev: A powerful yet lightweight framework for instance prediction in bird’s-eye view,” in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, 2023

2023

[25] [25]

Ago: Adaptive grounding for open world 3d occupancy prediction,

P. Li, S. Ding, Y. Zhou, Q. Zhang, O. Inak, L. Triess, N. Hanselmann, M. Cordts, and A. Zell, “Ago: Adaptive grounding for open world 3d occupancy prediction,” 2025

2025

[26] [26]

Spacedrive: Infusing spatial awareness into vlm-based autonomous driving,

P. Li, Z. Zhang, D. Holtz, H. Yu, Y. Yang, Y. Lai, R. Song, A. Geiger, and A. Zell, “Spacedrive: Infusing spatial awareness into vlm-based autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

2026

[27] [27]

Traffic and safety rule compliance of humans in diverse driving situations,

M. Kurenkov, S. Marvi, J. Schmidt, C. B. Rist, A. Canevaro, H. Yu, J. Jordan, G. Schildbach, and A. Valada, “Traffic and safety rule compliance of humans in diverse driving situations,” arXiv preprint arXiv:2411.01909, 2024

arXiv 2024

[28] [28]

DTPP: Differentiable Joint Conditional Prediction and Cost Evaluation for Tree Policy Planning in Autonomous Driving,

Z. Huang, P. Karkus, B. Ivanovic, Y. Chen, M. Pavone, and C. Lv, “DTPP: Differentiable Joint Conditional Prediction and Cost Evaluation for Tree Policy Planning in Autonomous Driving,” Feb. 2024, arXiv:2310.05885 [cs]. [Online]. Available: http://arxiv.org/abs/2310.05885

arXiv 2024

[29] [29]

Pluto: Pushing the limit of imitation learning-based planning for autonomous driving,

J. Cheng, Y. Chen, and Q. Chen, “Pluto: Pushing the limit of imitation learning-based planning for autonomous driving,” arXiv preprint arXiv:2404.14327, 2024

arXiv 2024

[30] [30]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” arXiv preprint arXiv:2006.11239, 2020

arXiv 2020

[31] [31]

Deep unsupervised learning using nonequilibrium thermodynamics,

J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” in Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei, Eds. PMLR, 2015

2015

[32] [32]

Motiondiffuser: Controllable multi-agent motion prediction using diffusion,

C. M. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, and D. Anguelov, “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

2023

[33] [33]

Generalizing motion planners with mixture of experts for autonomous driving,

Q. Sun, H. Wang, J. Zhan, F. Nie, X. Wen, L. Xu, K. Zhan, P. Jia, X. Lang, and H. Zhao, “Generalizing motion planners with mixture of experts for autonomous driving,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025

2025

[34] [34]

Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers,

Y. Chen, Y. Wang, and Z. Zhang, “Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers,” arXiv preprint arXiv:2412.18607, 2024

arXiv 2024

[35] [35]

Cobl-diffusion: Diffusion-based conditional robot planning in dynamic environments using control barrier and lyapunov functions,

K. Mizuta and K. Leung, “Cobl-diffusion: Diffusion-based conditional robot planning in dynamic environments using control barrier and lyapunov functions,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

2024

[36] [36]

Safeflow: Safe robot motion planning with flow matching via control barrier functions,

X. Dai, Z. Yang, D. Yu, F. Liu, H. Sadeghian, S. Haddadin, and S. Hirche, “Safeflow: Safe robot motion planning with flow matching via control barrier functions,” arXiv preprint arXiv:2504.08661, 2025

arXiv 2025

[37] [37]

Classifier-free diffusion guidance,

J. Ho and T. Salimans, “Classifier-free diffusion guidance,” in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021

2021

[38] [38]

Hype: Hybrid planning with ego proposal-conditioned predictions,

H. Yu, J. Jordan, J. Schmidt, S. Lindner, A. Canevaro, and W. Stork, “Hype: Hybrid planning with ego proposal-conditioned predictions,” in 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), 2025

2025

[39] [39]

Tree-structured policy planning with learned behavior models,

Y. Chen, P. Karkus, B. Ivanovic, X. Weng, and M. Pavone, “Tree-structured policy planning with learned behavior models,” in IEEE International Conference on Robotics and Automation (ICRA), 2023

2023

[40] [40]

Navidiffusor: Cost-guided diffusion model for visual navigation,

Y. Zeng, H. Ren, S. Wang, J. Huang, and H. Cheng, “Navidiffusor: Cost-guided diffusion model for visual navigation,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025

2025

[41] [41]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023

2023

[42] [42]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015

2015

[43] [43]

Dpm-solver: a fast ode solver for diffusion probabilistic model sampling in around 10 steps,

C. Lu, Y. Zhou, F. Bao, J. Chen, C. Li, and J. Zhu, “Dpm-solver: a fast ode solver for diffusion probabilistic model sampling in around 10 steps,” in Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022

2022

[44] [44]

Diffusion models beat GANs on image synthesis,

P. Dhariwal and A. Q. Nichol, “Diffusion models beat GANs on image synthesis,” in Advances in Neural Information Processing Systems, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., 2021

2021

[45] [45]

Generalizing motion planners with mixture of experts for autonomous driving,

Q. Sun, H. Wang, J. Zhan, F. Nie, X. Wen, L. Xu, K. Zhan, P. Jia, X. Lang, and H. Zhao, “Generalizing motion planners with mixture of experts for autonomous driving,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025

2025