ConsistencyPlanner: Real-time Planning with Fast-Sampling Consistency Models

Dongbin Zhao; Jiaqi Fang; Jie Ling; Qiankun Yu; Qichao Zhang; Xing Fang; Zhenwen Cai

arxiv: 2606.11569 · v1 · pith:3EEKIJ4J · submitted 2026-06-10 · 💻 cs.RO · cs.AI


Qichao Zhang, Xing Fang, Jiaqi Fang, Zhenwen Cai, Jie Ling, Qiankun Yu, Dongbin Zhao

Pith reviewed 2026-06-27 10:02 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords consistency models, autonomous driving, real-time planning, trajectory generation, multimodal sampling, safety evaluation, driving simulator, feature fusion


The pith

Fast-sampling consistency models enable real-time multimodal trajectory planning for safer autonomous driving decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a planning framework that uses consistency models to generate many possible future paths quickly instead of relying on slow iterative sampling or rigid rules. It adds an attention decoder to combine scene details with action information into one planning signal. The result is closed-loop control that explores different driving options in real time while avoiding the indecisiveness seen in earlier learning methods. Tests in a driving simulator show better safety numbers than prior approaches, especially when traffic is changing rapidly. If correct, this removes a key barrier between rich behavior modeling and the speed needed for actual vehicle deployment.

Core claim

ConsistencyPlanner shows that fast-sampling consistency models produce a diverse set of plausible trajectories at low computational cost, and that an attention-enhanced decoder can fuse heterogeneous scene and action features into a single representation; together these steps support real-time closed-loop planning that records higher safety metrics than existing methods in the Waymax simulator, particularly under dynamic conditions.
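The few-step sampling in this claim can be illustrated with the standard multistep consistency sampler of Song et al. (2023) ([15] below). In that sampler, a single network call maps pure noise at the largest noise level directly to a sample. Each optional extra step then re-noises the sample to a smaller level and maps it back. The sketch below rests on stated assumptions. The consistency function is a toy, and `consistency_fn`, `MODES`, and the noise constants are hypothetical stand-ins, not the paper's trained, scene-conditioned network. What the sketch does show is the cost structure: each candidate trajectory costs 1 + len(taus) network calls, whereas DDPM as originally proposed uses 1000 denoising steps.

```python
import math
import random

# Noise-schedule constants in the style of Song et al. (2023); the values
# ConsistencyPlanner actually uses are not given here, so these are assumptions.
SIGMA_DATA = 0.5
EPS = 0.002
T_MAX = 80.0


def c_skip(t):
    # Skip-connection weight; equals 1 at t == EPS (boundary condition).
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    # Output weight; equals 0 at t == EPS, so f(x, EPS) == x.
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def nearest(x, modes):
    # Index of the mode closest to x in squared Euclidean distance.
    return min(range(len(modes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, modes[i])))


def consistency_fn(x, t, modes):
    """Toy consistency function f(x, t) = c_skip(t) * x + c_out(t) * F(x, t).

    F is a hypothetical stand-in for a trained, scene-conditioned network,
    chosen so that f snaps any noisy input onto its nearest mode.
    """
    if t <= EPS:
        return list(x)
    target = modes[nearest(x, modes)]
    cs, co = c_skip(t), c_out(t)
    raw = [(m - cs * xi) / co for m, xi in zip(target, x)]  # toy F(x, t)
    return [cs * xi + co * ri for xi, ri in zip(x, raw)]


def multistep_sample(modes, taus, rng):
    """Multistep consistency sampling: one call at T_MAX, then one per tau."""
    dim = len(modes[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x = consistency_fn([T_MAX * zi for zi in z], T_MAX, modes)
    for tau in taus:  # decreasing noise levels, each greater than EPS
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scale = math.sqrt(tau**2 - EPS**2)
        x = consistency_fn([xi + scale * zi for xi, zi in zip(x, z)], tau, modes)
    return x


# Hypothetical maneuvers: three (x, y) waypoints each, flattened, in meters.
MODES = [
    [5.0, 0.0, 10.0, 0.0, 15.0, 0.0],     # keep lane
    [5.0, 0.5, 10.0, 2.0, 15.0, 3.5],     # drift left
    [5.0, -0.5, 10.0, -2.0, 15.0, -3.5],  # drift right
]

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [multistep_sample(MODES, taus=[0.8], rng=rng) for _ in range(8)]
    print(sorted({nearest(c, MODES) for c in candidates}))  # which modes were hit
```

Different noise draws land in different modes. This is the multimodality the pith describes. This page does not say how ConsistencyPlanner chooses among the candidates it samples.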

What carries the argument

Fast-sampling consistency models that produce multiple trajectories in few steps, together with an attention-enhanced decoder that merges scene features and action tokens.
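The attention-enhanced decoder is described here only as a component that merges scene features and action tokens. One plausible reading is single-head cross-attention. In that reading, each action token (for example, an embedding of a noisy candidate trajectory) queries the encoded scene tokens, and a residual add follows. The sketch below illustrates that reading as an assumption, not as the paper's architecture. The weight matrices, token contents, and the `fuse` helper are hypothetical. A real decoder would add multiple heads, learned projections, normalization, and feed-forward layers, following the attention formulation in [7].

```python
import math


def matmul(a, b):
    """Multiply an (n x k) by a (k x m) matrix, both plain lists of lists."""
    cols = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(len(b))) for j in range(cols)]
            for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend over context."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def fuse(action_tokens, scene_tokens, w_q, w_k, w_v):
    """Hypothetical fusion step: action tokens query scene tokens, and a
    residual connection adds the attended scene context to each action token."""
    attended = cross_attention(action_tokens, scene_tokens, w_q, w_k, w_v)
    return [[a + b for a, b in zip(tok, ctx)]
            for tok, ctx in zip(action_tokens, attended)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    scene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # e.g. map, agent, signal tokens
    actions = [[2.0, 0.0], [0.0, 2.0]]            # e.g. two noisy trajectory tokens
    print(fuse(actions, scene, identity, identity, identity))
```

Because every action token computes its own attention weights, each candidate trajectory can draw on a different mix of scene context. Each token's output is a single fused vector.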

If this is right

Real-time exploration of multimodal future actions becomes practical without the slowdown of iterative generative sampling.
Heterogeneous inputs can be fused on the fly to support more robust decisions than single-mode planners.
Safety metrics improve over rule-based and prior learning baselines, especially in rapidly changing traffic.
Closed-loop planning can operate continuously while still representing a range of possible driver behaviors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sampling-plus-fusion pattern could be tested on other real-time robotics tasks that require choosing among several plausible futures.
If simulator safety gains hold, the method could reduce the need for separate safety layers that slow down planning.
Extending the decoder to accept raw sensor streams would allow direct comparison against end-to-end learned controllers.

Load-bearing premise

The trajectories produced by the fast-sampling models will correspond to actions that remain safe and feasible when transferred from simulation to real vehicles.

What would settle it

A real-vehicle test in which the planner selects a trajectory rated safe in the simulator yet results in a collision or near-miss that the simulator did not predict.

Figures

Figures reproduced from arXiv: 2606.11569 by Dongbin Zhao, Jiaqi Fang, Jie Ling, Qiankun Yu, Qichao Zhang, Xing Fang, Zhenwen Cai.

**Figure 1.** Framework of the ConsistencyPlanner method with the scene feature encoder and consistency decoder. We incorporate …

**Figure 2.** Visualization results of ConsistencyPlanner against three other methods in complex driving scenarios.

Original abstract

Closed-loop planning in complex, real-world driving scenarios presents a critical challenge for autonomous driving systems. While traditional rule-based methods are interpretable, their predefined heuristics lack the adaptability for dynamic traffic environments. Learning-based approaches have shown considerable promise. Conversely, learning-based approaches, despite their promise, struggle to balance the modeling diverse and multimodal driving behaviors and real-time planning, often leading to indecisive or unsafe actions. To address this limitation, we propose Consistency Planner, a real-time planning framework with fast-sampling consistency models. Our approach is built upon two key technical contributions. Efficient Multimodal Sampling: We employ fast-sampling consistency models to generate a diverse set of plausible future trajectories. This enables efficient, real-time exploration of multimodal actions, overcoming the computational bottlenecks of previous iterative generative methods. Heterogeneous Feature Fusion: We introduce an attention-enhanced decoder that dynamically integrates heterogeneous input features (including scene feature and action token) into a cohesive representation for robust planning. Extensive evaluation in the Waymax simulator demonstrates superior performance in safety metrics compared to existing methods, with particularly strong results in challenging dynamic scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Consistency models speed up multimodal trajectory sampling for driving planners in simulation, but safety gains stay unproven outside Waymax.


The main point is that ConsistencyPlanner takes consistency models, already known for cutting sampling steps relative to standard diffusion, and uses them to generate diverse future trajectories fast enough for real-time closed-loop planning. It pairs that with an attention decoder to fuse scene features and action tokens. That combination targets the usual tradeoff between speed and handling multiple possible futures in traffic, which is a practical issue in the field.

The work does a clean job of laying out the two pieces: efficient sampling to avoid slow iterative generation, and the heterogeneous fusion step to make the planner robust to mixed inputs. The Waymax closed-loop tests are the main evidence offered, and the abstract claims better safety numbers especially in dynamic cases. If those numbers hold up with proper baselines and error bars in the full paper, it would be a useful incremental step for anyone building generative planners.

The soft spot is that all the safety claims sit inside the simulator. No hardware runs, no sensor noise tests, and no discussion of how well the generated trajectories survive model mismatch or actuation differences. That leaves the real-world transfer as an assumption rather than a checked result. The abstract also skips concrete metrics, so it is hard to judge how large the gains actually are or whether they beat the strongest recent baselines by a meaningful margin.

This paper is for people working on learning-based motion planning in autonomous driving who already follow generative-model approaches. It is not a foundational shift, but the method is straightforward enough that a reader could try the sampling trick on their own stack. It deserves a serious referee because the core idea is grounded in existing techniques and the evaluation setup is at least closed-loop, even if more validation would strengthen it.

Referee Report

2 major / 1 minor

Summary. The paper proposes ConsistencyPlanner, a real-time planning framework for autonomous driving that uses fast-sampling consistency models to generate diverse plausible trajectories and an attention-enhanced decoder for fusing heterogeneous features (scene and action tokens). It claims to overcome computational bottlenecks of prior iterative generative methods and reports superior safety metrics versus existing approaches in Waymax simulator evaluations, with strongest gains in challenging dynamic scenarios.

Significance. If the simulator results hold under additional scrutiny, the work could offer a practical advance in balancing trajectory diversity, multimodality, and real-time constraints for learning-based planners in robotics, with potential relevance to closed-loop autonomous driving systems.

major comments (2)

[Abstract] Abstract: the central claim of 'superior performance in safety metrics' (with 'particularly strong results in challenging dynamic scenarios') supplies no quantitative values, baselines, error bars, or statistical tests, rendering the claim unevaluable from the provided text.
[Evaluation] Evaluation (implied by abstract claims): all reported safety gains rest exclusively on Waymax closed-loop simulations; no sim-to-real transfer experiments, hardware deployment, or tests under sensor noise/model mismatch are described, which is load-bearing for any assertion of applicability to real-world driving.

minor comments (1)

[Abstract] Abstract: the consecutive sentences 'Learning-based approaches have shown considerable promise. Conversely, learning-based approaches, despite their promise, struggle...' contain redundant phrasing that reduces clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will incorporate revisions to strengthen the manuscript.


Referee: [Abstract] Abstract: the central claim of 'superior performance in safety metrics' (with 'particularly strong results in challenging dynamic scenarios') supplies no quantitative values, baselines, error bars, or statistical tests, rendering the claim unevaluable from the provided text.

Authors: We agree that the abstract's claims would benefit from supporting quantitative details. In the revised manuscript, we will update the abstract to report specific safety metric improvements (e.g., collision rate reductions versus baselines), reference the number of evaluation runs, and note the presence of error bars or statistical significance where computed in the experiments. revision: yes
Referee: [Evaluation] Evaluation (implied by abstract claims): all reported safety gains rest exclusively on Waymax closed-loop simulations; no sim-to-real transfer experiments, hardware deployment, or tests under sensor noise/model mismatch are described, which is load-bearing for any assertion of applicability to real-world driving.

Authors: The paper's scope is a simulation-based study using the standard Waymax closed-loop benchmark to isolate planner performance under controlled conditions. We do not assert direct real-world deployment readiness. We will revise the text to explicitly limit claims to simulation results and add a limitations paragraph discussing the sim-to-real gap, including sensor noise and model mismatch as open challenges for future work. revision: partial
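The rebuttal promises to report collision-rate reductions with error bars. The sketch below shows one standard way to do this; the authors do not state that they use it. A per-episode collision rate can be reported with a Wilson score interval, which stays sensible when the observed rate is close to zero. The function name and the counts in the usage lines are hypothetical, not results from the paper.

```python
import math


def wilson_interval(events, episodes, z=1.96):
    """Wilson score interval for a binomial rate such as collisions per episode.

    z = 1.96 gives an approximate 95% interval.
    """
    if episodes <= 0 or not 0 <= events <= episodes:
        raise ValueError("need 0 <= events <= episodes and episodes > 0")
    p = events / episodes
    z2 = z * z
    denom = 1.0 + z2 / episodes
    centre = (p + z2 / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z2 / (4 * episodes**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    lo, hi = wilson_interval(events=12, episodes=1000)
    print(f"collision rate 1.2% (95% CI {lo:.2%} to {hi:.2%})")
```

Overlapping intervals for two planners do not by themselves settle whether their difference is significant. A paired comparison over the same Waymax scenarios would be the stronger test.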

Circularity Check

0 steps flagged

No circularity: framework proposal with simulator evaluation only


The manuscript describes an applied planning architecture (fast-sampling consistency models + attention decoder) and reports Waymax closed-loop metrics. No equations, parameter-fitting steps, uniqueness theorems, or self-citations appear in the provided text that would reduce any claimed result to its own inputs by construction. The listed circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.) are absent; the work is therefore self-contained against external benchmarks and receives the default non-finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no information on free parameters, axioms, or invented entities used by the central claim.

pith-pipeline@v0.9.1-grok · 5733 in / 1139 out tokens · 31256 ms · 2026-06-27T10:02:55.701088+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 10 canonical work pages · 4 internal anchors

[1]

Wayformer: Motion forecasting via simple & efficient attention networks,

N. Nayakanti, R. Al-Rfou, A. Zhou, K. Goel, K. S. Refaat, and B. Sapp, “Wayformer: Motion forecasting via simple & efficient attention networks,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2980–2987, IEEE, 2023

2023
[2]

Parting with misconceptions about learning-based vehicle motion planning,

D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with misconceptions about learning-based vehicle motion planning,” in Proceedings of the 7th Conference on Robot Learning, vol. 229, pp. 1268–1281, PMLR, 2023

2023
[3]

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,

Y. Zheng, Z. Xing, Q. Zhang, B. Jin, et al., “Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,” IEEE Transactions on Cognitive and Developmental Systems, 2026

2026
[4]

Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y. Lu, J. Harb, X. Pan, Y. Wang, X. Chen, J. Co-Reyes, R. Agarwal, R. Roelofs, Y. Lu, N. Montali, P. Mougin, Z. Yang, B. White, A. Faust, R. McAllister, D. Anguelov, and B. Sapp, “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,” in Advances in Neural Informatio...

2023
[5]

Baidu Apollo EM Motion Planner

H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,” arXiv preprint arXiv:1807.08048, 2018

2018
[6]

Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,

Z. Huang, H. Liu, and C. Lv, “Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3903–3913, 2023

2023
[7]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

2017
[8]

Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving,

P. Yang, B. Lu, Z. Xia, C. Han, Y. Gao, T. Zhang, K. Zhan, X. Lang, Y. Zheng, and Q. Zhang, “Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving,” Proceedings of the AAAI conference on artificial intelligence, 2026

2026
[9]

World4drive: End-to-end autonomous driving via intention-aware physical latent world model,

Y. Zheng, P. Yang, Z. Xing, Q. Zhang, Y. Zheng, Y. Gao, P. Li, T. Zhang, Z. Xia, P. Jia, et al., “World4drive: End-to-end autonomous driving via intention-aware physical latent world model,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2025
[10]

Planning-inspired hierarchical trajectory prediction via lateral-longitudinal decomposition for autonomous driving,

D. Li, Q. Zhang, Z. Xia, Y. Zheng, K. Zhang, M. Yi, W. Jin, and D. Zhao, “Planning-inspired hierarchical trajectory prediction via lateral-longitudinal decomposition for autonomous driving,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 692–703, 2023

2023
[11]

Uncad: Towards safe end-to-end autonomous driving via online map uncertainty,

P. Yang, Y. Zheng, Q. Zhang, K. Zhu, Z. Xing, Q. Lin, Y.-F. Liu, Z. Su, and D. Zhao, “Uncad: Towards safe end-to-end autonomous driving via online map uncertainty,” in 2025 IEEE International Conference on Robotics and Automation, 2025

2025
[12]

Learning multiple probabilistic decisions from latent world model in autonomous driving,

L. Xiao, J.-J. Liu, S. Yang, X. Li, X. Ye, W. Yang, and J. Wang, “Learning multiple probabilistic decisions from latent world model in autonomous driving,” arXiv preprint arXiv:2409.15730, 2024

2024
[13]

Diffusion-based planning for autonomous driving with flexible guidance,

Y. Zheng, R. Liang, K. Zheng, J. Zheng, L. Mao, J. Li, W. Gu, R. Ai, S. E. Li, X. Zhan, and J. Liu, “Diffusion-based planning for autonomous driving with flexible guidance,” in The Thirteenth International Conference on Learning Representations, 2025

2025
[14]

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021

2021
[15]

Consistency models,

Y. Song, P. Dhariwal, M. Chen, and I. Sutskever, “Consistency models,” in Proceedings of the 40th International Conference on Machine Learning, vol. 202, pp. 32211–32252, PMLR, 2023

2023
[16]

Boosting continuous control with consistency policy,

Y. Chen, H. Li, and D. Zhao, “Boosting continuous control with consistency policy,” arXiv preprint arXiv:2310.06343, 2023

2023
[17]

Consistency policy: Accelerated visuomotor policies via consistency distillation,

A. Prasad, K. Lin, J. Wu, L. Zhou, and J. Bohg, “Consistency policy: Accelerated visuomotor policies via consistency distillation,” arXiv preprint arXiv:2405.07503, 2024

2024
[18]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4195–4205, 2023

2023
[19]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020

2020
[20]

Score-based generative modeling through stochastic differential equations,

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in The Ninth International Conference on Learning Representations, 2021

2021
[21]

Classifier-Free Diffusion Guidance

J. Ho and T. Salimans, “Classifier-free diffusion guidance,” arXiv preprint arXiv:2207.12598, 2022

2022
[22]

Improved techniques for training consistency models,

Y. Song and P. Dhariwal, “Improved techniques for training consistency models,” in The Twelfth International Conference on Learning Representations, 2024

2024
[23]

Videolcm: Video latent consistency model,

X. Wang, S. Zhang, H. Zhang, Y. Liu, Y. Zhang, C. Gao, and N. Sang, “Videolcm: Video latent consistency model,” arXiv preprint arXiv:2312.09109, 2023

2023
[24]

Motionlcm: Real-time controllable motion generation via latent consistency model,

W. Dai, L.-H. Chen, J. Wang, J. Liu, B. Dai, and Y. Tang, “Motionlcm: Real-time controllable motion generation via latent consistency model,” in European Conference on Computer Vision (ECCV), pp. 390–408, Springer, 2024

2024
[25]

Generalizing consistency policy to visual RL with prioritized proximal experience regularization,

H. Li, Z. Jiang, Y. Chen, and D. Zhao, “Generalizing consistency policy to visual RL with prioritized proximal experience regularization,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[26]

Stabilizing diffusion model for robotic control with dynamic programming and transition feasibility,

H. Li, Y. Zhang, H. Wen, Y. Zhu, and D. Zhao, “Stabilizing diffusion model for robotic control with dynamic programming and transition feasibility,” IEEE Transactions on Artificial Intelligence, vol. 5, no. 9, pp. 4585–4594, 2024

2024
[27]

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,

Y. Zheng, Z. Xing, Q. Zhang, B. Jin, P. Li, Y. Zheng, Z. Xia, K. Zhan, X. Lang, Y. Chen, et al., “Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,” arXiv preprint arXiv:2406.01587, 2024

2024
[28]

Motiondiffuser: Controllable multi-agent motion prediction using diffusion,

C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov, et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9644–9653, 2023

2023
[29]

Guided conditional diffusion for controllable traffic simulation,

Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 3560–3566, IEEE, 2023

2023
[30]

Sledge: Synthesizing driving environments with generative models and rule-based traffic,

K. Chitta, D. Dauner, and A. Geiger, “Sledge: Synthesizing driving environments with generative models and rule-based traffic,” in European Conference on Computer Vision (ECCV), pp. 57–74, Springer, 2024

2024
[31]

Diffusion-es: Gradient-free planning with diffusion for autonomous and instruction-guided driving,

B. Yang, H. Su, N. Gkanatsios, T.-W. Ke, A. Jain, J. Schneider, and K. Fragkiadaki, “Diffusion-es: Gradient-free planning with diffusion for autonomous and instruction-guided driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15342–15353, 2024

2024
[32]

Bert: Pre-training of deep bidirectional transformers for language understanding,

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, pp. 4171–4186, 2019

2019
[33]

Easychauffeur: A baseline advancing simplicity and efficiency on waymax,

L. Xiao, J.-J. Liu, X. Ye, W. Yang, and J. Wang, “Easychauffeur: A baseline advancing simplicity and efficiency on waymax,” arXiv preprint arXiv:2408.16375, 2024

2024
[34]

An iterative procedure for the polygonal approximation of plane curves,

U. Ramer, “An iterative procedure for the polygonal approximation of plane curves,” Computer graphics and image processing, vol. 1, no. 3, pp. 244–256, 1972

1972
[35]

Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,

D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica: the international journal for geographic information and geovisualization, vol. 10, no. 2, pp. 112–122, 1973

1973
[36]

Mlp-mixer: An all-mlp architecture for vision,

I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, and A. Dosovitskiy, “Mlp-mixer: An all-mlp architecture for vision,” in Advances in Neural Information Processing Systems, vol. 34, pp. 24261–24272, 2021

2021
[37]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Y. Balaji, S. Nah, X. Huang, A. Vahdat, J. Song, K. Kreis, M. Aittala, T. Aila, S. Laine, B. Catanzaro, T. Karras, and M.-Y. Liu, “eDiff-I: Text-to-image diffusion models with ensemble of expert denoisers,” arXiv preprint arXiv:2211.01324, 2022

2022
[38]

Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,

S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. R. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,” in Proceedings of the IEEE/CVF International Conference on Com...

2021
[39]

Congested traffic states in empirical observations and microscopic simulations,

M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical review E, vol. 62, no. 2, p. 1805, 2000

2000
[40]

Plant: Explainable planning transformers via object-level representations,

K. Renz, K. Chitta, O.-B. Mercea, A. Koepke, Z. Akata, and A. Geiger, “Plant: Explainable planning transformers via object-level representations,” in Conference on Robot Learning, pp. 459–470, 2023

2023

[1] [1]

Wayformer: Motion forecasting via simple & efficient at- tention networks,

N. Nayakanti, R. Al-Rfou, A. Zhou, K. Goel, K. S. Refaat, and B. Sapp, “Wayformer: Motion forecasting via simple & efficient at- tention networks,” in2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2980–2987, IEEE, 2023

2023

[2] [2]

Parting with misconceptions about learning-based vehicle motion planning,

D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with misconceptions about learning-based vehicle motion planning,” inPro- ceedings of the 7th Conference on Robot Learning, vol. 229, pp. 1268– 1281, PMLR, 2023

2023

[3] [3]

Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,

Y . Zheng, Z. Xing, Q. Zhang, B. Jin,et al., “Planagent: A multi-modal large language agent for closed-loop vehicle motion planning,”IEEE Transactions on Cognitive and Developmental Systems, 2026

2026

[4] [4]

Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,

C. Gulino, J. Fu, W. Luo, G. Tucker, E. Bronstein, Y . Lu, J. Harb, X. Pan, Y . Wang, X. Chen, J. Co-Reyes, R. Agarwal, R. Roelofs, Y . Lu, N. Montali, P. Mougin, Z. Yang, B. White, A. Faust, R. McAllister, D. Anguelov, and B. Sapp, “Waymax: An accelerated, data-driven simulator for large-scale autonomous driving research,” inAdvances in Neural Informatio...

2023

[5] [5]

Baidu Apollo EM Motion Planner

H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong, “Baidu apollo em motion planner,”arXiv preprint arXiv:1807.08048, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,

Z. Huang, H. Liu, and C. Lv, “Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3903–3913, 2023

2023

[7] [7]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

2017

[8] [8]

Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving,

P. Yang, B. Lu, Z. Xia, C. Han, Y . Gao, T. Zhang, K. Zhan, X. Lang, Y . Zheng, and Q. Zhang, “Worldrft: Latent world model planning with reinforcement fine-tuning for autonomous driving,”Proceedings of the AAAI conference on artificial intelligence, 2026

2026

[9] [9]

World4drive: End-to-end autonomous driving via intention-aware physical latent world model,

Y . Zheng, P. Yang, Z. Xing, Q. Zhang, Y . Zheng, Y . Gao, P. Li, T. Zhang, Z. Xia, P. Jia,et al., “World4drive: End-to-end autonomous driving via intention-aware physical latent world model,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025

2025

[10] [10]

Planning-inspired hierarchical trajectory prediction via lateral- longitudinal decomposition for autonomous driving,

D. Li, Q. Zhang, Z. Xia, Y . Zheng, K. Zhang, M. Yi, W. Jin, and D. Zhao, “Planning-inspired hierarchical trajectory prediction via lateral- longitudinal decomposition for autonomous driving,”IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 692–703, 2023

2023

[11] [11]

Uncad: Towards safe end-to-end autonomous driving via online map uncertainty,

P. Yang, Y . Zheng, Q. Zhang, K. Zhu, Z. Xing, Q. Lin, Y .-F. Liu, Z. Su, and D. Zhao, “Uncad: Towards safe end-to-end autonomous driving via online map uncertainty,”2025 IEEE International Conference on Robotics and Automation, 2025

2025

[12] [12]

Learning multiple probabilistic decisions from latent world model in autonomous driving,

L. Xiao, J.-J. Liu, S. Yang, X. Li, X. Ye, W. Yang, and J. Wang, “Learning multiple probabilistic decisions from latent world model in autonomous driving,”arXiv preprint arXiv:2409.15730, 2024

work page arXiv 2024

[13] [13]

Diffusion-based planning for autonomous driving with flexible guidance,

Y . Zheng, R. Liang, K. ZHENG, J. Zheng, L. Mao, J. Li, W. Gu, R. Ai, S. E. Li, X. Zhan, and J. Liu, “Diffusion-based planning for autonomous driving with flexible guidance,” inThe Thirteenth International Confer- ence on Learning Representations, 2025

2025

[14] [14]

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuplan: A closed-loop ml- based planning benchmark for autonomous vehicles,”arXiv preprint arXiv:2106.11810, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[15] [15]

Consistency mod- els,

Y . Song, P. Dhariwal, M. Chen, and I. Sutskever, “Consistency mod- els,” inProceedings of the 40th International Conference on Machine Learning, vol. 202, pp. 32211–32252, PMLR, 2023

2023

[16] [16]

Boosting continuous control with consistency policy,

Y . Chen, H. Li, and D. Zhao, “Boosting continuous control with consistency policy,”arXiv preprint arXiv:2310.06343, 2023

work page arXiv 2023

[17] [17]

Consistency policy: Accelerated visuomotor policies via consistency distillation,

A. Prasad, K. Lin, J. Wu, L. Zhou, and J. Bohg, “Consistency policy: Accelerated visuomotor policies via consistency distillation,”arXiv preprint arXiv:2405.07503, 2024

work page arXiv 2024

[18] [18]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4195–4205, 2023. Fig. 2: Visualization results of ConsistencyPlanner against other three methods in complex driving scenarios

2023

[19] [19]

Denoising diffusion probabilistic mod- els,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic mod- els,” inAdvances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020

2020

[20] [20]

Score-based generative modeling through stochastic differ- ential equations,

Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differ- ential equations,” inThe Ninth International Conference on Learning Representations, 2021

2021

[21] [21]

Classifier-Free Diffusion Guidance

J. Ho and T. Salimans, “Classifier-free diffusion guidance,”arXiv preprint arXiv:2207.12598, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[22] Y. Song and P. Dhariwal, “Improved techniques for training consistency models,” in The Twelfth International Conference on Learning Representations, 2024.

[23] X. Wang, S. Zhang, H. Zhang, Y. Liu, Y. Zhang, C. Gao, and N. Sang, “VideoLCM: Video latent consistency model,” arXiv preprint arXiv:2312.09109, 2023.

[24] W. Dai, L.-H. Chen, J. Wang, J. Liu, B. Dai, and Y. Tang, “MotionLCM: Real-time controllable motion generation via latent consistency model,” in European Conference on Computer Vision (ECCV), pp. 390–408, Springer, 2024.

[25] H. Li, Z. Jiang, Y. Chen, and D. Zhao, “Generalizing consistency policy to visual RL with prioritized proximal experience regularization,” in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[26] H. Li, Y. Zhang, H. Wen, Y. Zhu, and D. Zhao, “Stabilizing diffusion model for robotic control with dynamic programming and transition feasibility,” IEEE Transactions on Artificial Intelligence, vol. 5, no. 9, pp. 4585–4594, 2024.

[27] Y. Zheng, Z. Xing, Q. Zhang, B. Jin, P. Li, Y. Zheng, Z. Xia, K. Zhan, X. Lang, Y. Chen, et al., “PlanAgent: A multi-modal large language agent for closed-loop vehicle motion planning,” arXiv preprint arXiv:2406.01587, 2024.

[28] C. Jiang, A. Cornman, C. Park, B. Sapp, Y. Zhou, D. Anguelov, et al., “MotionDiffuser: Controllable multi-agent motion prediction using diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9644–9653, 2023.

[29] Z. Zhong, D. Rempe, D. Xu, Y. Chen, S. Veer, T. Che, B. Ray, and M. Pavone, “Guided conditional diffusion for controllable traffic simulation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 3560–3566, IEEE, 2023.

[30] K. Chitta, D. Dauner, and A. Geiger, “SLEDGE: Synthesizing driving environments with generative models and rule-based traffic,” in European Conference on Computer Vision (ECCV), pp. 57–74, Springer, 2024.

[31] B. Yang, H. Su, N. Gkanatsios, T.-W. Ke, A. Jain, J. Schneider, and K. Fragkiadaki, “Diffusion-ES: Gradient-free planning with diffusion for autonomous and instruction-guided driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15342–15353, 2024.

[32] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, pp. 4171–4186, 2019.

[33] L. Xiao, J.-J. Liu, X. Ye, W. Yang, and J. Wang, “EasyChauffeur: A baseline advancing simplicity and efficiency on Waymax,” arXiv preprint arXiv:2408.16375, 2024.

[34] U. Ramer, “An iterative procedure for the polygonal approximation of plane curves,” Computer Graphics and Image Processing, vol. 1, no. 3, pp. 244–256, 1972.

[35] D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica: The International Journal for Geographic Information and Geovisualization, vol. 10, no. 2, pp. 112–122, 1973.

[36] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, and A. Dosovitskiy, “MLP-Mixer: An all-MLP architecture for vision,” in Advances in Neural Information Processing Systems, vol. 34, pp. 24261–24272, 2021.

[37] Y. Balaji, S. Nah, X. Huang, A. Vahdat, J. Song, K. Kreis, M. Aittala, T. Aila, S. Laine, B. Catanzaro, T. Karras, and M.-Y. Liu, “eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers,” arXiv preprint arXiv:2211.01324, 2022.

[38] S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. R. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, and D. Anguelov, “Large scale interactive motion forecasting for autonomous driving: The Waymo Open Motion Dataset,” in Proceedings of the IEEE/CVF International Conference on Com..., 2021.

[39] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, no. 2, p. 1805, 2000.

[40] K. Renz, K. Chitta, O.-B. Mercea, A. Koepke, Z. Akata, and A. Geiger, “PlanT: Explainable planning transformers via object-level representations,” in Conference on Robot Learning, pp. 459–470, 2023.