pith. machine review for the scientific record.

arxiv: 2604.25329 · v1 · submitted 2026-04-28 · 💻 cs.RO

Recognition: unknown

ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 15:54 UTC · model grok-4.3

classification 💻 cs.RO
keywords: autonomous driving · proactive planning · world model · trajectory planning · scene prediction · end-to-end training · BEV representation

The pith

ProDrive lets a driving planner and world model co-evolve so future scene predictions directly refine trajectory choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous driving planners typically select paths from current observations alone, which can produce myopic and unsafe choices when the environment changes rapidly. ProDrive instead trains a trajectory planner and a scene prediction model together end-to-end. The planner proposes multiple candidate paths and special tokens describing its intent. The prediction model uses those inputs to forecast how the entire scene would evolve along each path. All candidates are scored in parallel and the resulting signals flow back through the network to improve the planner. This creates a closed loop in which planning and scene evolution inform each other, moving beyond reactive decisions. On the NAVSIM benchmark the method shows gains in safety and planning smoothness compared with prior approaches.
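
To make the loop concrete, here is a minimal PyTorch sketch of the co-evolution described above. Every module name, dimension, and the reward head are illustrative assumptions chosen for exposition, not the paper's implementation.

    # Minimal sketch of the ego-environment co-evolution loop (all names assumed).
    import torch
    import torch.nn as nn

    class ToyProDriveLoop(nn.Module):
        def __init__(self, d=64, n_candidates=8, horizon=4):
            super().__init__()
            self.ego_encoder = nn.Linear(16, d)          # stand-in for the ego/BEV encoders
            self.traj_head = nn.Linear(d, n_candidates * horizon * 2)  # waypoints (x, y)
            self.token_head = nn.Linear(d, n_candidates * d)           # planning-aware tokens
            self.world_model = nn.GRUCell(d + 2, d)      # toy scene-evolution model
            self.reward_head = nn.Linear(d, 1)           # scores each predicted rollout
            self.n, self.h, self.d = n_candidates, horizon, d

        def forward(self, obs, bev_state):
            B = obs.size(0)
            ego = self.ego_encoder(obs)                                 # (B, d)
            trajs = self.traj_head(ego).view(B, self.n, self.h, 2)      # candidate paths
            tokens = self.token_head(ego).view(B * self.n, self.d)      # one token per path
            state = bev_state.unsqueeze(1).expand(B, self.n, self.d).reshape(B * self.n, self.d)
            for t in range(self.h):                                     # roll the scene forward
                waypoint = trajs.reshape(B * self.n, self.h, 2)[:, t]
                state = self.world_model(torch.cat([tokens, waypoint], dim=-1), state)
            return trajs, self.reward_head(state).view(B, self.n)       # parallel outcome scores

    model = ToyProDriveLoop()
    trajs, scores = model(torch.randn(2, 16), torch.randn(2, 64))
    scores.mean().backward()  # gradients reach both planner heads: the closed loop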

Core claim

The paper establishes that bidirectional coupling between a query-centric trajectory planner and a bird's-eye-view world model lets future outcome assessment shape planning decisions directly, rather than leaving them to current observations alone. The coupling is obtained by injecting planner features into the world model and scoring all candidate trajectories in parallel while preserving end-to-end gradient flow.

What carries the argument

The ego-environment co-evolution loop: the planner supplies candidate trajectories and planning-aware tokens that condition the world model's future scene forecasts, and parallel evaluation lets the resulting outcome scores update the planner's parameters.

If this is right

  • Planning decisions incorporate predicted future scene states rather than current observations alone.
  • End-to-end gradient flow lets the quality of simulated outcomes directly optimize planner parameters.
  • Parallel scoring of all candidate trajectories supports efficient selection of the best path (see the sketch after this list).
  • Joint training yields measurable gains in safety and planning efficiency on NAVSIM v1.
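
A hedged illustration of the gradient-flow and parallel-scoring bullets above: if candidate selection is relaxed to a softmax-weighted expectation (our assumption; the paper may use a different mechanism), every parallel outcome score receives gradient, so simulated-outcome quality can optimize the planner directly.

    # Differentiable selection over parallel candidate scores (softmax relaxation assumed).
    import torch

    def soft_select(trajs, scores, temperature=1.0):
        # trajs: (B, N, H, 2) candidate paths; scores: (B, N) world-model outcome scores.
        weights = torch.softmax(scores / temperature, dim=-1)   # soft "selection"
        plan = torch.einsum("bn,bnhc->bhc", weights, trajs)     # expected trajectory
        return plan, weights

    trajs = torch.randn(2, 8, 4, 2, requires_grad=True)
    scores = torch.randn(2, 8, requires_grad=True)
    plan, _ = soft_select(trajs, scores)
    plan.pow(2).mean().backward()   # stand-in imitation loss
    print(scores.grad.shape)        # torch.Size([2, 8]): the scores get gradient too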

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the world model is reliable, the architecture could reduce the need for separate prediction modules in driving stacks.
  • Similar co-evolution of actor and simulator might apply to other sequential decision problems in dynamic settings.
  • Prediction errors over longer horizons could still cause failures, suggesting value in adding uncertainty estimates (a toy version follows this list).
  • The method points toward policies that optimize multi-step outcomes without explicit reward engineering.
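
The uncertainty suggestion in the third bullet could be prototyped with a small ensemble of world models whose disagreement penalizes a candidate's score. Everything below, including the penalty form and its weight, is a hypothetical illustration rather than anything in the paper.

    # Hypothetical uncertainty-aware scoring via ensemble disagreement.
    import torch
    import torch.nn as nn

    ensemble = nn.ModuleList(nn.Linear(66, 64) for _ in range(5))  # K stand-in world models
    reward_head = nn.Linear(64, 1)

    def uncertainty_aware_score(conditioned_input):                # (B*N, 66) rollout input
        futures = torch.stack([m(conditioned_input) for m in ensemble])  # (K, B*N, 64)
        disagreement = futures.var(dim=0).mean(dim=-1, keepdim=True)     # epistemic proxy
        return reward_head(futures.mean(dim=0)) - 1.0 * disagreement     # weight is arbitrary

    print(uncertainty_aware_score(torch.randn(16, 66)).shape)      # torch.Size([16, 1])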

Load-bearing premise

The world model must generate sufficiently accurate predictions of future scenes when conditioned on the planner's candidate trajectories and tokens so that the outcome scores meaningfully improve planning decisions.

What would settle it

Showing that the world model's predicted future scenes diverge substantially from real observations, or that removing the planner-feature injection and parallel evaluation produces equal or better results on the same benchmark, would refute the claimed benefit of the coupling.
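
The second test amounts to an injection/stop-gradient ablation. A toy harness, with all names assumed, shows the two weakened variants one would compare against the full coupling:

    # Toy ablation of planner-feature injection and gradient coupling (names assumed).
    import torch

    def rollout(world_model, bev_state, planner_tokens, inject=True, coupled=True):
        if not inject:
            planner_tokens = torch.zeros_like(planner_tokens)  # drop the injection entirely
        elif not coupled:
            planner_tokens = planner_tokens.detach()           # keep features, cut gradients
        return world_model(torch.cat([bev_state, planner_tokens], dim=-1))

    world_model = torch.nn.Linear(128, 64)                     # stand-in one-step predictor
    bev = torch.randn(4, 64)
    tokens = torch.randn(4, 64, requires_grad=True)
    full = rollout(world_model, bev, tokens)                   # full bidirectional coupling
    frozen = rollout(world_model, bev, tokens, coupled=False)  # injection without gradients
    ablated = rollout(world_model, bev, tokens, inject=False)  # no injection at all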

Figures

Figures reproduced from arXiv: 2604.25329 by Chuyao Fu, Hong Zhang, Jiankun Wang, Shengzhe Gan, Sirui Han, Xiaowei Chi, Yuhan Rui, Zhuoli Ouyang.

Figure 1. From reactive to proactive autonomous driving. (a) Conventional end-to-end planners are reactive, generating trajectories mainly from the current observation without explicitly modeling future scene evolution. (b) Some recent methods use a world model for trajectory reranking, but the planner and world model remain loosely coupled, limiting direct planner-side benefit from future reasoning. (c) In contrast… view at source ↗
Figure 2. Overview of ProDrive. Given multi-view images, LiDAR, and ego state, the Ego Module refines learnable ego queries through L Ego Refiner layers to obtain ego tokens, from which candidate trajectories are decoded and transformed into trajectory tokens. Conditioned on these tokens and the current BEV state, the Environment Module performs recurrent BEV future prediction and reward-based trajectory evaluation… view at source ↗
Figure 3. Qualitative examples of ProDrive. For each case, the top row shows the front-view observation with the planned trajectory overlaid, and the bottom row shows the corresponding bird's-eye-view scene with the predicted plan compared against the human trajectory. Across diverse driving scenarios, ProDrive produces safe and foresighted behaviors by anticipating the future motion of surrounding agents and captur… view at source ↗
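
Figure 2's ego-query refinement reads like standard iterative query decoding. A minimal, assumed sketch with L stacked refiner layers follows; the real refiners presumably attend to fused image and LiDAR features, which are stubbed here as random tensors.

    # Assumed sketch of L-layer ego-query refinement per the Figure 2 description.
    import torch
    import torch.nn as nn

    class EgoRefiner(nn.Module):
        def __init__(self, d=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
            self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

        def forward(self, queries, scene):
            attended, _ = self.attn(queries, scene, scene)  # cross-attend to scene features
            queries = self.norm1(queries + attended)
            return self.norm2(queries + self.ffn(queries))

    L, d = 3, 64
    refiners = nn.ModuleList(EgoRefiner(d) for _ in range(L))
    ego_queries = torch.randn(2, 4, d)   # learnable ego queries (random stand-ins here)
    scene = torch.randn(2, 100, d)       # stub for fused image/LiDAR features
    for layer in refiners:
        ego_queries = layer(ego_queries, scene)
    print(ego_queries.shape)             # (2, 4, 64): ego tokens for trajectory decoding
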
read the original abstract

End-to-end autonomous driving planners typically generate trajectories from current observations alone. However, real-world driving is highly dynamic, and such reactive planning cannot anticipate future scene evolution, often leading to myopic decisions and safety-critical failures. We propose ProDrive, a world-model-based proactive planning framework that enables ego-environment co-evolution for autonomous driving. ProDrive jointly trains a query-centric trajectory planner and a bird's-eye-view (BEV) world model end-to-end: the planner generates diverse candidate trajectories and planning-aware ego tokens, while the world model predicts future scene evolution conditioned on them. By injecting planner features into the world model and evaluating all candidates in parallel, ProDrive preserves end-to-end gradient flow and allows future outcome assessment to directly shape planning. This bidirectional coupling enables proactive planning beyond current-observation-driven decision-making. Experiments on NAVSIM v1 show that ProDrive outperforms strong baselines in both safety and planning efficiency, while ablations validate the effectiveness of the proposed ego-environment coupling design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes ProDrive, a world-model-based proactive planning framework for autonomous driving. It jointly trains a query-centric trajectory planner and a BEV world model end-to-end: the planner generates diverse candidate trajectories and planning-aware ego tokens that condition the world model's future scene predictions. This bidirectional coupling is intended to enable future outcome assessment to shape planning decisions, preserve end-to-end gradient flow via parallel candidate evaluation, and outperform reactive baselines. Experiments on NAVSIM v1 are claimed to show gains in safety and planning efficiency, supported by ablations on the ego-environment coupling.

Significance. If the empirical claims hold and the world-model predictions remain accurate under planner conditioning, ProDrive could meaningfully advance end-to-end driving by enabling proactive rather than purely reactive planning. The joint training that maintains gradient flow through feature injection and parallel evaluation is a technically attractive design choice, and the explicit ablations on the coupling mechanism provide a clear way to isolate its contribution.

major comments (2)
  1. [Abstract] Abstract: The central claim that ProDrive 'outperforms strong baselines in both safety and planning efficiency' on NAVSIM v1 is presented without any quantitative metrics, baseline names, error bars, or data-exclusion rules. This absence is load-bearing because the proactive benefit is asserted to arise from the co-evolution mechanism, yet no evidence is supplied to quantify the improvement or rule out confounds.
  2. [Abstract] Abstract (and implied methods): The framework relies on the assumption that the BEV world model, when conditioned on planner-generated candidate trajectories plus planning-aware tokens, produces sufficiently accurate future scene forecasts for outcome evaluation to improve planning. No metrics on prediction fidelity (future-frame mIoU, object trajectory error, or collision-prediction AUC) under closed-loop planner conditioning versus open-loop or ground-truth conditioning are reported, leaving the weakest assumption untested and exposing the method to risks of optimistic or mode-collapsed predictions.
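
For reference, the fidelity metrics requested in comment 2 are cheap to compute once predicted and observed future BEV grids are available. A sketch of future-frame mIoU follows, assuming integer class grids; object trajectory error and collision-prediction AUC would slot in the same way.

    # Future-frame BEV mIoU (integer class-grid format is an assumption).
    import torch

    def bev_miou(pred, target, n_classes):
        # pred, target: (B, H, W) integer class grids for one predicted future frame.
        ious = []
        for c in range(n_classes):
            p, t = pred == c, target == c
            inter = (p & t).sum().float()
            union = (p | t).sum().float()
            if union > 0:                # skip classes absent from both grids
                ious.append(inter / union)
        return torch.stack(ious).mean()

    pred = torch.randint(0, 4, (2, 32, 32))
    target = torch.randint(0, 4, (2, 32, 32))
    print(f"future-frame mIoU: {bev_miou(pred, target, n_classes=4):.3f}")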

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the technical appeal of the ego-environment co-evolution design. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of results and assumptions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that ProDrive 'outperforms strong baselines in both safety and planning efficiency' on NAVSIM v1 is presented without any quantitative metrics, baseline names, error bars, or data-exclusion rules. This absence is load-bearing because the proactive benefit is asserted to arise from the co-evolution mechanism, yet no evidence is supplied to quantify the improvement or rule out confounds.

    Authors: We agree that the abstract should supply concrete quantitative support for the performance claims. In the revised manuscript we will expand the abstract to include the primary NAVSIM v1 metrics (e.g., safety score, planning efficiency), the specific baselines compared, and any available error bars or statistical details from the experimental section. This will make the evidence for the co-evolution benefit explicit and address potential confounds. revision: yes

  2. Referee: [Abstract] Abstract (and implied methods): The framework relies on the assumption that the BEV world model, when conditioned on planner-generated candidate trajectories plus planning-aware tokens, produces sufficiently accurate future scene forecasts for outcome evaluation to improve planning. No metrics on prediction fidelity (future-frame mIoU, object trajectory error, or collision-prediction AUC) under closed-loop planner conditioning versus open-loop or ground-truth conditioning are reported, leaving the weakest assumption untested and exposing the method to risks of optimistic or mode-collapsed predictions.

    Authors: We acknowledge that the current manuscript does not report dedicated prediction-fidelity metrics for the world model under planner-specific conditioning. While the joint training and coupling ablations provide indirect support for the overall framework, we agree that direct evaluation of forecast accuracy is necessary to substantiate the core assumption. In the revision we will add experiments reporting future-frame mIoU, object trajectory error, and related metrics under closed-loop planner conditioning, open-loop, and ground-truth settings. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation remains self-contained

full rationale

The paper presents a joint end-to-end training architecture in which a planner generates candidate trajectories and tokens that condition a BEV world model, with parallel evaluation preserving gradient flow. No equations, fitted parameters, or self-citations are shown that would reduce the claimed proactive benefit or future-outcome shaping to a quantity defined by the same inputs. The central mechanism (conditioning + parallel evaluation) is an architectural choice whose validity is tested empirically on NAVSIM v1 rather than asserted by definition or prior self-work. The weakest assumption (world-model fidelity under planner conditioning) is acknowledged as external to the derivation chain and does not create a self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework implicitly relies on standard supervised learning assumptions for world-model training and trajectory supervision that are not detailed here.

pith-pipeline@v0.9.0 · 5495 in / 1134 out tokens · 43586 ms · 2026-05-07T15:54:50.861084+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 16 canonical work pages · 4 internal anchors

  1. [1] NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles
     Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. arXiv preprint arXiv:2106.11810, 2021.

  2. [2] VADv2: End-to-end vectorized autonomous driving via probabilistic planning
     Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. arXiv preprint arXiv:2402.13243, 2024.

  3. [3] S2TNet: Spatio-temporal transformer networks for trajectory prediction in autonomous driving
     Weihuang Chen, Fangfang Wang, and Hongbin Sun. In Asian Conference on Machine Learning, pages 454–469. PMLR, 2021.

  4. [4] TransFuser: Imitation with transformer-based sensor fusion for autonomous driving
     Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):12878–12895, 2022.

  5. [5] NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking
     Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Advances in Neural Information Processing Systems, 37:28706–28719, 2024.

  6. [6] CARLA: An open urban driving simulator
     Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. In Conference on Robot Learning, pages 1–16. PMLR, 2017.

  7. [7] ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation
     Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025.

  8. [8] Learning to drive from a world model
     Mitchell Goff, Greg Hogan, George Hotz, Armand du Parc Locmaria, Kacper Raczy, Harald Schäfer, Adeeb Shihadeh, Weixing Zhang, and Yassine Yousfi. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1964–1973.

  9. [9] iPad: Iterative proposal-centric end-to-end autonomous driving
     Ke Guo, Haochen Liu, Xiaojun Wu, Jia Pan, and Chen Lv. arXiv preprint arXiv:2505.15111, 2025.

  10. [10] Deep residual learning for image recognition
      Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

  11. [11] GAIA-1: A generative world model for autonomous driving
      Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. arXiv preprint arXiv:2309.17080, 2023.

  12. [12] Planning-oriented autonomous driving
      Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17853–17862, 2023.

  13. [13] SubjectDrive: Scaling generative data in autonomous driving via subject control
      Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, et al. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3617–3625, 2025.

  14. [14] ADriver-I: A general world model for autonomous driving
      Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, and Tiancai Wang. arXiv preprint arXiv:2311.13549, 2023.

  15. [15] ImagiDrive: A unified imagination-and-planning framework for autonomous driving
      Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, and Li Zhang. arXiv preprint arXiv:2508.11428, 2025.

  16. [16] Think2Drive: Efficient reinforcement learning by thinking with latent world model for autonomous driving (in CARLA-v2)
      Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan. In European Conference on Computer Vision, pages 142–…

  17. [17] Enhancing end-to-end autonomous driving with latent world model
      Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, and Tieniu Tan. arXiv preprint arXiv:2406.08481, 2024.

  18. [18] End-to-end driving with online trajectory evaluation via BEV world model
      Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, and Zhaoxiang Zhang. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27137–27146, 2025.

  19. [19] BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers
      Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. arXiv preprint arXiv:2203.17270, 2022.

  20. [20] Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation
      Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. arXiv preprint arXiv:2406.06978, 2024.

  21. [21] Unleashing generalization of end-to-end autonomous driving with controllable long video generation
      Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, et al. arXiv preprint arXiv:2406.01349, 2024.

  22. [22] Trajectory prediction for autonomous driving: Progress, limitations, and future directions
      Nadya Abdel Madjid, Abdulrahman Ahmad, Murad Mebrahtu, Yousef Babaa, Abdelmoamen Nasser, Sumbal Malik, Bilal Hassan, Naoufel Werghi, Jorge Dias, and Majid Khonji. Information Fusion, 126:103588, 2026.

  23. [23] OpenScene: 3D scene understanding with open vocabularies
      Songyou Peng, Kyle Genova, Chiyu Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 815–824, 2023.

  24. [24] Multi-modal fusion transformer for end-to-end autonomous driving
      Aditya Prakash, Kashyap Chitta, and Andreas Geiger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7077–7087, 2021.

  25. [25] PRECOG: Prediction conditioned on goals in visual multi-agent settings
      Nicholas Rhinehart, Rowan McAllister, Kris Kitani, and Sergey Levine. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2821–2830, 2019.

  26. [26] Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data
      Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. In European Conference on Computer Vision, pages 683–700. Springer, 2020.

  27. [27] PiP: Planning-informed trajectory prediction for autonomous driving
      Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, and Qifeng Chen. In European Conference on Computer Vision, pages 598–614. Springer, 2020.

  28. [28] Learning to predict vehicle trajectories with model-based planning
      Haoran Song, Di Luan, Wenchao Ding, Michael Y Wang, and Qifeng Chen. In Conference on Robot Learning, pages 1035–1045. PMLR, 2022.

  29. [29] SparseDrive: End-to-end autonomous driving via sparse scene representation
      Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, and Sifa Zheng. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 8795–8801. IEEE, 2025.

  30. [30] MultiPath++: Efficient information fusion and trajectory aggregation for behavior prediction
      Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, et al. In 2022 International Conference on Robotics and Automation (ICRA), pages 7814–…

  31. [31] DriveDreamer: Towards real-world-driven world models for autonomous driving
      Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, and Jiwen Lu. In European Conference on Computer Vision, pages 55–72. Springer, 2024.

  32. [32] PARA-Drive: Parallelized architecture for real-time autonomous driving
      Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15449–15458, 2024.

  33. [33] DRAMA: An efficient end-to-end motion planner for autonomous driving with Mamba
      Chengran Yuan, Zhanqi Zhang, Jiawei Sun, Shuo Sun, Zefan Huang, Christina Dao Wen Lee, Dongen Li, Yuhang Han, Anthony Wong, Keng Peng Tee, et al. arXiv preprint arXiv:2408.03601, 2024.

  34. [34] FutureSightDrive: Thinking visually with spatio-temporal CoT for autonomous driving
      Shuang Zeng, Xinyuan Chang, Mengwei Xie, Xinran Liu, Yifan Bai, Zheng Pan, Mu Xu, Xing Wei, and Ning Guo. arXiv preprint arXiv:2505.17685, 2025.

  35. [35] Rethinking the open-loop evaluation of end-to-end autonomous driving in nuScenes
      Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, and Jingdong Wang. arXiv preprint arXiv:2305.10430, 2023.

  36. [36] Future-aware end-to-end driving: Bidirectional modeling of trajectory planning and scene evolution
      Bozhou Zhang, Nan Song, Jingyu Li, Xiatian Zhu, Jiankang Deng, and Li Zhang. arXiv preprint arXiv:2510.11092, 2025.

  37. [37] Epona: Autoregressive diffusion world model for autonomous driving
      Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, et al. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27220–27230, 2025.

  38. [38] DriveDreamer-2: LLM-enhanced world models for diverse driving video generation
      Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, and Xingang Wang. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10412–10420, 2025.

  39. [39] DiffE2E: Rethinking end-to-end driving with a hybrid action diffusion and supervised policy
      Rui Zhao, Yuze Fan, Ziguo Chen, Fei Gao, and Zhenhai Gao. arXiv preprint arXiv:2505.19516, 2025.

  40. [40] From forecasting to planning: Policy world model for collaborative state-action prediction
      Zhida Zhao, Talas Fu, Yifan Wang, Lijun Wang, and Huchuan Lu. arXiv preprint arXiv:2510.19654, 2025.

  41. [41] World4Drive: End-to-end autonomous driving via intention-aware physical latent world model
      Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28632–28642, 2025.