pith. machine review for the scientific record.

arxiv: 2604.25329 · v1 · submitted 2026-04-28 · 💻 cs.RO

Recognition: unknown

ProDrive: Proactive Planning for Autonomous Driving via Ego-Environment Co-Evolution

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 15:54 UTC · model grok-4.3

classification 💻 cs.RO
keywords: autonomous driving · proactive planning · world model · trajectory planning · scene prediction · end-to-end training · BEV representation

The pith

ProDrive lets a driving planner and world model co-evolve so future scene predictions directly refine trajectory choices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous driving planners typically select paths from current observations alone, which can produce myopic and unsafe choices when the environment changes rapidly. ProDrive instead trains a trajectory planner and a scene prediction model together end-to-end. The planner proposes multiple candidate paths and special tokens describing its intent. The prediction model uses those inputs to forecast how the entire scene would evolve along each path. All candidates are scored in parallel and the resulting signals flow back through the network to improve the planner. This creates a closed loop in which planning and scene evolution inform each other, moving beyond reactive decisions. On the NAVSIM benchmark the method shows gains in safety and planning smoothness compared with prior approaches.
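
To make the loop concrete, here is a minimal PyTorch sketch of the co-evolution described above. Every module name, dimension, and the reward head are illustrative assumptions chosen for exposition, not the paper's implementation.

    # Minimal sketch of the ego-environment co-evolution loop (all names assumed).
    import torch
    import torch.nn as nn

    class ToyProDriveLoop(nn.Module):
        def __init__(self, d=64, n_candidates=8, horizon=4):
            super().__init__()
            self.ego_encoder = nn.Linear(16, d)          # stand-in for the ego/BEV encoders
            self.traj_head = nn.Linear(d, n_candidates * horizon * 2)  # waypoints (x, y)
            self.token_head = nn.Linear(d, n_candidates * d)           # planning-aware tokens
            self.world_model = nn.GRUCell(d + 2, d)      # toy scene-evolution model
            self.reward_head = nn.Linear(d, 1)           # scores each predicted rollout
            self.n, self.h, self.d = n_candidates, horizon, d

        def forward(self, obs, bev_state):
            B = obs.size(0)
            ego = self.ego_encoder(obs)                                 # (B, d)
            trajs = self.traj_head(ego).view(B, self.n, self.h, 2)      # candidate paths
            tokens = self.token_head(ego).view(B * self.n, self.d)      # one token per path
            state = bev_state.unsqueeze(1).expand(B, self.n, self.d).reshape(B * self.n, self.d)
            for t in range(self.h):                                     # roll the scene forward
                waypoint = trajs.reshape(B * self.n, self.h, 2)[:, t]
                state = self.world_model(torch.cat([tokens, waypoint], dim=-1), state)
            return trajs, self.reward_head(state).view(B, self.n)       # parallel outcome scores

    model = ToyProDriveLoop()
    trajs, scores = model(torch.randn(2, 16), torch.randn(2, 64))
    scores.mean().backward()  # gradients reach both planner heads: the closed loop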

Core claim

The paper establishes that bidirectional coupling between a query-centric trajectory planner and a bird's-eye-view world model lets future outcome assessment shape planning decisions directly, rather than leaving them to current observations alone. The coupling is obtained by injecting planner features into the world model and scoring all candidate trajectories in parallel while preserving end-to-end gradient flow.

What carries the argument

The ego-environment co-evolution loop: the planner supplies candidate trajectories and planning-aware tokens that condition the world model's future scene forecasts, and parallel evaluation lets the resulting outcome scores update the planner's parameters.

If this is right

  • Planning decisions incorporate predicted future scene states rather than current observations alone.
  • End-to-end gradient flow lets the quality of simulated outcomes directly optimize planner parameters.
  • Parallel scoring of all candidate trajectories supports efficient selection of the best path (see the sketch after this list).
  • Joint training yields measurable gains in safety and planning efficiency on NAVSIM v1.
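
A hedged illustration of the gradient-flow and parallel-scoring bullets above: if candidate selection is relaxed to a softmax-weighted expectation (our assumption; the paper may use a different mechanism), every parallel outcome score receives gradient, so simulated-outcome quality can optimize the planner directly.

    # Differentiable selection over parallel candidate scores (softmax relaxation assumed).
    import torch

    def soft_select(trajs, scores, temperature=1.0):
        # trajs: (B, N, H, 2) candidate paths; scores: (B, N) world-model outcome scores.
        weights = torch.softmax(scores / temperature, dim=-1)   # soft "selection"
        plan = torch.einsum("bn,bnhc->bhc", weights, trajs)     # expected trajectory
        return plan, weights

    trajs = torch.randn(2, 8, 4, 2, requires_grad=True)
    scores = torch.randn(2, 8, requires_grad=True)
    plan, _ = soft_select(trajs, scores)
    plan.pow(2).mean().backward()   # stand-in imitation loss
    print(scores.grad.shape)        # torch.Size([2, 8]): the scores get gradient too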

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the world model is reliable, the architecture could reduce the need for separate prediction modules in driving stacks.
  • Similar co-evolution of actor and simulator might apply to other sequential decision problems in dynamic settings.
  • Prediction errors over longer horizons could still cause failures, suggesting value in adding uncertainty estimates (a toy version follows this list).
  • The method points toward policies that optimize multi-step outcomes without explicit reward engineering.
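
The uncertainty suggestion in the third bullet could be prototyped with a small ensemble of world models whose disagreement penalizes a candidate's score. Everything below, including the penalty form and its weight, is a hypothetical illustration rather than anything in the paper.

    # Hypothetical uncertainty-aware scoring via ensemble disagreement.
    import torch
    import torch.nn as nn

    ensemble = nn.ModuleList(nn.Linear(66, 64) for _ in range(5))  # K stand-in world models
    reward_head = nn.Linear(64, 1)

    def uncertainty_aware_score(conditioned_input):                # (B*N, 66) rollout input
        futures = torch.stack([m(conditioned_input) for m in ensemble])  # (K, B*N, 64)
        disagreement = futures.var(dim=0).mean(dim=-1, keepdim=True)     # epistemic proxy
        return reward_head(futures.mean(dim=0)) - 1.0 * disagreement     # weight is arbitrary

    print(uncertainty_aware_score(torch.randn(16, 66)).shape)      # torch.Size([16, 1])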

Load-bearing premise

The world model must generate sufficiently accurate predictions of future scenes when conditioned on the planner's candidate trajectories and tokens so that the outcome scores meaningfully improve planning decisions.

What would settle it

Showing that the world model's predicted future scenes diverge substantially from real observations, or that removing the planner-feature injection and parallel evaluation produces equal or better results on the same benchmark, would refute the claimed benefit of the coupling.
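
The second test amounts to an injection/stop-gradient ablation. A toy harness, with all names assumed, shows the two weakened variants one would compare against the full coupling:

    # Toy ablation of planner-feature injection and gradient coupling (names assumed).
    import torch

    def rollout(world_model, bev_state, planner_tokens, inject=True, coupled=True):
        if not inject:
            planner_tokens = torch.zeros_like(planner_tokens)  # drop the injection entirely
        elif not coupled:
            planner_tokens = planner_tokens.detach()           # keep features, cut gradients
        return world_model(torch.cat([bev_state, planner_tokens], dim=-1))

    world_model = torch.nn.Linear(128, 64)                     # stand-in one-step predictor
    bev = torch.randn(4, 64)
    tokens = torch.randn(4, 64, requires_grad=True)
    full = rollout(world_model, bev, tokens)                   # full bidirectional coupling
    frozen = rollout(world_model, bev, tokens, coupled=False)  # injection without gradients
    ablated = rollout(world_model, bev, tokens, inject=False)  # no injection at all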

Figures

Figures reproduced from arXiv: 2604.25329 by Chuyao Fu, Hong Zhang, Jiankun Wang, Shengzhe Gan, Sirui Han, Xiaowei Chi, Yuhan Rui, Zhuoli Ouyang.

Figure 1. From reactive to proactive autonomous driving. (a) Conventional end-to-end planners are reactive, generating trajectories mainly from the current observation without explicitly modeling future scene evolution. (b) Some recent methods use a world model for trajectory reranking, but the planner and world model remain loosely coupled, limiting direct planner-side benefit from future reasoning. (c) In contrast… view at source ↗
Figure 2. Overview of ProDrive. Given multi-view images, LiDAR, and ego state, the Ego Module refines learnable ego queries through L Ego Refiner layers to obtain ego tokens, from which candidate trajectories are decoded and transformed into trajectory tokens. Conditioned on these tokens and the current BEV state, the Environment Module performs recurrent BEV future prediction and reward-based trajectory evaluation… view at source ↗
Figure 3. Qualitative examples of ProDrive. For each case, the top row shows the front-view observation with the planned trajectory overlaid, and the bottom row shows the corresponding bird's-eye-view scene with the predicted plan compared against the human trajectory. Across diverse driving scenarios, ProDrive produces safe and foresighted behaviors by anticipating the future motion of surrounding agents and captur… view at source ↗
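
Figure 2's ego-query refinement reads like standard iterative query decoding. A minimal, assumed sketch with L stacked refiner layers follows; the real refiners presumably attend to fused image and LiDAR features, which are stubbed here as random tensors.

    # Assumed sketch of L-layer ego-query refinement per the Figure 2 description.
    import torch
    import torch.nn as nn

    class EgoRefiner(nn.Module):
        def __init__(self, d=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
            self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)

        def forward(self, queries, scene):
            attended, _ = self.attn(queries, scene, scene)  # cross-attend to scene features
            queries = self.norm1(queries + attended)
            return self.norm2(queries + self.ffn(queries))

    L, d = 3, 64
    refiners = nn.ModuleList(EgoRefiner(d) for _ in range(L))
    ego_queries = torch.randn(2, 4, d)   # learnable ego queries (random stand-ins here)
    scene = torch.randn(2, 100, d)       # stub for fused image/LiDAR features
    for layer in refiners:
        ego_queries = layer(ego_queries, scene)
    print(ego_queries.shape)             # (2, 4, 64): ego tokens for trajectory decoding
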
read the original abstract

End-to-end autonomous driving planners typically generate trajectories from current observations alone. However, real-world driving is highly dynamic, and such reactive planning cannot anticipate future scene evolution, often leading to myopic decisions and safety-critical failures. We propose ProDrive, a world-model-based proactive planning framework that enables ego-environment co-evolution for autonomous driving. ProDrive jointly trains a query-centric trajectory planner and a bird's-eye-view (BEV) world model end-to-end: the planner generates diverse candidate trajectories and planning-aware ego tokens, while the world model predicts future scene evolution conditioned on them. By injecting planner features into the world model and evaluating all candidates in parallel, ProDrive preserves end-to-end gradient flow and allows future outcome assessment to directly shape planning. This bidirectional coupling enables proactive planning beyond current-observation-driven decision-making. Experiments on NAVSIM v1 show that ProDrive outperforms strong baselines in both safety and planning efficiency, while ablations validate the effectiveness of the proposed ego-environment coupling design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes ProDrive, a world-model-based proactive planning framework for autonomous driving. It jointly trains a query-centric trajectory planner and a BEV world model end-to-end: the planner generates diverse candidate trajectories and planning-aware ego tokens that condition the world model's future scene predictions. This bidirectional coupling is intended to enable future outcome assessment to shape planning decisions, preserve end-to-end gradient flow via parallel candidate evaluation, and outperform reactive baselines. Experiments on NAVSIM v1 are claimed to show gains in safety and planning efficiency, supported by ablations on the ego-environment coupling.

Significance. If the empirical claims hold and the world-model predictions remain accurate under planner conditioning, ProDrive could meaningfully advance end-to-end driving by enabling proactive rather than purely reactive planning. The joint training that maintains gradient flow through feature injection and parallel evaluation is a technically attractive design choice, and the explicit ablations on the coupling mechanism provide a clear way to isolate its contribution.

major comments (2)
  1. [Abstract] Abstract: The central claim that ProDrive 'outperforms strong baselines in both safety and planning efficiency' on NAVSIM v1 is presented without any quantitative metrics, baseline names, error bars, or data-exclusion rules. This absence is load-bearing because the proactive benefit is asserted to arise from the co-evolution mechanism, yet no evidence is supplied to quantify the improvement or rule out confounds.
  2. [Abstract] Abstract (and implied methods): The framework relies on the assumption that the BEV world model, when conditioned on planner-generated candidate trajectories plus planning-aware tokens, produces sufficiently accurate future scene forecasts for outcome evaluation to improve planning. No metrics on prediction fidelity (future-frame mIoU, object trajectory error, or collision-prediction AUC) under closed-loop planner conditioning versus open-loop or ground-truth conditioning are reported, leaving the weakest assumption untested and exposing the method to risks of optimistic or mode-collapsed predictions.
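
For reference, the fidelity metrics requested in comment 2 are cheap to compute once predicted and observed future BEV grids are available. A sketch of future-frame mIoU follows, assuming integer class grids; object trajectory error and collision-prediction AUC would slot in the same way.

    # Future-frame BEV mIoU (integer class-grid format is an assumption).
    import torch

    def bev_miou(pred, target, n_classes):
        # pred, target: (B, H, W) integer class grids for one predicted future frame.
        ious = []
        for c in range(n_classes):
            p, t = pred == c, target == c
            inter = (p & t).sum().float()
            union = (p | t).sum().float()
            if union > 0:                # skip classes absent from both grids
                ious.append(inter / union)
        return torch.stack(ious).mean()

    pred = torch.randint(0, 4, (2, 32, 32))
    target = torch.randint(0, 4, (2, 32, 32))
    print(f"future-frame mIoU: {bev_miou(pred, target, n_classes=4):.3f}")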

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the technical appeal of the ego-environment co-evolution design. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of results and assumptions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that ProDrive 'outperforms strong baselines in both safety and planning efficiency' on NAVSIM v1 is presented without any quantitative metrics, baseline names, error bars, or data-exclusion rules. This absence is load-bearing because the proactive benefit is asserted to arise from the co-evolution mechanism, yet no evidence is supplied to quantify the improvement or rule out confounds.

    Authors: We agree that the abstract should supply concrete quantitative support for the performance claims. In the revised manuscript we will expand the abstract to include the primary NAVSIM v1 metrics (e.g., safety score, planning efficiency), the specific baselines compared, and any available error bars or statistical details from the experimental section. This will make the evidence for the co-evolution benefit explicit and address potential confounds. revision: yes

  2. Referee: [Abstract] Abstract (and implied methods): The framework relies on the assumption that the BEV world model, when conditioned on planner-generated candidate trajectories plus planning-aware tokens, produces sufficiently accurate future scene forecasts for outcome evaluation to improve planning. No metrics on prediction fidelity (future-frame mIoU, object trajectory error, or collision-prediction AUC) under closed-loop planner conditioning versus open-loop or ground-truth conditioning are reported, leaving the weakest assumption untested and exposing the method to risks of optimistic or mode-collapsed predictions.

    Authors: We acknowledge that the current manuscript does not report dedicated prediction-fidelity metrics for the world model under planner-specific conditioning. While the joint training and coupling ablations provide indirect support for the overall framework, we agree that direct evaluation of forecast accuracy is necessary to substantiate the core assumption. In the revision we will add experiments reporting future-frame mIoU, object trajectory error, and related metrics under closed-loop planner conditioning, open-loop, and ground-truth settings. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation remains self-contained

full rationale

The paper presents a joint end-to-end training architecture in which a planner generates candidate trajectories and tokens that condition a BEV world model, with parallel evaluation preserving gradient flow. No equations, fitted parameters, or self-citations are shown that would reduce the claimed proactive benefit or future-outcome shaping to a quantity defined by the same inputs. The central mechanism (conditioning + parallel evaluation) is an architectural choice whose validity is tested empirically on NAVSIM v1 rather than asserted by definition or prior self-work. The weakest assumption (world-model fidelity under planner conditioning) is acknowledged as external to the derivation chain and does not create a self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework implicitly relies on standard supervised learning assumptions for world-model training and trajectory supervision that are not detailed here.

pith-pipeline@v0.9.0 · 5495 in / 1134 out tokens · 43586 ms · 2026-05-07T15:54:50.861084+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 16 canonical work pages · 4 internal anchors

  1. [1] NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles
     Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. arXiv preprint arXiv:2106.11810, 2021.

  2. [2] VADv2: End-to-end vectorized autonomous driving via probabilistic planning
     Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. arXiv preprint arXiv:2402.13243, 2024.

  3. [3] S2TNet: Spatio-temporal transformer networks for trajectory prediction in autonomous driving
     Weihuang Chen, Fangfang Wang, and Hongbin Sun. In Asian Conference on Machine Learning, pages 454–469. PMLR, 2021.

  4. [4] TransFuser: Imitation with transformer-based sensor fusion for autonomous driving
     Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):12878–12895, 2022.

  5. [5] NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking
     Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Advances in Neural Information Processing Systems, 37:28706–28719, 2024.

  6. [6] CARLA: An open urban driving simulator
     Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. In Conference on Robot Learning, pages 1–16. PMLR, 2017.

  7. [7] ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation
     Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025.

  8. [8] Learning to drive from a world model
     Mitchell Goff, Greg Hogan, George Hotz, Armand du Parc Locmaria, Kacper Raczy, Harald Schäfer, Adeeb Shihadeh, Weixing Zhang, and Yassine Yousfi. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1964–1973.

  9. [9] iPad: Iterative proposal-centric end-to-end autonomous driving
     Ke Guo, Haochen Liu, Xiaojun Wu, Jia Pan, and Chen Lv. arXiv preprint arXiv:2505.15111, 2025.

  10. [10] Deep residual learning for image recognition
      Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

  11. [11] GAIA-1: A generative world model for autonomous driving
      Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. arXiv preprint arXiv:2309.17080, 2023.

  12. [12] Planning-oriented autonomous driving
      Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17853–17862, 2023.

  13. [13] SubjectDrive: Scaling generative data in autonomous driving via subject control
      Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, et al. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3617–3625, 2025.

  14. [14] ADriver-I: A general world model for autonomous driving
      Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, and Tiancai Wang. arXiv preprint arXiv:2311.13549, 2023.

  15. [15] ImagiDrive: A unified imagination-and-planning framework for autonomous driving
      Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, and Li Zhang. arXiv preprint arXiv:2508.11428, 2025.

  16. [16] Think2Drive: Efficient reinforcement learning by thinking with latent world model for autonomous driving (in CARLA-v2)
      Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan. In European Conference on Computer Vision, pages 142–…

  17. [17] Enhancing end-to-end autonomous driving with latent world model
      Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, and Tieniu Tan. arXiv preprint arXiv:2406.08481, 2024.

  18. [18] End-to-end driving with online trajectory evaluation via BEV world model
      Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, and Zhaoxiang Zhang. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27137–27146, 2025.

  19. [19] BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers
      Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. arXiv preprint arXiv:2203.17270, 2022.

  20. [20] Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation
      Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. arXiv preprint arXiv:2406.06978, 2024.

  21. [21] Unleashing generalization of end-to-end autonomous driving with controllable long video generation
      Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, et al. arXiv preprint arXiv:2406.01349, 2024.

  22. [22] Trajectory prediction for autonomous driving: Progress, limitations, and future directions
      Nadya Abdel Madjid, Abdulrahman Ahmad, Murad Mebrahtu, Yousef Babaa, Abdelmoamen Nasser, Sumbal Malik, Bilal Hassan, Naoufel Werghi, Jorge Dias, and Majid Khonji. Information Fusion, 126:103588, 2026.

  23. [23] OpenScene: 3D scene understanding with open vocabularies
      Songyou Peng, Kyle Genova, Chiyu Jiang, Andrea Tagliasacchi, Marc Pollefeys, Thomas Funkhouser, et al. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 815–824, 2023.

  24. [24] Multi-modal fusion transformer for end-to-end autonomous driving
      Aditya Prakash, Kashyap Chitta, and Andreas Geiger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7077–7087, 2021.

  25. [25] PRECOG: Prediction conditioned on goals in visual multi-agent settings
      Nicholas Rhinehart, Rowan McAllister, Kris Kitani, and Sergey Levine. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2821–2830, 2019.

  26. [26] Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data
      Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. In European Conference on Computer Vision, pages 683–700. Springer, 2020.

  27. [27] PiP: Planning-informed trajectory prediction for autonomous driving
      Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, and Qifeng Chen. In European Conference on Computer Vision, pages 598–614. Springer, 2020.

  28. [28] Learning to predict vehicle trajectories with model-based planning
      Haoran Song, Di Luan, Wenchao Ding, Michael Y Wang, and Qifeng Chen. In Conference on Robot Learning, pages 1035–1045. PMLR, 2022.

  29. [29] SparseDrive: End-to-end autonomous driving via sparse scene representation
      Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, and Sifa Zheng. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 8795–8801. IEEE, 2025.

  30. [30] MultiPath++: Efficient information fusion and trajectory aggregation for behavior prediction
      Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, et al. In 2022 International Conference on Robotics and Automation (ICRA), pages 7814–…

  31. [31] DriveDreamer: Towards real-world-driven world models for autonomous driving
      Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, and Jiwen Lu. In European Conference on Computer Vision, pages 55–72. Springer, 2024.

  32. [32] PARA-Drive: Parallelized architecture for real-time autonomous driving
      Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15449–15458, 2024.

  33. [33] DRAMA: An efficient end-to-end motion planner for autonomous driving with Mamba
      Chengran Yuan, Zhanqi Zhang, Jiawei Sun, Shuo Sun, Zefan Huang, Christina Dao Wen Lee, Dongen Li, Yuhang Han, Anthony Wong, Keng Peng Tee, et al. arXiv preprint arXiv:2408.03601, 2024.

  34. [34] FutureSightDrive: Thinking visually with spatio-temporal CoT for autonomous driving
      Shuang Zeng, Xinyuan Chang, Mengwei Xie, Xinran Liu, Yifan Bai, Zheng Pan, Mu Xu, Xing Wei, and Ning Guo. arXiv preprint arXiv:2505.17685, 2025.

  35. [35] Rethinking the open-loop evaluation of end-to-end autonomous driving in nuScenes
      Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, and Jingdong Wang. arXiv preprint arXiv:2305.10430, 2023.

  36. [36] Future-aware end-to-end driving: Bidirectional modeling of trajectory planning and scene evolution
      Bozhou Zhang, Nan Song, Jingyu Li, Xiatian Zhu, Jiankang Deng, and Li Zhang. arXiv preprint arXiv:2510.11092, 2025.

  37. [37] Epona: Autoregressive diffusion world model for autonomous driving
      Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, et al. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27220–27230, 2025.

  38. [38] DriveDreamer-2: LLM-enhanced world models for diverse driving video generation
      Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, and Xingang Wang. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10412–10420, 2025.

  39. [39] DiffE2E: Rethinking end-to-end driving with a hybrid action diffusion and supervised policy
      Rui Zhao, Yuze Fan, Ziguo Chen, Fei Gao, and Zhenhai Gao. arXiv preprint arXiv:2505.19516, 2025.

  40. [40] From forecasting to planning: Policy world model for collaborative state-action prediction
      Zhida Zhao, Talas Fu, Yifan Wang, Lijun Wang, and Huchuan Lu. arXiv preprint arXiv:2510.19654, 2025.

  41. [41] World4Drive: End-to-end autonomous driving via intention-aware physical latent world model
      Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28632–28642, 2025.