DriveFuture: Future-Aware Latent World Models for Autonomous Driving
Recognition: 3 theorem links
Pith reviewed 2026-05-12 02:55 UTC · model grok-4.3
The pith
Conditioning current latent states on future world states improves trajectory planning in autonomous driving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DriveFuture predicts future latent world states from the current latent state and ego action, then refines the prediction against the ground-truth future latent state via cross-attention. The resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner. During inference the model substitutes its own predicted future latent for the ground-truth version.
What carries the argument
Cross-attention refinement of predicted future latents against ground-truth futures, which produces a planning-oriented future-aware latent used to condition the trajectory planner.
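The train/inference split described above can be made concrete with a toy sketch. Everything here is illustrative and hypothetical, not the paper's architecture: the linear one-step predictor, the single-query attention, and the latent dimension are all placeholder assumptions. The point mirrored from the paper is that ground-truth future tokens are touched only when `training=True`; at inference the planner is conditioned on the raw prediction.

```python
import math

def predict_future(z_t, action, W=0.9, U=0.1):
    # Hypothetical linear dynamics: z_hat = W * z_t + U * a (placeholder).
    return [W * z + U * a for z, a in zip(z_t, action)]

def cross_attention_refine(z_hat, z_gt_tokens):
    # Single-query attention: the predicted latent attends to
    # ground-truth future tokens, then adds the attended value back
    # through a residual connection (toy stand-in for the paper's step).
    scores = [sum(q * k for q, k in zip(z_hat, tok)) for tok in z_gt_tokens]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [sum(w * tok[i] for w, tok in zip(weights, z_gt_tokens))
                for i in range(len(z_hat))]
    return [p + a for p, a in zip(z_hat, attended)]

def planner_condition(z_t, action, z_gt_tokens=None, training=False):
    z_hat = predict_future(z_t, action)
    if training and z_gt_tokens is not None:
        return cross_attention_refine(z_hat, z_gt_tokens)  # refined latent
    return z_hat  # inference: raw predicted future latent only

z_t, a = [0.2, -0.1, 0.5], [1.0, 0.0, -1.0]
gt_tokens = [[0.3, -0.1, 0.4], [0.1, 0.0, 0.6]]
train_cond = planner_condition(z_t, a, gt_tokens, training=True)
infer_cond = planner_condition(z_t, a)
```

The gap between `train_cond` and `infer_cond` is exactly the distribution shift the referee report below flags: the planner is trained on refined latents but deployed on raw ones.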
If this is right
- Current and future features become less entangled because the refinement step forces the model to treat futures as an explicit conditioning signal.
- The diffusion planner receives a latent that already encodes planning-relevant foresight rather than raw scene dynamics.
- Performance remains high when ground-truth futures are replaced by model predictions, showing the training procedure transfers to deployment.
- The same conditioning pattern can be applied to other latent world models that currently treat future states only as auxiliary targets.
Where Pith is reading between the lines
- The same future-conditioning pattern might reduce the need for very long prediction horizons by letting short-term futures already shape immediate actions.
- Extending the refinement step to multiple future time steps could allow planners to balance short-term safety with longer-term goals.
- The approach may generalize to non-driving sequential tasks where decisions must anticipate downstream states without explicit supervision on those states.
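The multi-step extension in the second bullet can be sketched as weighted pooling over rolled-out future latents. This is purely speculative illustration: the one-step predictor, the horizon weights, and the pooling scheme are assumptions, not anything from the paper.

```python
def rollout_futures(z_t, actions, step):
    # step(z, a) -> next latent; hypothetical one-step predictor.
    futures, z = [], z_t
    for a in actions:
        z = step(z, a)
        futures.append(z)
    return futures

def multi_horizon_condition(futures, weights):
    # Weighted pooling over horizons: heavier near-term weights favor
    # immediate safety, heavier far-term weights favor goal progress.
    dim = len(futures[0])
    return [sum(w * f[i] for w, f in zip(weights, futures))
            for i in range(dim)]

step = lambda z, a: [0.9 * zi + 0.1 * ai for zi, ai in zip(z, a)]
futs = rollout_futures([0.2, -0.1], [[1.0, 0.0], [0.0, 1.0]], step)
cond = multi_horizon_condition(futs, [0.7, 0.3])
```

A planner conditioned on `cond` would see a blend of short- and long-horizon futures rather than a single time step, which is one concrete way the short/long-term trade-off in the bullet could be realized.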
Load-bearing premise
The cross-attention step during training creates a latent encoding that still extracts useful future information when the model must rely on its own imperfect predictions at inference time.
What would settle it
An ablation that removes the cross-attention refinement step and shows no drop in planning performance on the same driving benchmarks would falsify the claim that future conditioning is the key mechanism.
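A harness for that ablation could look like the following sketch. The evaluator here is a stub returning placeholder numbers (not reported results); only the structure matters: everything is held fixed except the refinement flag, and a near-zero delta would falsify the future-conditioning claim.

```python
# Hypothetical ablation grid: identical except for the refinement step.
ablation_configs = [
    {"name": "full",      "cross_attn_refine": True},
    {"name": "no-refine", "cross_attn_refine": False},
]

def run_ablation(train_and_eval, configs):
    # train_and_eval(config) -> EPDMS score on a fixed benchmark split.
    return {c["name"]: train_and_eval(c) for c in configs}

# Stub evaluator standing in for real NAVSIM training runs;
# the returned scores are made-up placeholders, not measurements.
scores = run_ablation(lambda c: 55.5 if c["cross_attn_refine"] else 51.0,
                      ablation_configs)
delta = scores["full"] - scores["no-refine"]
# delta ~ 0 would falsify the claim; a large delta would support it.
```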
Original abstract
Existing latent world models for autonomous driving have opened a promising path toward future-aware driving intelligence. However, they typically treat future latent states as prediction targets or auxiliary signals, rather than directly conditioning trajectory planning. This can entangle current and future features in latent space. In this work, we propose DriveFuture, a future-aware latent world modeling framework for autonomous driving that explicitly learns planning-oriented foresight by conditioning the current latent state modeling process on future world states. Specifically, during training, the model first predicts future latent world states from the current latent state and ego action, and then refines the prediction against the ground-truth future latent state via cross-attention. The resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner. During inference, DriveFuture conditions on the predicted future latent state instead of the ground-truth future state. DriveFuture achieves SOTA performance on the public NAVSIM benchmarks, reaching 55.5 EPDMS on NAVSIM-v2 navhard, 89.9 EPDMS on NAVSIM-v2 navtest, and 90.7 PDMS on NAVSIM-v1 navtest, respectively. These results suggest that the key to latent world modeling lies not merely in simulating future states, but more importantly in conditioning current decision-making on future states. Notably, as of April 2026, DriveFuture ranks 1st on the NAVSIM-v2 navhard leaderboard (https://huggingface.co/spaces/AGC2025/e2e-driving-navhard) and achieves SOTA performance on NAVSIM-v1 navtest (https://huggingface.co/spaces/AGC2024-P/e2e-driving-navtest).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DriveFuture, a latent world model for autonomous driving that predicts future latent states from the current latent and ego action, then refines the prediction via cross-attention against ground-truth future latents exclusively during training. The resulting future-aware latent explicitly conditions a diffusion-based trajectory planner. At inference the planner receives only the raw predicted future latent. The method reports SOTA EPDMS scores of 55.5 on NAVSIM-v2 navhard, 89.9 on NAVSIM-v2 navtest, and 90.7 PDMS on NAVSIM-v1 navtest, claiming first place on the navhard leaderboard and arguing that the key advance is conditioning current decisions on future states rather than treating futures only as targets.
Significance. If the reported gains prove robust to the train-inference mismatch and are attributable to the explicit future-conditioning mechanism, the work would offer a concrete demonstration that foresight should directly shape current planning in latent world models. The SOTA numbers on public NAVSIM benchmarks would then indicate a practical step toward more anticipatory end-to-end driving policies.
Major comments (2)
- Abstract and §3 (method description): The cross-attention refinement is performed only against ground-truth future latents during training, yet inference conditions the planner on unrefined predicted latents. This introduces an unquantified distribution shift. No measurements of latent prediction error, cosine similarity between refined and raw latents, or error propagation to the planner are supplied, leaving the central claim that future-state conditioning is the key driver unverified.
- Experiments section and results tables: The SOTA EPDMS figures (55.5 navhard, etc.) are presented without an ablation that removes the GT-refinement step while keeping the future prediction and diffusion planner fixed. Without this control, it remains possible that the gains arise from the latent encoder, diffusion architecture, or training data rather than the future-aware conditioning mechanism.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of the training-inference consistency and the need for stronger isolation of the future-conditioning contribution. We address each major comment below and outline revisions to strengthen the paper.
Point-by-point responses
Referee: Abstract and §3 (method description): The cross-attention refinement is performed only against ground-truth future latents during training, yet inference conditions the planner on unrefined predicted latents. This introduces an unquantified distribution shift. No measurements of latent prediction error, cosine similarity between refined and raw latents, or error propagation to the planner are supplied, leaving the central claim that future-state conditioning is the key driver unverified.
Authors: We acknowledge the train-inference discrepancy introduced by the training-only cross-attention refinement. The refinement step is intended to improve the quality of the learned future latent representations by aligning predictions more closely with ground-truth futures during optimization, thereby enabling the model to produce better raw predictions at inference time. While the original submission did not include quantitative analysis of latent prediction error or cosine similarity, the strong benchmark results suggest the approach is effective. To directly address the concern and verify the central claim, we will add measurements of latent prediction error, cosine similarity between refined and raw predicted latents, and an analysis of error propagation to the planner in the revised manuscript. Revision: yes.
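The diagnostics promised here (latent prediction error, cosine similarity between refined and raw latents) are cheap to compute once both latents are logged. A minimal sketch, with all latent values below being hypothetical stand-ins rather than measurements from the paper:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def l2_error(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

raw_pred  = [0.28, -0.09, 0.35]   # hypothetical raw predicted future latent
refined   = [0.30, -0.10, 0.40]   # hypothetical refined latent (train-time)
gt_future = [0.31, -0.11, 0.42]   # hypothetical ground-truth future latent

gap_raw     = l2_error(raw_pred, gt_future)   # proxy for train-inference shift
gap_refined = l2_error(refined, gt_future)
sim         = cosine_similarity(raw_pred, refined)
```

Averaging `gap_raw - gap_refined` and `sim` over a validation set would quantify how much of the refinement's benefit survives when the model must rely on its own raw predictions, which is exactly the unverified step the referee flags.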
Referee: Experiments section and results tables: The SOTA EPDMS figures (55.5 navhard, etc.) are presented without an ablation that removes the GT-refinement step while keeping the future prediction and diffusion planner fixed. Without this control, it remains possible that the gains arise from the latent encoder, diffusion architecture, or training data rather than the future-aware conditioning mechanism.
Authors: We agree that an ablation isolating the GT-refinement step is necessary to attribute performance gains specifically to the future-aware conditioning mechanism. In the revised version, we will include a controlled ablation that disables the cross-attention refinement during training while retaining the future prediction module and diffusion-based planner unchanged. This will allow direct comparison of EPDMS scores and clarify whether the explicit future-state conditioning is the primary driver of the reported SOTA results. Revision: yes.
Circularity Check
No circularity: empirical architecture validated on benchmarks
Full rationale
The paper proposes DriveFuture as an architectural framework: it predicts future latent states from current latent + ego action, applies cross-attention refinement against ground-truth future latents exclusively during training to produce a future-aware latent, and conditions a diffusion planner on that latent. At inference the planner uses the raw predicted latent. The central claim is that explicitly conditioning current decision-making on future states (rather than treating futures only as targets) yields better planning, supported by reported SOTA EPDMS scores on public NAVSIM benchmarks. No equations, parameter-fitting steps, uniqueness theorems, or self-citation chains appear in the provided text; the result is presented as an empirical engineering outcome rather than a derivation that reduces to its inputs by construction. The training/inference distinction is explicitly stated, so no load-bearing step collapses into a tautology or fitted input renamed as prediction.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (relevance: unclear). Matched passage: "the model first predicts future latent world states from the current latent state and ego action, and then refines the prediction against the ground-truth future latent state via cross-attention. The resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (relevance: unclear). Matched passage: "LatentAlign anneals the planning condition from Z^c_{t+T} towards Ẑ_{t+T} over training"
- IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking (relevance: unclear). Matched passage: "8-step trajectory over a 4 second horizon with a 0.5 second interval"