Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
Pith reviewed 2026-05-21 04:41 UTC · model grok-4.3
The pith
Cognitive-physical RL for driving distills VLM knowledge into a BEV encoder and adds action-conditioned future prediction for safer policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that CoPhy, a cognitive-physical reinforcement learning framework, advances autonomous driving in three steps: (1) distilling VLM knowledge into the BEV encoder and then discarding the VLM, which keeps cognitive ability at zero inference cost while exposing a language interface; (2) building an auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, from which interpretable safety metrics are derived; and (3) optimizing the driving policy via GRPO with a dual-reward mechanism in which physical rewards from BEV rollouts enforce hard safety constraints and cognitive rewards ensure intent compliance, yielding state-of-the-art performance on NA
What carries the argument
The dual infrastructure of a distilled cognitive BEV encoder, which retains VLM semantics at zero inference cost, and an auto-regressive BEV world model, which predicts future semantic maps from candidate actions to supply physical safety metrics for dual-reward GRPO.
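To make the dual-reward GRPO step concrete, here is a minimal sketch of how a physical and a cognitive reward might be combined and then turned into group-relative advantages. The combination rule (a safety violation zeroes the reward), the weight `w_cog`, and all numbers are assumptions for illustration; the page does not state the paper's actual formula. The standardization within a sampled group follows the usual GRPO recipe.

```python
from statistics import mean, pstdev

def dual_reward(physical, cognitive, violates_safety, w_cog=0.5):
    """Combine both rewards. Hypothetical rule: a safety violation zeroes the reward."""
    if violates_safety:
        return 0.0
    return physical + w_cog * cognitive

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four candidate trajectories sampled for one scene (made-up numbers).
candidates = [
    {"physical": 0.9, "cognitive": 0.8, "violates_safety": False},
    {"physical": 0.7, "cognitive": 0.2, "violates_safety": False},
    {"physical": 1.0, "cognitive": 1.0, "violates_safety": True},
    {"physical": 0.5, "cognitive": 0.6, "violates_safety": False},
]
rewards = [dual_reward(**c) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this toy group the unsafe candidate receives the lowest advantage even though its raw scores are the highest, which is the behavior a hard safety constraint is meant to produce.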
If this is right
- The method achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks.
- Safer driving results from cognitively informed scene compliance enforced by the physical reward.
- Flexible intent control becomes possible through user-defined language instructions via the cognitive channel.
- The physical reward derived from BEV rollouts directly enforces hard safety constraints during optimization.
- The cognitive reward from the language-aligned scorer maintains compliance with driving intent.
Where Pith is reading between the lines
- If the world-model predictions remain reliable across diverse weather and traffic densities, the approach could lower the volume of real-world miles needed for validation.
- The pluggable language interface could support regional or personal driving style preferences without retraining the core policy.
- Extending the same distillation step to other perception modules might cut inference costs in broader robotics applications.
- Pairing the dual-reward structure with multi-agent world models could address cooperative behaviors in dense traffic.
Load-bearing premise
The auto-regressive BEV world model produces future semantic maps accurate enough that safety metrics computed from its rollouts can be treated as reliable hard constraints.
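The premise can be illustrated with a toy rollout. The sketch below assumes an ego-centric semantic grid with three hypothetical class labels and replaces the learned world model with a stand-in that merely shifts a static map by the ego motion; other agents do not move. It shows only how safety flags could be read off predicted maps when each prediction is fed back as the next input.

```python
FREE, OFFROAD, VEHICLE = 0, 1, 2  # hypothetical semantic classes

def toy_world_model(bev, action):
    """Stand-in for a learned model: shift an ego-centric map by the ego motion."""
    d_row, d_col = action
    rows, cols = len(bev), len(bev[0])
    nxt = []
    for r in range(rows):
        row = []
        for c in range(cols):
            src_r, src_c = r + d_row, c + d_col
            inside = 0 <= src_r < rows and 0 <= src_c < cols
            # Cells that scroll in from outside the map are treated as off-road.
            row.append(bev[src_r][src_c] if inside else OFFROAD)
        nxt.append(row)
    return nxt

def rollout_safety(bev, actions, ego=(3, 1)):
    """Roll out auto-regressively and flag collision / off-road at the ego cell."""
    collided = off_road = False
    for action in actions:
        bev = toy_world_model(bev, action)  # the prediction becomes the next input
        label = bev[ego[0]][ego[1]]
        collided |= label == VEHICLE
        off_road |= label == OFFROAD
    return {"collision": collided, "off_road": off_road}

# Ego sits at row 3, column 1; "forward" is toward row 0.
SCENE = [
    [OFFROAD, VEHICLE, FREE],
    [OFFROAD, FREE, FREE],
    [OFFROAD, FREE, FREE],
    [OFFROAD, FREE, FREE],
]
straight = rollout_safety(SCENE, [(-1, 0)] * 3)
swerve = rollout_safety(SCENE, [(-1, 0), (-1, 1), (-1, 0)])
```

Driving straight reaches the parked vehicle on the third step, while the lane change avoids it. With a learned model, any error in an early predicted map propagates to every later step, which is why the premise is load-bearing.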
What would settle it
Direct tests showing that collision or violation rates predicted by the BEV world model rollouts do not match observed outcomes in the NAVSIM simulator or real-world driving data would falsify the reliability of the physical safety constraints.
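One way such a test might be scored is sketched below, assuming per-scenario boolean collision flags from the world-model rollouts and from the simulator. The function name, the chosen statistics, and the sample flags are illustrative assumptions, not the paper's evaluation protocol.

```python
def agreement_report(predicted, observed):
    """Compare world-model collision flags with simulator outcomes per scenario."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need one prediction per observed outcome")
    pairs = list(zip(predicted, observed))
    missed = sum(1 for p, o in pairs if o and not p)       # unsafe in sim, predicted safe
    false_alarms = sum(1 for p, o in pairs if p and not o)  # safe in sim, predicted unsafe
    positives = sum(1 for _, o in pairs if o)
    return {
        "agreement": sum(1 for p, o in pairs if p == o) / len(pairs),
        "miss_rate": missed / positives if positives else 0.0,
        "false_alarms": false_alarms,
    }

# Made-up flags for five scenarios.
predicted = [False, True, False, False, True]
observed = [False, True, True, False, False]
report = agreement_report(predicted, observed)
```

The miss rate is the quantity that matters most for a hard constraint: a collision the world model fails to predict is one the physical reward cannot penalize.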
Original abstract
Current end-to-end autonomous driving models are fundamentally constrained by the behavioral cloning ceiling of imitation learning. While reinforcement learning offers a path to smarter autonomy, it demands two missing pieces of infrastructure: (1) a cognitive foundation that understands traffic semantics and driving intent, and (2) a foresighted physical environment that can anticipate the consequences of candidate actions. To this end, we propose CoPhy, a Cognitive-Physical reinforcement learning framework for autonomous driving. To distill to think, we distill VLM knowledge into the BEV encoder and then discard the VLM entirely, retaining cognitive ability at zero inference cost while releasing the cognitive channel as a pluggable interface for optional human language commands. To foresee to act, we build an auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, serving as an interpretable physical sandbox from which safety metrics are directly derived. Built upon this dual infrastructure, we optimize the driving policy via GRPO with a novel dual-reward mechanism: a physical reward derived from BEV rollouts enforces hard safety constraints, while a cognitive reward from a language-aligned scorer ensures intent compliance. Extensive experiments demonstrate that CoPhy not only achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks, but also enables safer driving via cognitively informed scene compliance and flexible intent control through user-defined language instructions.
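The abstract does not say which objective is used to distill the VLM into the BEV encoder. As a rough illustration only, the sketch below assumes a feature-alignment loss: the mean cosine distance between student (BEV encoder) features and frozen teacher (VLM) embeddings of the same scenes. The loss choice, function names, and vectors are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def distillation_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over a batch of paired feature vectors."""
    losses = [1.0 - cosine_similarity(s, t)
              for s, t in zip(student_feats, teacher_feats)]
    return sum(losses) / len(losses)

# Made-up 2-D features: the first pair is aligned, the second is orthogonal.
student = [[1.0, 0.0], [0.0, 2.0]]
teacher = [[3.0, 0.0], [1.0, 0.0]]
loss = distillation_loss(student, teacher)
```

Once such a loss has been minimized, only the student is needed at inference time, which is the sense in which the teacher can be discarded at zero runtime cost.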
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CoPhy, a Cognitive-Physical RL framework for autonomous driving. It distills VLM knowledge into a BEV encoder (then discards the VLM) to retain cognitive understanding of traffic semantics and intent at zero inference cost, while constructing an auto-regressive BEV world model that predicts future semantic maps conditioned on candidate actions. The driving policy is optimized via GRPO using a dual-reward mechanism: physical rewards derived from BEV rollouts to enforce hard safety constraints, and cognitive rewards from a language-aligned scorer for intent compliance. The work claims state-of-the-art results on NAVSIM v1 and v2 benchmarks together with safer driving and flexible user-defined language control.
Significance. If the empirical claims hold and the world-model assumption is substantiated, the framework would offer a practical way to combine high-level cognitive reasoning with low-level physical foresight inside an RL loop, addressing the behavioral-cloning ceiling of imitation learning. The distillation step that preserves VLM-derived cognition without runtime cost and the pluggable language interface are concrete engineering strengths that could improve controllability and interpretability in safety-critical driving systems.
major comments (1)
- [§3.2] §3.2 (auto-regressive BEV world model): The central safety claim—that physical rewards derived from BEV rollouts enforce hard safety constraints—depends on the world model producing future semantic maps whose derived metrics (collision, off-road, etc.) remain faithful over the multi-step horizons used in GRPO. No per-step IoU, rollout-consistency, or ground-truth simulator alignment numbers are reported, so it is unclear whether compounding prediction errors allow the policy to exploit model artifacts rather than true dynamics.
minor comments (2)
- [Abstract] Abstract: The acronym GRPO is used without expansion; define it as Group Relative Policy Optimization on first use.
- [§3] Notation: The distinction between the distilled BEV encoder and the separate auto-regressive world model should be clarified with a single diagram or explicit equation reference to avoid reader confusion about which component supplies the cognitive versus physical channel.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive feedback on our work. We address the major comment point-by-point below and have incorporated revisions to strengthen the manuscript's claims regarding the BEV world model.
-
Referee: [§3.2] §3.2 (auto-regressive BEV world model): The central safety claim—that physical rewards derived from BEV rollouts enforce hard safety constraints—depends on the world model producing future semantic maps whose derived metrics (collision, off-road, etc.) remain faithful over the multi-step horizons used in GRPO. No per-step IoU, rollout-consistency, or ground-truth simulator alignment numbers are reported, so it is unclear whether compounding prediction errors allow the policy to exploit model artifacts rather than true dynamics.
Authors: We agree that explicit validation of the world model's predictive fidelity is important for substantiating the safety claims. While the original manuscript emphasized end-to-end NAVSIM results (which provide indirect evidence through policy performance), we acknowledge the value of direct metrics. In the revised manuscript, we have expanded §3.2 with a new evaluation subsection reporting per-step IoU for semantic map predictions, rollout consistency over the 5- and 10-step horizons used in GRPO, and alignment statistics against ground-truth simulator trajectories on a held-out validation set. These results indicate limited compounding error (IoU degradation <6% at 10 steps) and support that the policy optimizes against faithful dynamics. We have also added qualitative rollout visualizations and a brief discussion of remaining limitations.
Revision: yes
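The per-step IoU diagnostic discussed in this exchange could be computed along the following lines. This is a minimal sketch on label grids; the class encoding and the sample maps are made up, and "degradation" is assumed here to mean the relative drop from the first to the last rollout step, which the rebuttal does not specify.

```python
def iou(pred, truth, cls):
    """Intersection-over-union of one semantic class between two label grids."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += (p == cls) and (t == cls)
            union += (p == cls) or (t == cls)
    return inter / union if union else 1.0  # class absent in both grids: perfect match

def per_step_iou(pred_maps, true_maps, cls):
    """IoU at each rollout step, in order."""
    return [iou(p, t, cls) for p, t in zip(pred_maps, true_maps)]

def relative_degradation(scores):
    """Drop from the first to the last step as a fraction of the first-step score."""
    return (scores[0] - scores[-1]) / scores[0]

# Two-step toy rollout; class 1 marks drivable area (hypothetical encoding).
true_maps = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
pred_maps = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
scores = per_step_iou(pred_maps, true_maps, cls=1)
drop = relative_degradation(scores)
```

Reporting the full per-step curve rather than a single end-of-horizon number makes it easier to see whether errors compound steadily or appear only late in the rollout.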
Circularity Check
No significant circularity; framework assembles independent components
The paper constructs CoPhy by distilling VLM knowledge into a BEV encoder (then discarding the VLM), training an auto-regressive BEV world model to predict future semantic maps conditioned on actions, and optimizing a policy via GRPO using dual rewards defined directly from those models' outputs. The physical reward derives safety metrics from world-model rollouts and the cognitive reward uses a language-aligned scorer; neither is fitted post-hoc to the NAVSIM benchmark metrics nor reduces any claimed prediction to its inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the derivation chain. The SOTA claims and safer-driving results are presented as empirical outcomes of this assembly, making the overall derivation self-contained against external benchmarks.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "GRPO with a novel dual-reward mechanism: a physical reward derived from BEV rollouts enforces hard safety constraints, while a cognitive reward from a language-aligned scorer"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.