Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving

Jian Yang; Jin Xie; Qiang Meng; Yang Wu; Youquan Liu; Zhaojiang Liu

arxiv: 2605.21139 · v2 · pith:2F7C5IPDnew · submitted 2026-05-20 · 💻 cs.CV · cs.LG

Yang Wu, Qiang Meng, Zhaojiang Liu, Youquan Liu, Jian Yang, Jin Xie

Pith reviewed 2026-05-25 05:50 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords autonomous driving · reinforcement learning · BEV world model · vision-language models · cognitive-physical framework · NAVSIM benchmark · intent control · safety constraints

The pith

CoPhy distills VLM knowledge into a BEV encoder and pairs it with an auto-regressive BEV world model to optimize driving policies via dual-reward reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes CoPhy, a Cognitive-Physical reinforcement learning framework. It first distills vision-language model (VLM) knowledge into a bird's-eye-view (BEV) encoder, so that the encoder retains an understanding of traffic semantics and driving intent at zero inference cost. It then builds an auto-regressive BEV world model that predicts future semantic maps conditioned on candidate actions, creating an interpretable physical sandbox from which safety metrics are derived. These two components support policy optimization with GRPO (Group Relative Policy Optimization) under two rewards: a physical reward that enforces hard safety constraints from the rollouts, and a cognitive reward from a language-aligned scorer that ensures intent compliance. A sympathetic reader would care because the approach claims to surpass the behavioral cloning ceiling of imitation learning while enabling safer driving and flexible control through optional user language instructions.
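To make the distillation idea concrete, here is a minimal sketch of a feature-matching loss between a student (BEV encoder) embedding and a teacher (VLM) embedding. The abstract does not state which loss the authors use, so the cosine-distance form and the names `cosine_similarity` and `distillation_loss` are assumptions chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def distillation_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired student/teacher embeddings.

    Hypothetical loss: the source does not specify the distillation objective.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher batches must have the same size")
    losses = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    ]
    return sum(losses) / len(losses)


# A student embedding that points the same way as the teacher's incurs no loss.
aligned = distillation_loss([[1.0, 0.0]], [[2.0, 0.0]])
# An orthogonal student embedding incurs a loss of 1.
orthogonal = distillation_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

Because the teacher only appears inside the training loss, it can be dropped once training ends, which is the property the pith describes as zero inference cost.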

Core claim

CoPhy is claimed to achieve state-of-the-art results on the NAVSIM v1 and v2 benchmarks through three steps: (1) distilling VLM knowledge into the BEV encoder for cognitive ability; (2) building an auto-regressive BEV world model that foresees action consequences and serves as an interpretable physical sandbox; and (3) optimizing the policy with GRPO under a dual-reward mechanism, in which physical rewards from BEV rollouts enforce safety and cognitive rewards from language alignment ensure intent compliance. The distilled cognitive channel is released as an interface for optional human language commands.
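The group-relative advantage that GRPO computes over a set of sampled candidates can be sketched as follows. How CoPhy combines its two rewards is not given in the abstract, so the weighted sum in `combined_reward` and its weights `w_phys` and `w_cog` are hypothetical; the use of the population standard deviation is also an assumption of this sketch.

```python
import statistics


def combined_reward(physical, cognitive, w_phys=1.0, w_cog=1.0):
    """Hypothetical weighted sum of the physical and cognitive rewards."""
    return w_phys * physical + w_cog * cognitive


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    GRPO scores every sampled candidate relative to the other candidates
    drawn for the same input, so no learned value function is required.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four candidate trajectories for one scene (toy reward values).
physical = [1.0, 0.0, 1.0, 0.0]   # e.g. 1.0 means no predicted safety violation
cognitive = [1.0, 1.0, 0.0, 0.0]  # e.g. 1.0 means the scorer judges intent as met
rewards = [combined_reward(p, c) for p, c in zip(physical, cognitive)]
advantages = group_relative_advantages(rewards)
```

In this toy group, the candidate that satisfies both rewards receives a positive advantage, the candidate that satisfies neither receives a negative one, and the two mixed candidates sit at the group mean.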

What carries the argument

The auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, serving as the interpretable physical sandbox from which safety metrics are directly derived, together with the distilled cognitive channel in the BEV encoder.
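A toy version of an action-conditioned, auto-regressive rollout shows how a safety signal can be read directly off predicted semantic maps. Everything here is illustrative: `toy_world_model` is a hand-written stand-in for the learned model, the two-class grid and the fixed ego cell are assumptions, and the paper's actual safety metrics are not described in the abstract.

```python
FREE, VEHICLE = 0, 1  # hypothetical semantic classes


def toy_world_model(bev_map, action):
    """Stand-in for a learned world model.

    The grid is ego-centric: when the ego moves `action` rows forward,
    static content shifts `action` rows toward the bottom of the grid.
    """
    rows, cols = len(bev_map), len(bev_map[0])
    shifted = [[FREE] * cols for _ in range(rows)]
    for r in range(rows):
        new_r = r + action
        if 0 <= new_r < rows:
            shifted[new_r] = list(bev_map[r])
    return shifted


def rollout(world_model, bev_map, actions):
    """Auto-regressive rollout: each predicted map becomes the next input."""
    maps = []
    current = bev_map
    for action in actions:
        current = world_model(current, action)
        maps.append(current)
    return maps


def first_collision_step(maps, ego_cell):
    """Index of the first predicted map where the ego cell holds a vehicle, else None."""
    r, c = ego_cell
    for step, bev_map in enumerate(maps):
        if bev_map[r][c] == VEHICLE:
            return step
    return None


# 5x3 grid, ego at the bottom centre, one stopped vehicle three rows ahead.
grid = [[FREE] * 3 for _ in range(5)]
grid[1][1] = VEHICLE
ego = (4, 1)

keep_going = first_collision_step(rollout(toy_world_model, grid, [1, 1, 1]), ego)
brake = first_collision_step(rollout(toy_world_model, grid, [1, 1, 0]), ego)
```

Here `keep_going` is `2` (the third predicted map shows a collision) and `brake` is `None`. Since each step consumes the previous prediction, any error in an early map is carried into every later one, which is why multi-step prediction accuracy matters for a reward derived this way.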

If this is right

Driving policies reach state-of-the-art performance on NAVSIM v1 and v2 benchmarks.
Physical rewards derived from BEV rollouts enforce hard safety constraints during optimization.
Cognitive rewards from the language-aligned scorer ensure compliance with driving intent.
The distilled cognitive channel supports flexible intent control through user-defined language instructions at no extra inference cost.
Cognitively informed scene compliance produces safer driving behavior overall.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of distilled cognition and predictive physics could allow the same infrastructure to support other sequential decision tasks beyond driving.
Making the world model outputs directly inspectable may simplify safety audits and regulatory review of learned policies.
User language instructions could extend to multi-agent coordination scenarios if the cognitive channel is shared across vehicles.
The dual-reward structure might be tested for transfer to simulation environments with different sensor modalities.

Load-bearing premise

The auto-regressive BEV world model can accurately predict future semantic maps conditioned on candidate actions.

What would settle it

The physical sandbox would be shown unreliable for deriving safety metrics if a direct comparison on held-out driving sequences found that the world model's predicted future semantic maps diverge significantly from the maps actually observed.
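One way to run such a comparison is to compute mean intersection-over-union (mIoU) between predicted and observed semantic maps at each rollout step. The sketch below assumes maps are 2D grids of integer class labels; the function names and the choice to skip classes absent from both maps are assumptions, not the paper's evaluation protocol.

```python
def iou_per_class(pred, target, num_classes):
    """Per-class IoU for two label grids; None for classes absent from both."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched cell belongs to the union of both classes.
                union[p] += 1
                union[t] += 1
    return [inter[k] / union[k] if union[k] else None for k in range(num_classes)]


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that appear in either grid."""
    ious = [v for v in iou_per_class(pred, target, num_classes) if v is not None]
    return sum(ious) / len(ious) if ious else float("nan")


def miou_by_horizon(pred_maps, target_maps, num_classes):
    """mIoU at each rollout step, to expose how accuracy decays with horizon."""
    return [mean_iou(p, t, num_classes) for p, t in zip(pred_maps, target_maps)]


# Two-step toy example: the first prediction is exact, the second has one wrong cell.
observed = [[[0, 1], [0, 1]], [[0, 1], [0, 1]]]
predicted = [[[0, 1], [0, 1]], [[0, 1], [1, 1]]]
scores = miou_by_horizon(predicted, observed, num_classes=2)
```

A curve of these scores that falls sharply with horizon would indicate that rewards computed from late rollout steps rest on unreliable predictions.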

Figures

Figures reproduced from arXiv: 2605.21139 by Jian Yang, Jin Xie, Qiang Meng, Yang Wu, Youquan Liu, Zhaojiang Liu.

**Figure 1.** (a) Previous methods isolate cognitive and physical reasoning, leading to semantic failures like ignoring a STOP sign or spatial violations like halting on a crosswalk. (b) In contrast, CoPhy respects traffic semantics and maintains lane discipline, ensuring safe and cognitive-aligned driving.

**Figure 2.** Overview of CoPhy. Multi-modal data are encoded into BEV state…

**Figure 3.** Hierarchical trajectory selection. Candidates passing the cognitive threshold…

**Figure 4.** Qualitative comparisons of trajectories before and after optimization. The dual-reward…

**Figure 5.** Qualitative comparisons with DiffusionDrive [28] and WoTE [23].

**Figure 6.** t-SNE [49] of distillation.
Panel text extracted alongside this caption: human command "I'm in a very urgent situation, accelerate and run the red light!"; panels "Original trajectory" and "Human-intent trajectory"; human command "Stop following slowly, do not tailgate. Change lanes to an open lane".

Original abstract

Current end-to-end autonomous driving models are fundamentally constrained by the behavioral cloning ceiling of imitation learning. While reinforcement learning offers a path to smarter autonomy, it demands two missing pieces of infrastructure: (1) a cognitive foundation that understands traffic semantics and driving intent, and (2) a foresighted physical environment that can anticipate the consequences of candidate actions. To this end, we propose CoPhy, a Cognitive-Physical reinforcement learning framework for autonomous driving. To distill to think, we distill VLM knowledge into the BEV encoder and then discard the VLM entirely, retaining cognitive ability at zero inference cost while releasing the cognitive channel as a pluggable interface for optional human language commands. To foresee to act, we build an auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, serving as an interpretable physical sandbox from which safety metrics are directly derived. Built upon this dual infrastructure, we optimize the driving policy via GRPO with a novel dual-reward mechanism: a physical reward derived from BEV rollouts enforces hard safety constraints, while a cognitive reward from a language-aligned scorer ensures intent compliance. Extensive experiments demonstrate that CoPhy not only achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks, but also enables safer driving via cognitively informed scene compliance and flexible intent control through user-defined language instructions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CoPhy sketches a clean distillation-plus-world-model RL setup for driving, but the abstract gives no metrics or ablations, so the SOTA and safety claims cannot be checked from the text supplied.

The paper's main move is to distill a VLM into the BEV encoder, drop the VLM at inference, keep the cognitive channel open for language commands, then train a policy with GRPO on a physical reward from an auto-regressive BEV world model plus a cognitive reward from a language scorer. That specific stack of distillation, action-conditioned semantic-map prediction, and dual-reward GRPO is a new combination even if each piece has earlier work behind it. The write-up does a straightforward job explaining why the distillation step keeps inference cost low and why the world model can act as an explicit sandbox for deriving collision and rule penalties. Those are practical engineering points worth noting.

The obvious gap is that the abstract states SOTA on NAVSIM v1 and v2 plus safer scene compliance but shows none of the numbers, baselines, ablations, or multi-step prediction accuracy that would let anyone verify the claims. The stress-test concern about error accumulation in the auto-regressive model over 4–8 second horizons is exactly the kind of thing that needs quantitative checks (mIoU on future maps, correlation with closed-loop safety) before the physical reward can be treated as reliable. Without those, the dual-reward mechanism risks being circular.

This is aimed at researchers already working on end-to-end driving and RL who want to see how cognitive and physical signals can be wired together without extra inference overhead. If the full paper contains the missing experiments and they hold up, the ideas are solid enough to deserve referee time; right now the abstract alone is too thin to judge. I'd send it out for review rather than desk-reject because the infrastructure design is worth testing even if the current evidence is missing.

Referee Report

3 major / 2 minor

Summary. The paper proposes CoPhy, a Cognitive-Physical reinforcement learning framework for autonomous driving. It distills VLM knowledge into a BEV encoder (discarding the VLM at inference) to retain cognitive ability and expose a pluggable language interface, builds an auto-regressive BEV world model that predicts future semantic maps conditioned on candidate actions to serve as an interpretable physical sandbox, and optimizes the policy via GRPO using a dual-reward mechanism (physical reward from BEV rollouts for hard safety constraints plus cognitive reward from a language-aligned scorer for intent compliance). The central claims are SOTA performance on NAVSIM v1/v2 plus safer, language-controllable driving.

Significance. If the world-model accuracy and dual-reward claims hold with supporting evidence, the work could meaningfully advance beyond behavioral cloning in end-to-end driving by supplying both cognitive semantics and foresighted physical evaluation inside the RL loop, with the pluggable cognitive channel offering a practical route to user-specified intent. The explicit derivation of safety metrics from interpretable rollouts is a potentially valuable direction if quantitatively validated.

major comments (3)

[Abstract] Abstract: the claim that CoPhy 'achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks' is unsupported by any numerical metrics, baseline comparisons, ablation tables, or error analysis in the provided text, rendering the primary performance assertion unverifiable.
[Abstract] Abstract: the auto-regressive BEV world model is presented as producing sufficiently accurate future semantic maps to derive reliable safety metrics and enforce 'hard safety constraints,' yet the manuscript supplies no multi-step prediction metrics (mIoU, instance-level error, or closed-loop safety correlation) over the 4–8 s horizons relevant to safety evaluation; this assumption is load-bearing for the physical-reward component.
[Abstract] Abstract: the dual-reward mechanism (physical reward from BEV rollouts + cognitive reward from language-aligned scorer) and its integration with GRPO are described only at the level of high-level prose with no equations, reward formulations, or training details, preventing assessment of whether the rewards are genuinely new or risk circularity.

minor comments (2)

[Abstract] Acronyms (VLM, BEV, GRPO, NAVSIM) are not defined on first use.
[Abstract] The mapping between the rhetorical phrases 'distill to think' and 'foresee to act' and the concrete technical modules could be stated more explicitly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that several central claims require explicit supporting evidence within the abstract itself to be verifiable from the provided text. We will revise the abstract to incorporate key numerical results, prediction metrics, and reward formulations drawn from the full manuscript. Point-by-point responses follow.

Referee: [Abstract] Abstract: the claim that CoPhy 'achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks' is unsupported by any numerical metrics, baseline comparisons, ablation tables, or error analysis in the provided text, rendering the primary performance assertion unverifiable.

Authors: We acknowledge that the abstract, as currently written, does not contain the supporting numerical evidence. The full manuscript reports these results in the experiments section, including direct comparisons and ablations on NAVSIM v1/v2. To address the concern directly, we will revise the abstract to include the primary SOTA metrics and improvement margins over baselines.
revision: yes

Referee: [Abstract] Abstract: the auto-regressive BEV world model is presented as producing sufficiently accurate future semantic maps to derive reliable safety metrics and enforce 'hard safety constraints,' yet the manuscript supplies no multi-step prediction metrics (mIoU, instance-level error, or closed-loop safety correlation) over the 4–8 s horizons relevant to safety evaluation; this assumption is load-bearing for the physical-reward component.

Authors: This observation is correct for the abstract text. The manuscript contains the requested multi-step metrics and safety correlations in the world-model evaluation subsection. We will add a concise statement of these metrics (mIoU and horizon-specific accuracy) to the revised abstract to substantiate the physical-reward claims.
revision: yes

Referee: [Abstract] Abstract: the dual-reward mechanism (physical reward from BEV rollouts + cognitive reward from language-aligned scorer) and its integration with GRPO are described only at the level of high-level prose with no equations, reward formulations, or training details, preventing assessment of whether the rewards are genuinely new or risk circularity.

Authors: We agree that the abstract provides only a prose description. The methods section supplies the full reward equations, GRPO integration, and training details. We will incorporate the core reward formulations into the revised abstract to allow assessment of novelty and avoid any appearance of circularity.
revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external benchmark results without self-referential reductions

The provided abstract and text describe a framework with a distilled BEV encoder, auto-regressive world model for semantic map prediction, and dual-reward GRPO optimization. No equations, parameter-fitting procedures, or derivation steps are shown that reduce a claimed prediction or result to its own inputs by construction. Performance is asserted via SOTA on NAVSIM v1/v2 benchmarks, which are external. The world model and rewards are presented as infrastructure components without evidence of self-definition or fitted-input renaming. This is the common case of a self-contained empirical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on the unverified effectiveness of VLM distillation for retaining cognitive ability and on the predictive fidelity of the new BEV world model; both lack independent evidence in the abstract.

axioms (1)

Domain assumption: Vision-language models contain transferable cognitive knowledge about traffic semantics and driving intent that can be distilled into a BEV encoder without loss of utility.
Invoked to justify discarding the VLM after distillation while retaining cognitive ability.

invented entities (2)

Auto-regressive BEV world model (no independent evidence)
purpose: Predict future semantic maps conditioned on candidate actions to derive safety metrics
New component introduced to serve as physical sandbox; no external validation mentioned.

Cognitive channel as pluggable interface (no independent evidence)
purpose: Enable optional human language commands after distillation
Introduced as byproduct of the distillation process.

pith-pipeline@v0.9.0 · 5787 in / 1463 out tokens · 52880 ms · 2026-05-25T05:50:17.337906+00:00 · methodology

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 12 internal anchors

[1] Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021.
[2] Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, et al. Pseudo-simulation for autonomous driving.
[3] Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. TPAMI, 46(12):10164–10183, 2024.
[4] Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. ICLR, 2026.
[5] Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. TPAMI, 45(11):12878–12895, 2022.
[6] OpenScene Contributors. Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving. https://github.com/OpenDriveLab/OpenScene, 2023.
[7] Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. NeurIPS, 37:28706–28719, 2024.
[8] Renju Feng, Ning Xi, Duanfeng Chu, Rukang Wang, Zejian Deng, Anzheng Wang, Liping Lu, Jinxiang Wang, and Yanjun Huang. Artemis: Autoregressive end-to-end trajectory planning with mixture of experts for autonomous driving. RAL, 11(1):226–233, 2025.
[9] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
[10] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[11] Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In CVPR, pages 17853–17862, 2023.
[12] Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, et al. Emma: End-to-end multimodal model for autonomous driving. arXiv preprint arXiv:2410.23262, 2024.
[13] Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, et al. Irl-vla: Training an vision-language-action policy via reward world model. arXiv preprint arXiv:2508.06571, 2025.
[14] Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving. arXiv preprint arXiv:2410.22313, 2024.
[15] Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. In ICCV, pages 8340–8350, 2023.
[16] Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, and Xinggang Wang. Alphadrive: Unleashing the power of vlms in autonomous driving via reinforcement learning and reasoning. arXiv preprint arXiv:2503.07608, 2025.
[17] Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, and Jingjing Liu. Adapt: Action-aware driving caption transformer. In ICRA, pages 7554–7561. IEEE, 2023.
[18] Derun Li, Jianwei Ren, Yue Wang, Xin Wen, Pengxiang Li, Leimeng Xu, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, et al. Finetuning generative trajectory model with reinforcement learning from human feedback. arXiv e-prints, pages arXiv–2503, 2025.
[19] Kailin Li, Zhenxin Li, Shiyi Lan, Yuan Xie, Zhizhong Zhang, Jiayi Liu, Zuxuan Wu, Zhiding Yu, and Jose M Alvarez. Hydra-mdp++: Advancing end-to-end driving via expert-guided hydra-distillation. arXiv preprint arXiv:2503.12820, 2025.
[20] Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, and Andreas Zell. Spacedrive: Infusing spatial awareness into vlm-based autonomous driving. arXiv preprint arXiv:2512.10719, 2, 2025.
[21] Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, and Tieniu Tan. Enhancing end-to-end autonomous driving with latent world model. ICLR, 2025.
[22] Yingyan Li, Shuyao Shang, Weisong Liu, Bing Zhan, Haochen Wang, Yuqi Wang, Yuntao Chen, Xiaoman Wang, Yasong An, Chufeng Tang, et al. Drivevla-w0: World models amplify data scaling law in autonomous driving. ICLR, 2026.
[23] Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, and Zhaoxiang Zhang. End-to-end driving with online trajectory evaluation via bev world model. In ICCV, pages 27137–27146, 2025.
[24] Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu, Lijun Zhou, Long Chen, Haiyang Sun, Bing Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. ICLR, 2026.
[25] Yongkang Li, Lijun Zhou, Sixu Yan, Bencheng Liao, Tianyi Yan, Kaixin Xiong, Long Chen, Hongwei Xie, Bing Wang, Guang Chen, et al. Unidrivevla: Unifying understanding, perception, and action planning for autonomous driving. arXiv preprint arXiv:2604.02190, 2026.
[26] Yue Li, Meng Tian, Dechang Zhu, Jiangtong Zhu, Zhenyu Lin, Zhiwei Xiong, and Xinhai Zhao. Drive-r1: Bridging reasoning and planning in vlms for autonomous driving with reinforcement learning. In AAAI, volume 40, pages 6708–6716, 2026.
[27] Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024.
[28] Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In CVPR, pages 12037–12047, 2025.
[29] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, pages 2980–2988, 2017.
[30] Wei Liu, Jiyuan Zhang, Binxiong Zheng, Yufeng Hu, Yingzhan Lin, and Zengfeng Zeng. X-driver: Explainable autonomous driving with vision-language models. arXiv preprint arXiv:2505.05098, 2025.
[31] Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. In ICLR, 2017.
[32] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. ICLR, 2018.
[33] Yuechen Luo, Fang Li, Shaoqing Xu, Zhiyi Lai, Lei Yang, Qimao Chen, Ziang Luo, Zixun Xie, Shengyin Jiang, Jiaxin Liu, et al. Adathinkdrive: Adaptive thinking via reinforcement learning for autonomous driving. arXiv preprint arXiv:2509.13769, 2025.
[34] Srikanth Malla, Chiho Choi, Isht Dwivedi, Joon Hee Choi, and Jiachen Li. Drama: Joint risk localization and captioning in driving. In WACV, pages 1043–1052, 2023.
[35] Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, et al. Lingoqa: Visual question answering for autonomous driving. In ECCV, pages 252–269. Springer, 2024.
[36] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[37] SungYeon Park, MinJae Lee, JiHyuk Kang, Hahyeon Choi, Yoonah Park, Juhwan Cho, Adam Lee, and DongKyu Kim. Vlaad: Vision and language assistant for autonomous driving. In CVPR, pages 980–987, 2024.
[38] Aditya Prakash, Kashyap Chitta, and Andreas Geiger. Multi-modal fusion transformer for end-to-end autonomous driving. In CVPR, pages 7077–7087, 2021.
[39] Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, and Yu-Gang Jiang. Nuscenes-qa: A multi-modal visual question answering benchmark for autonomous driving scenario. In AAAI, volume 38, pages 4542–4550, 2024.
[40] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763. PMLR, 2021.
[41]

Simlingo: Vision-only closed-loop autonomous driving with language-action alignment

Katrin Renz, Long Chen, Elahe Arani, and Oleg Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. InCVPR, pages 11993–12003, 2025

work page 2025
[42]

Poutine: Vision-language-trajectory pre-training and reinforcement learning post-training enable robust end-to-end autonomous driving.arXiv preprint arXiv:2506.11234, 2025

Luke Rowe, Rodrigue de Schaetzen, Roger Girgis, Christopher Pal, and Liam Paull. Poutine: Vision-language-trajectory pre-training and reinforcement learning post-training enable robust end-to-end autonomous driving.arXiv preprint arXiv:2506.11234, 2025

work page arXiv 2025
[43]

Drivedpo: Policy learning via safety dpo for end-to-end autonomous driving.NeurIPS, 2025

Shuyao Shang, Yuntao Chen, Yuqi Wang, Yingyan Li, and Zhaoxiang Zhang. Drivedpo: Policy learning via safety dpo for end-to-end autonomous driving.NeurIPS, 2025

work page 2025
[44]

Lmdrive: Closed-loop end-to-end driving with large language models

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hong- sheng Li. Lmdrive: Closed-loop end-to-end driving with large language models. InCVPR, pages 15120–15130, 2024

work page 2024
[45]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[46]

ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving

Zihao Sheng, Xin Ye, Jingru Luo, Sikai Chen, and Liu Ren. Explorevla: Dense world modeling and exploration for end-to-end autonomous driving.arXiv preprint arXiv:2604.02714, 2026

[47]

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In ECCV, pages 256–274. Springer, 2024

[48]

Chen Tang, Ben Abbatematteo, Jiaheng Hu, Rohan Chandra, Roberto Martín-Martín, and Peter Stone. Deep reinforcement learning for robotics: A survey of real-world successes. Annual Review of Control, Robotics, and Autonomous Systems, 8(1):153–188, 2025

[49]

Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. JMLR, 9(11), 2008

[50]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 30, 2017

[51]

Yan Wang, Wenjie Luo, Junjie Bai, Yulong Cao, Tong Che, Ke Chen, Yuxiao Chen, Jenna Diamond, Yifan Ding, Wenhao Ding, et al. Alpamayo-r1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail. arXiv preprint arXiv:2511.00088, 2025

[52]

Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. In CVPR, pages 15449–15458, 2024

[53]

Tianze Xia, Yongkang Li, Lijun Zhou, Jingfeng Yao, Kaixin Xiong, Haiyang Sun, Bing Wang, Kun Ma, Guang Chen, Hangjun Ye, et al. Drivelaw: Unifying planning and video generation in a latent driving world. CVPR, 2026

[54]

Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In CVPR, pages 1602–1611, 2025

[55]

Mingwang Xu, Jiahao Cui, Feipeng Cai, Hanlin Shang, Zhihao Zhu, Shan Luan, Yifang Xu, Neng Zhang, Yaoyi Li, Jia Cai, et al. Wam-diff: A masked diffusion vla framework with moe and online reinforcement learning for autonomous driving. arXiv preprint arXiv:2512.11872, 2025

[56]

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. RAL, 9(10):8186–8193, 2024

[57]

Tianyi Yan, Tao Tang, Xingtai Gui, Yongkang Li, Jiasen Zhesng, Weiyao Huang, Lingdong Kong, Wencheng Han, Xia Zhou, Xueyang Zhang, et al. Ad-r1: Closed-loop reinforcement learning for end-to-end autonomous driving with impartial world models. arXiv preprint arXiv:2511.20325, 2025

[58]

Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, and Junchi Yan. Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2). arXiv preprint arXiv:2505.16394, 2025

[59]

Wenhao Yao, Zhenxin Li, Shiyi Lan, Zi Wang, Xinglong Sun, Jose M Alvarez, and Zuxuan Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. In AAAI, volume 40, pages 11910–11918, 2026

[60]

Chengran Yuan, Zhanqi Zhang, Jiawei Sun, Shuo Sun, Zefan Huang, Christina Dao Wen Lee, Dongen Li, Yuhang Han, Anthony Wong, Keng Peng Tee, et al. Drama: An efficient end-to-end motion planner for autonomous driving with mamba. arXiv preprint arXiv:2408.03601, 2024

[61]

Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu, Xingang Pan, Xiaoyang Guo, Yuan Liu, Jingwei Huang, Li Yuan, Qian Zhang, Xiao-Xiao Long, et al. Epona: Autoregressive diffusion world model for autonomous driving. In ICCV, pages 27220–27230, 2025

[62]

Jingyuan Zhao, Yuyan Wu, Rui Deng, Susu Xu, Jinpeng Gao, and Andrew Burke. A survey of autonomous driving from a deep learning perspective. ACM Computing Surveys, 57(10):1–60, 2025

[63]

Wenzhao Zheng, Ruiqi Song, Xianda Guo, Chenming Zhang, and Long Chen. Genad: Generative end-to-end autonomous driving. In ECCV, pages 87–104. Springer, 2024

[64]

Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In ICCV, pages 28632–28642, 2025

[65]

Zhiyu Zheng, Shaoyu Chen, Haoran Yin, Xinbang Zhang, Jialv Zou, Xinggang Wang, Qian Zhang, and Lefei Zhang. Resad: Normalized residual trajectory modeling for end-to-end autonomous driving. CVPR, 2026

[66]

Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Volker Tresp, and Alois Knoll. Opendrivevla: Towards end-to-end autonomous driving with large vision language action model. In AAAI, volume 40, pages 13782–13790, 2026

[67]

Zewei Zhou, Tianhui Cai, Seth Z Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. NeurIPS, 2025

[68]

Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, et al. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479, 2025


[1]

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021

[2]

Wei Cao, Marcel Hallgarten, Tianyu Li, Daniel Dauner, Xunjiang Gu, Caojun Wang, Yakov Miron, Marco Aiello, Hongyang Li, Igor Gilitschenski, et al. Pseudo-simulation for autonomous driving

[3]

Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers. TPAMI, 46(12):10164–10183, 2024

[4]

Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. Vadv2: End-to-end vectorized autonomous driving via probabilistic planning. ICLR, 2026

[5]

Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. TPAMI, 45(11):12878–12895, 2022

[6]

OpenScene Contributors. Openscene: The largest up-to-date 3d occupancy prediction benchmark in autonomous driving. https://github.com/OpenDriveLab/OpenScene, 2023

[7]

Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. NeurIPS, 37:28706–28719, 2024

[8]

Renju Feng, Ning Xi, Duanfeng Chu, Rukang Wang, Zejian Deng, Anzheng Wang, Liping Lu, Jinxiang Wang, and Yanjun Huang. Artemis: Autoregressive end-to-end trajectory planning with mixture of experts for autonomous driving. RAL, 11(1):226–233, 2025

[9]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

[10]

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

[11]

Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In CVPR, pages 17853–17862, 2023

[12]

Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, et al. Emma: End-to-end multimodal model for autonomous driving. arXiv preprint arXiv:2410.23262, 2024

[13]

Anqing Jiang, Yu Gao, Yiru Wang, Zhigang Sun, Shuo Wang, Yuwen Heng, Hao Sun, Shichen Tang, Lijuan Zhu, Jinhao Chai, et al. Irl-vla: Training an vision-language-action policy via reward world model. arXiv preprint arXiv:2508.06571, 2025

[14]

Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving. arXiv preprint arXiv:2410.22313, 2024

[15]

Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. In ICCV, pages 8340–8350, 2023

[16]

Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, and Xinggang Wang. Alphadrive: Unleashing the power of vlms in autonomous driving via reinforcement learning and reasoning. arXiv preprint arXiv:2503.07608, 2025

[17]

Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, and Jingjing Liu. Adapt: Action-aware driving caption transformer. In ICRA, pages 7554–7561. IEEE, 2023

[18]

Derun Li, Jianwei Ren, Yue Wang, Xin Wen, Pengxiang Li, Leimeng Xu, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, et al. Finetuning generative trajectory model with reinforcement learning from human feedback. arXiv e-prints, pages arXiv–2503, 2025

[19]

Kailin Li, Zhenxin Li, Shiyi Lan, Yuan Xie, Zhizhong Zhang, Jiayi Liu, Zuxuan Wu, Zhiding Yu, and Jose M Alvarez. Hydra-mdp++: Advancing end-to-end driving via expert-guided hydra-distillation. arXiv preprint arXiv:2503.12820, 2025

[20]

Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, and Andreas Zell. Spacedrive: Infusing spatial awareness into vlm-based autonomous driving. arXiv preprint arXiv:2512.10719, 2, 2025

[21]

Yingyan Li, Lue Fan, Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang, and Tieniu Tan. Enhancing end-to-end autonomous driving with latent world model. ICLR, 2025

[22]

Yingyan Li, Shuyao Shang, Weisong Liu, Bing Zhan, Haochen Wang, Yuqi Wang, Yuntao Chen, Xiaoman Wang, Yasong An, Chufeng Tang, et al. Drivevla-w0: World models amplify data scaling law in autonomous driving. ICLR, 2026

[23]

Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, and Zhaoxiang Zhang. End-to-end driving with online trajectory evaluation via bev world model. In ICCV, pages 27137–27146, 2025

[24]

Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu, Lijun Zhou, Long Chen, Haiyang Sun, Bing Wang, et al. Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving. ICLR, 2026

[25]

Yongkang Li, Lijun Zhou, Sixu Yan, Bencheng Liao, Tianyi Yan, Kaixin Xiong, Long Chen, Hongwei Xie, Bing Wang, Guang Chen, et al. Unidrivevla: Unifying understanding, perception, and action planning for autonomous driving. arXiv preprint arXiv:2604.02190, 2026

[26]

Yue Li, Meng Tian, Dechang Zhu, Jiangtong Zhu, Zhenyu Lin, Zhiwei Xiong, and Xinhai Zhao. Drive-r1: Bridging reasoning and planning in vlms for autonomous driving with reinforcement learning. In AAAI, volume 40, pages 6708–6716, 2026

[27]

Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024

[28]

Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In CVPR, pages 12037–12047, 2025

[29]

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, pages 2980–2988, 2017

[30]

Wei Liu, Jiyuan Zhang, Binxiong Zheng, Yufeng Hu, Yingzhan Lin, and Zengfeng Zeng. X-driver: Explainable autonomous driving with vision-language models. arXiv preprint arXiv:2505.05098, 2025

[31]

Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. In ICLR, 2017

[32]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. ICLR, 2018

[33]

Yuechen Luo, Fang Li, Shaoqing Xu, Zhiyi Lai, Lei Yang, Qimao Chen, Ziang Luo, Zixun Xie, Shengyin Jiang, Jiaxin Liu, et al. Adathinkdrive: Adaptive thinking via reinforcement learning for autonomous driving. arXiv preprint arXiv:2509.13769, 2025

[34]

Srikanth Malla, Chiho Choi, Isht Dwivedi, Joon Hee Choi, and Jiachen Li. Drama: Joint risk localization and captioning in driving. In WACV, pages 1043–1052, 2023

[35]

Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, et al. Lingoqa: Visual question answering for autonomous driving. In ECCV, pages 252–269. Springer, 2024

[36]

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018

[37]

SungYeon Park, MinJae Lee, JiHyuk Kang, Hahyeon Choi, Yoonah Park, Juhwan Cho, Adam Lee, and DongKyu Kim. Vlaad: Vision and language assistant for autonomous driving. In CVPR, pages 980–987, 2024

[38]

Aditya Prakash, Kashyap Chitta, and Andreas Geiger. Multi-modal fusion transformer for end-to-end autonomous driving. In CVPR, pages 7077–7087, 2021

[39]

Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, and Yu-Gang Jiang. Nuscenes-qa: A multi-modal visual question answering benchmark for autonomous driving scenario. In AAAI, volume 38, pages 4542–4550, 2024

[40]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763. PMLR, 2021
