LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

Jun Ma; Mingxing Peng; Pei Liu; Ruiguo Zhong; Rui Yang; Ruoyu Yao

arxiv: 2606.08470 · v1 · pith:G3SKBPT6new · submitted 2026-06-07 · 💻 cs.RO

Ruoyu Yao, Pei Liu, Ruiguo Zhong, Mingxing Peng, Rui Yang, Jun Ma

Pith reviewed 2026-06-27 18:38 UTC · model grok-4.3

classification 💻 cs.RO

keywords: autonomous driving · language models · lifelong learning · uncertainty awareness · lightweight models · decision making · nuPlan benchmark · multi-agent system

The pith

LUNA-AD distills multi-agent reasoning into a lightweight dual-head model that learns continuously for better autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that large language models can overcome their computational and static limitations in safety-critical driving by using a tri-system design: a multi-agent setup creates diverse uncertainty-aware decision demonstrations, these are distilled into an efficient dual-head model for fast inference of both actions and explanations, and a reflection mechanism enables ongoing refinement through closed-loop feedback. A sympathetic reader would care because this could make reasoned, adaptable decision-making feasible in real-time vehicle control without the delays of full-scale models. If correct, it would mean knowledge-driven autonomous driving systems can achieve higher benchmark success while running much faster than prior approaches.

Core claim

LUNA-AD features a tri-system architecture that reconciles complex multimodal behavioral reasoning, efficient deployment, and continual refinement: a multi-agent analytical system generates uncertainty-aware decision-making demonstrations through diverse hypothesis exploration, a dual-head lightweight heuristic model is distilled to unify the inference of decision distributions and textual explanations, and a reflection-driven lifelong learning mechanism operates on multimodal decision outputs to preserve strategic diversity and refine candidate decisions and rationales via closed-loop feedback.
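
To make the idea of an uncertainty-aware demonstration concrete, here is a minimal sketch. It assumes each agent in the analytical system proposes one discrete decision and that the demonstration is the normalized vote distribution. The summary does not specify the paper's aggregation rule, so the function names and the decision labels are hypothetical.

```python
import math
from collections import Counter


def aggregate_hypotheses(proposals):
    """Turn a list of per-agent decision labels into a probability distribution."""
    if not proposals:
        raise ValueError("need at least one proposal")
    counts = Counter(proposals)
    total = sum(counts.values())
    return {decision: n / total for decision, n in counts.items()}


def entropy(dist):
    """Shannon entropy in nats; higher values mean the agents disagree more."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


# Hypothetical labels for illustration, not taken from the paper.
proposals = ["keep_lane", "keep_lane", "change_left", "keep_lane"]
demo = aggregate_hypotheses(proposals)
```

A distribution with higher entropy signals more disagreement among hypotheses, which is the kind of signal a distilled decision head could be trained to reproduce.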

What carries the argument

The tri-system architecture of multi-agent demonstration generation, dual-head model distillation, and reflection-driven lifelong learning.

If this is right

LUNA-AD delivers state-of-the-art success rates on nuPlan in both non-reactive and reactive modes.
Inference latency drops substantially compared to other knowledge-driven autonomous driving frameworks.
The lifelong learning component supports ongoing refinement of decisions and rationales without static limitations.
Uncertainty awareness in the distilled model helps maintain strategic diversity in driving choices.
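
The summary does not describe how closed-loop feedback changes the decision distribution. As a sketch under that caveat, one simple scheme halves the probability of a decision that failed, floors every option, and renormalizes, so that alternatives stay available. `reflect`, `penalty`, and `floor` are hypothetical names and values, not the paper's mechanism.

```python
def reflect(dist, failed_decision, penalty=0.5, floor=0.05):
    """Down-weight a decision that failed in closed loop and renormalize."""
    if failed_decision not in dist:
        raise KeyError(failed_decision)
    updated = dict(dist)
    updated[failed_decision] *= penalty
    # Floor before renormalizing so no candidate reaches zero probability.
    updated = {d: max(p, floor) for d, p in updated.items()}
    total = sum(updated.values())
    return {d: p / total for d, p in updated.items()}


# Hypothetical prior over three candidate decisions.
prior = {"keep_lane": 0.8, "change_left": 0.2, "stop": 0.0}
posterior = reflect(prior, "keep_lane")
```

The floor is what keeps the update from collapsing onto a single strategy, which is one plausible reading of "preserving strategic diversity".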

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The distillation step could extend the same lightweight reasoning pattern to other real-time control tasks that need both speed and explanatory output.
Closed-loop reflection might allow faster adaptation to novel environments than retraining on fixed datasets alone.
If the uncertainty modeling transfers well, similar pipelines could address overconfidence problems in other deployed decision systems.

Load-bearing premise

The multi-agent analytical system generates high-quality uncertainty-aware decision-making demonstrations that can be distilled into a lightweight dual-head model while preserving performance and enabling effective lifelong refinement via closed-loop feedback.

What would settle it

If LUNA-AD were run on the nuPlan benchmarks and its success rates fell below those of existing knowledge-driven frameworks, or its inference latency were not substantially lower, the architecture would not be delivering the claimed benefits.

Figures

Figures reproduced from arXiv: 2606.08470 by Jun Ma, Mingxing Peng, Pei Liu, Ruiguo Zhong, Rui Yang, Ruoyu Yao.

**Figure 1.** Our methodological architecture: a knowledge-driven tri-system decision-making framework, combined with a prompt manager and a decision-guided ...

**Figure 2.** Generation of uncertainty-aware decision-making demonstrations by ...

**Figure 3.** An illustration of the dual-head lightweight architecture for decision ...

**Figure 4.** An illustration of the reflection mechanism. Upon the execution of a high-level decision in the closed-loop environment, the progress monitor performs ...

**Figure 5.** The camera system of the data collection vehicle. Six RGB cameras ...

**Figure 6.** Qualitative demonstrations of our approach in four representative scenarios. For better visualization, the scores of different planning trajectories are ...

**Figure 7.** Effects of test-time RAG few-shot count on the open-loop performance of the heuristic system, which includes the fitness of decision probability ...

**Figure 8.** Representative cases from DriveLM-NuScenes for demonstrating the analytical system’s ability to perform multimodal decision reasoning with multi ...

**Figure 9.** Representative cases from our in-house campus dataset for demonstrating the analytical system’s ability to perform multimodal decision reasoning ...

Original abstract

While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving (AD). LUNA-AD features a tri-system architecture that reconciles complex multimodal behavioral reasoning, efficient deployment, and continual refinement. We design a multi-agent analytical system to generate uncertainty-aware decision-making demonstrations through diverse hypothesis exploration. A dual-head lightweight heuristic model is distilled to unify the inference of decision distributions and textual explanations while enabling efficient deployment. Furthermore, a reflection-driven lifelong learning mechanism operates on multimodal decision outputs and preserves strategic diversity, allowing for the refinement of candidate decisions and rationales via closed-loop feedback to enhance driving robustness. Extensive experiments on nuPlan benchmarks demonstrate that LUNA-AD achieves state-of-the-art success rates under both non-reactive and reactive modes, with drastically reduced inference latency compared to existing knowledge-driven AD frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

LUNA-AD sketches a tri-system LLM setup for driving that mixes multi-agent demos, distillation, and reflection, but the abstract supplies no numbers or ablations to check whether the distilled model actually keeps the claimed performance and uncertainty properties.

The paper's main move is to combine three pieces for LLM-based autonomous driving: a multi-agent system that explores hypotheses to create uncertainty-aware demonstrations, a dual-head lightweight model distilled from those demos, and a reflection loop that uses closed-loop feedback to keep refining decisions over time. That specific stack applied to nuPlan is presented as new.

The architecture itself is laid out clearly enough. It directly targets three real pain points—reasoning diversity, inference speed, and the inability to adapt after deployment—and tries to solve them in one pipeline rather than bolting fixes on separately. The lifelong-learning angle with multimodal outputs and strategic diversity preservation is a reasonable extension of existing continual-learning ideas to this setting.

The weakest part is the distillation step. The abstract says the lightweight model unifies decision distributions and explanations while cutting latency, yet it gives no comparison of success rates, calibration, or explanation quality between the multi-agent teacher and the student model on the same scenarios. Without that, the SOTA claim on non-reactive and reactive nuPlan modes rests on an unverified assumption. No metrics, baselines, ablations, or error bars appear in the provided text, so the performance assertions cannot be checked against the paper's own evidence.
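
One way to run the missing teacher-versus-student check, sketched here under the assumption that both systems emit a probability distribution over the same discrete decisions, is to average the KL divergence across matched scenarios. The function names are hypothetical and nothing here comes from the paper.

```python
import math


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) over decision labels, in nats."""
    total = 0.0
    for decision, p in teacher.items():
        if p == 0:
            continue  # zero-probability terms contribute nothing
        # Clamp the student probability to avoid division by zero.
        q = max(student.get(decision, 0.0), eps)
        total += p * math.log(p / q)
    return total


def mean_fidelity_gap(pairs):
    """Average KL over (teacher, student) distribution pairs from matched scenarios."""
    if not pairs:
        raise ValueError("no scenario pairs")
    return sum(kl_divergence(t, s) for t, s in pairs) / len(pairs)
```

A gap near zero would suggest the student reproduces the teacher's uncertainty; it says nothing by itself about closed-loop success, which would need its own comparison.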

This is for people already working on LLM planners for robotics who need concrete architecture sketches to try. A reader hunting for reproducible benchmarks or formally verified gains will find little to use right now.

If the full paper contains the missing quantitative checks and the experiments hold up under scrutiny, it is worth sending to referees; the idea is concrete enough to test even if the current write-up is thin on proof.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving. It features a tri-system architecture: a multi-agent analytical system to generate uncertainty-aware decision-making demonstrations through diverse hypothesis exploration, a dual-head lightweight heuristic model distilled from it to unify inference of decision distributions and textual explanations, and a reflection-driven lifelong learning mechanism that refines decisions and rationales via closed-loop feedback on multimodal outputs. The paper claims that extensive experiments on nuPlan benchmarks demonstrate state-of-the-art success rates under both non-reactive and reactive modes, along with drastically reduced inference latency compared to existing knowledge-driven AD frameworks.

Significance. If the performance claims hold with proper substantiation, the work could meaningfully advance LLM integration into safety-critical autonomous driving by addressing computational overhead, limited reasoning diversity, and static learning through efficient distillation and continual refinement. The tri-system design offers a practical path toward deployable, uncertainty-aware systems that maintain strategic diversity.

major comments (2)

[Abstract] The central claim of state-of-the-art success rates on nuPlan (non-reactive and reactive modes) with reduced latency is presented without any quantitative metrics, baseline comparisons, error bars, dataset details, or ablation studies, preventing verification of the performance improvements against the paper's own evidence.
[Distillation process (dual-head heuristic model)] The load-bearing assumption that high-quality uncertainty-aware demonstrations from the multi-agent analytical system survive distillation into the lightweight dual-head model (preserving both performance and uncertainty awareness) lacks any quantitative verification, such as success-rate deltas, calibration error, or explanation fidelity comparisons between teacher outputs and student model on identical scenarios.
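
For the calibration part of this comment, a standard binned expected calibration error could be reported for both teacher and student. The sketch below assumes each scenario yields a top-decision confidence in [0, 1] and a boolean outcome. How "correct" is defined on nuPlan is left open, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over equal-width bins.

    Assumes every confidence lies in [0, 1].
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

Computing this on identical scenarios for both models would show whether distillation made the student more overconfident than its teacher.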

minor comments (1)

[Abstract] The phrase 'extensive experiments' is used without any numerical results or key statistics, which would allow immediate assessment of the claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the distillation process. We address each major comment below and indicate the planned revisions.

Referee: [Abstract] The central claim of state-of-the-art success rates on nuPlan (non-reactive and reactive modes) with reduced latency is presented without any quantitative metrics, baseline comparisons, error bars, dataset details, or ablation studies, preventing verification of the performance improvements against the paper's own evidence.

Authors: We agree that the abstract would benefit from including key quantitative results to make the performance claims more immediately verifiable. The experimental section of the manuscript reports the specific success rates, baseline comparisons, and latency figures on nuPlan under both modes. We will revise the abstract to incorporate representative metrics (e.g., success-rate improvements and latency reductions) along with a brief reference to the evaluation protocol.

revision: yes

Referee: [Distillation process (dual-head heuristic model)] The load-bearing assumption that high-quality uncertainty-aware demonstrations from the multi-agent analytical system survive distillation into the lightweight dual-head model (preserving both performance and uncertainty awareness) lacks any quantitative verification, such as success-rate deltas, calibration error, or explanation fidelity comparisons between teacher outputs and student model on identical scenarios.

Authors: This observation is correct. The manuscript describes the distillation procedure and its design goals but does not provide direct quantitative comparisons (such as success-rate deltas, calibration metrics, or explanation fidelity) between the teacher multi-agent outputs and the distilled dual-head student model on matched scenarios. We will add an ablation study with these comparisons in the revised manuscript to substantiate the preservation of performance and uncertainty awareness.

revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external benchmark experiments

The paper presents an architecture (multi-agent demonstration generator, distilled dual-head model, reflection-driven lifelong learning) whose performance is asserted via nuPlan success-rate and latency measurements. No equations, fitted parameters, or self-citations are shown to reduce any claimed result to its own inputs by construction; the distillation step is described as a design choice whose fidelity is left to empirical verification rather than enforced by definition. The derivation chain therefore remains self-contained against the stated external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no information on free parameters, axioms, or invented entities is provided.

pith-pipeline@v0.9.1-grok · 5720 in / 1107 out tokens · 26471 ms · 2026-06-27T18:38:38.350345+00:00 · methodology

Reference graph

Works this paper leans on

56 extracted references · 14 linked inside Pith

[1]

Planning and decision- making for autonomous vehicles,

W. Schwarting, J. Alonso-Mora, and D. Rus, “Planning and decision- making for autonomous vehicles,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 187–210, 2018

2018
[2]

An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors,

P. Hang, C. Lv, C. Huang, J. Cai, Z. Hu, and Y . Xing, “An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors,”IEEE Transactions on Vehicular Technology, vol. 69, no. 12, pp. 14458–14469, 2020

2020
[3]

Senna: Bridging large vision-language models and end-to-end autonomous driving,

B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Senna: Bridging large vision-language models and end-to-end autonomous driving,”arXiv preprint arXiv:2410.22313, 2024. 15 t = t0 − 2.0s t = t0 − 1.5s t = t0 − 1.0s t = t0 − 0.5s t = t0 (present) ####Recommended decisions to user:####DK,DR ####Explanations: The ego vehi...

Pith/arXiv arXiv 2024
[4]

Bilevel multi-armed bandit- based hierarchical reinforcement learning for interaction-aware self- driving at unsignalized intersections,

Z. Peng, Y . Wang, L. Zheng, and J. Ma, “Bilevel multi-armed bandit- based hierarchical reinforcement learning for interaction-aware self- driving at unsignalized intersections,”IEEE Transactions on Vehicular Technology, vol. 74, no. 6, pp. 8824–8838, 2025

2025
[5]

End-to-end autonomous driving: Challenges and frontiers,

L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10164– 10183, 2024

2024
[6]

Language models are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,et al., “Language models are few-shot learners,” inthe Proceedings of Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020

1901
[7]

Qwen3 technical report,

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv,et al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

Pith/arXiv arXiv 2025
[8]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhou,et al., “Chain-of-thought prompting elicits reasoning in large language models,”the Proceedings of Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022

2022
[9]

GPT-Driver: Learning to drive with GPT,

J. Mao, Y . Qian, J. Ye, H. Zhao, and Y . Wang, “GPT-Driver: Learning to drive with GPT,”arXiv preprint arXiv:2310.01415, 2023

Pith/arXiv arXiv 2023
[10]

LMDrive: Closed-loop end-to-end driving with large language models,

H. Shao, Y . Hu, L. Wang, G. Song, S. L. Waslander, Y . Liu, and H. Li, “LMDrive: Closed-loop end-to-end driving with large language models,” inthe Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15120–15130, 2024

2024
[11]

DrivegGPT4: Interpretable end-to-end autonomous driving via large language model,

Z. Xu, Y . Zhang, E. Xie, Z. Zhao, Y . Guo, K.-Y . K. Wong, Z. Li, and H. Zhao, “DrivegGPT4: Interpretable end-to-end autonomous driving via large language model,”IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8186 – 8193, 2024

2024
[12]

DriveLM: Driving with graph visual question answering,

C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, and H. Li, “DriveLM: Driving with graph visual question answering,” inthe Proceedings of European Conference on Computer Vision, pp. 256–274, 2024

2024
[13]

Self-consistency improves chain of thought reasoning in language models,

X. Wang, J. Wei, D. Schuurmans, Q. V . Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” inthe Proceedings of Inter- national Conference on Learning Representations, 2023

2023
[14]

Tree of thoughts: Deliberate problem solving with large language models,

S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y . Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” inthe Proceedings of Advances in Neural Information 16 Processing Systems, vol. 36, pp. 11809–11822, 2023

2023
[15]

Forking paths in neural text generation,

E. J. Bigelow, A. Holtzman, H. Tanaka, and T. Ullman, “Forking paths in neural text generation,” inthe Proceedings of International Conference on Learning Representations, 2025

2025
[16]

A unified framework integrating decision making and trajectory planning based on spatio-temporal voxels for highway autonomous driving,

T. Zhang, W. Song, M. Fu, Y . Yang, X. Tian, and M. Wang, “A unified framework integrating decision making and trajectory planning based on spatio-temporal voxels for highway autonomous driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10365–10379, 2021

2021
[17]

CALMM-Drive: Confidence-aware autonomous driving with large mul- timodal model,

R. Yao, Y . Wang, H. Liu, R. Yang, Z. Peng, L. Zhu, and J. Ma, “CALMM-Drive: Confidence-aware autonomous driving with large mul- timodal model,”arXiv preprint arXiv:2412.04209, 2024

arXiv 2024
[18]

GameFormer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,

Z. Huang, H. Liu, and C. Lv, “GameFormer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,” inthe Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3903–3913, 2023

2023
[19]

PLUTO: Pushing the limit of imi- tation learning-based planning for autonomous driving,

J. Cheng, Y . Chen, and Q. Chen, “PLUTO: Pushing the limit of imi- tation learning-based planning for autonomous driving,”arXiv preprint arXiv:2404.14327, 2024

arXiv 2024
[20]

Hierarchical prediction uncertainty-aware motion planning for autonomous driving in lane-changing scenarios,

R. Yao and X. Sun, “Hierarchical prediction uncertainty-aware motion planning for autonomous driving in lane-changing scenarios,”Trans- portation Research Part C: Emerging Technologies, vol. 171, p. 104962, 2025

2025
[21]

GenAD: Gener- ative end-to-end autonomous driving,

W. Zheng, R. Song, X. Guo, C. Zhang, and L. Chen, “GenAD: Gener- ative end-to-end autonomous driving,” inthe Proceedings of European Conference on Computer Vision, pp. 87–104, 2024

2024
[22]

Diffusion-based planning for autonomous driving with flexible guidance,

Y . Zheng, R. Liang, K. Zheng, J. Zheng, L. Mao, J. Li, W. Gu, R. Ai, S. E. Li, X. Zhan,et al., “Diffusion-based planning for autonomous driving with flexible guidance,” inthe Proceedings of International Conference on Learning Representations, 2025

2025
[23]

HE-Drive: Human-like end-to-end driving with vision language models,

J. Wang, X. Zhang, Z. Xing, S. Gu, X. Guo, Y . Hu, Z. Song, Q. Zhang, X. Long, and W. Yin, “HE-Drive: Human-like end-to-end driving with vision language models,”arXiv preprint arXiv:2410.05051, 2024

arXiv 2024
[24]

Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,

H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” arXiv preprint arXiv:2503.19755, 2025

Pith/arXiv arXiv 2025
[25]

Recogdrive: A reinforced cognitive framework for end- to-end autonomous driving,

Y . Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang,et al., “Recogdrive: A reinforced cognitive framework for end- to-end autonomous driving,”arXiv preprint arXiv:2506.08052, 2025

Pith/arXiv arXiv 2025
[26]

DriveVLA-W0: World models amplify data scaling law in autonomous driving,

Y . Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y . Wang, Y . Chen, X. Wang, Y . An, C. Tang,et al., “DriveVLA-W0: World models amplify data scaling law in autonomous driving,”arXiv preprint arXiv:2510.12796, 2025

Pith/arXiv arXiv 2025
[27]

Scaling laws for neural language models,

J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,”arXiv preprint arXiv:2001.08361, 2020

Pith/arXiv arXiv 2001
[28]

Distilling the knowledge in a neural network,

G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,”arXiv preprint arXiv:1503.02531, 2015

Pith/arXiv arXiv 2015
[29]

A survey on knowledge distillation of large language models,

X. Xu, M. Li, C. Tao, T. Shen, R. Cheng, J. Li, C. Xu, D. Tao, and T. Zhou, “A survey on knowledge distillation of large language models,” arXiv preprint arXiv:2402.13116, 2024

Pith/arXiv arXiv 2024
[30]

Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation,

Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y . Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu, et al., “Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation,”arXiv preprint arXiv:2406.06978, 2024

Pith/arXiv arXiv 2024
[31]

DistillDrive: End-to-end multi-mode autonomous driving distillation by isomorphic hetero-source planning model,

R. Yu, X. Zhang, R. Zhao, H. Yan, and M. Wang, “DistillDrive: End-to-end multi-mode autonomous driving distillation by isomorphic hetero-source planning model,” inthe Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 26188–26197, 2025

2025
[32]

Continuously learning, adapting, and improving: A dual-process approach to autonomous driving,

J. Mei, Y . Ma, X. Yang, L. Wen, X. Cai, X. Li, D. Fu, B. Zhang, P. Cai, M. Dou,et al., “Continuously learning, adapting, and improving: A dual-process approach to autonomous driving,”Advances in Neural Information Processing Systems, vol. 37, pp. 123261–123290, 2024

2024
[33]

DSDrive: Distilling large language model for lightweight end-to-end autonomous driving with unified reasoning and planning,

W. Liu, P. Liu, and J. Ma, “DSDrive: Distilling large language model for lightweight end-to-end autonomous driving with unified reasoning and planning,”arXiv preprint arXiv:2505.05360, 2025

arXiv 2025
[34]

DiLu: A knowledge-driven approach to autonomous driving with large language models,

L. Wen, D. Fu, X. Li, X. Cai, M. Tao, P. Cai, M. Dou, B. Shi, L. He, and Y . Qiao, “DiLu: A knowledge-driven approach to autonomous driving with large language models,” inthe Proceedings of International Conference on Learning Representations, 2024

2024
[35]

LeapV AD: A leap in autonomous driving via cognitive perception and dual-process thinking,

Y . Ma, T. Wei, N. Zhong, J. Mei, T. Hu, L. Wen, X. Yang, B. Shi, and Y . Liu, “LeapV AD: A leap in autonomous driving via cognitive perception and dual-process thinking,”arXiv preprint arXiv:2501.08168, 2025

arXiv 2025
[36]

Decision- making with lightweight confidence-aware language model for au- tonomous driving,

R. Yao, R. Zhong, P. Liu, M. Peng, R. Yang, and J. Ma, “Decision- making with lightweight confidence-aware language model for au- tonomous driving,”arXiv preprint arXiv:2605.25393, 2026

Pith/arXiv arXiv 2026
[37]

nuPlan: A closed-loop ML- based planning benchmark for autonomous vehicles,

H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuPlan: A closed-loop ML- based planning benchmark for autonomous vehicles,”arXiv preprint arXiv:2106.11810, 2021

Pith/arXiv arXiv 2021
[38]

A behavioral planning framework for autonomous driving,

J. Wei, J. M. Snider, T. Gu, J. M. Dolan, and B. Litkouhi, “A behavioral planning framework for autonomous driving,” inthe Proceedings of IEEE Intelligent Vehicles Symposium, pp. 458–464, 2014

2014
[39]

A universal cooperative decision- making framework for connected autonomous vehicles with generic road topologies,

Z. Huang, W. Liu, S. Shen, and J. Ma, “A universal cooperative decision- making framework for connected autonomous vehicles with generic road topologies,”IEEE Transactions on Vehicular Technology, vol. 74, no. 4, pp. 5414–5429, 2024

2024
[40]

A safe hierarchical planning framework for complex driving scenarios based on reinforce- ment learning,

J. Li, L. Sun, J. Chen, M. Tomizuka, and W. Zhan, “A safe hierarchical planning framework for complex driving scenarios based on reinforce- ment learning,” inthe Proceedings of IEEE International Conference on Robotics and Automation, pp. 2660–2666, 2021

2021
[41]

Conditional predictive behavior planning with inverse reinforcement learning for human-like autonomous driving,

Z. Huang, H. Liu, J. Wu, and C. Lv, “Conditional predictive behavior planning with inverse reinforcement learning for human-like autonomous driving,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 7, pp. 7244–7258, 2023

2023
[42]

Is ego status all you need for open-loop end-to-end autonomous driving?,

Z. Li, Z. Yu, S. Lan, J. Li, J. Kautz, T. Lu, and J. M. Alvarez, “Is ego status all you need for open-loop end-to-end autonomous driving?,” in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14864–14873, 2024

2024
[43]

Open- DriveVLA: Towards end-to-end autonomous driving with large vision language action model,

X. Zhou, X. Han, F. Yang, Y . Ma, V . Tresp, and A. Knoll, “Open- DriveVLA: Towards end-to-end autonomous driving with large vision language action model,”arXiv preprint arXiv:2503.23463, 2025

arXiv 2025
[44]

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs,

M. Xiong, Z. Hu, X. Lu, Y . LI, J. Fu, J. He, and B. Hooi, “Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs,” inthe Proceedings of International Conference on Learning Representations, 2024

2024
[45]

Conformity in large language models,

X. Zhu, C. Zhang, T. Stafford, N. Collier, and A. Vlachos, “Conformity in large language models,” inProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3854–3872, 2025

2025
[46]

What language model architecture and pretraining objective works best for zero-shot generalization?,

T. Wang, A. Roberts, D. Hesslow, T. Le Scao, H. W. Chung, I. Belt- agy, J. Launay, and C. Raffel, “What language model architecture and pretraining objective works best for zero-shot generalization?,” in the Proceedings of International Conference on Machine Learning, pp. 22964–22984, 2022

2022
[47]

Parting with misconceptions about learning-based vehicle motion planning,

D. Dauner, M. Hallgarten, A. Geiger, and K. Chitta, “Parting with misconceptions about learning-based vehicle motion planning,” inthe Proceedings of Conference on Robot Learning, pp. 1268–1281, 2023

2023
[48]

Diffusion-ES: Gradient-free planning with diffusion for autonomous and instruction-guided driving,

B. Yang, H. Su, N. Gkanatsios, T.-W. Ke, A. Jain, J. Schneider, and K. Fragkiadaki, “Diffusion-ES: Gradient-free planning with diffusion for autonomous and instruction-guided driving,” inthe Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15342–15353, 2024

2024
[49]

Rethinking imitation-based planners for autonomous driving,

J. Cheng, Y . Chen, X. Mei, B. Yang, B. Li, and M. Liu, “Rethinking imitation-based planners for autonomous driving,” inthe Proceedings of IEEE International Conference on Robotics and Automation, pp. 14123– 14130, 2024

2024
[50]

Generalized force model of traffic dynamics,

D. Helbing and B. Tilch, “Generalized force model of traffic dynamics,” Physical review E, vol. 58, no. 1, p. 133, 1998

1998
[51]

Are vlms ready for autonomous driving? an empirical study from the reliability, data and metric perspectives,

S. Xie, L. Kong, Y . Dong, C. Sima, W. Zhang, Q. A. Chen, Z. Liu, and L. Pan, “Are vlms ready for autonomous driving? an empirical study from the reliability, data and metric perspectives,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6585– 6597, 2025

2025
[52]

From prediction to planning with goal conditioned lane graph traversals,

M. Hallgarten, M. Stoll, and A. Zell, “From prediction to planning with goal conditioned lane graph traversals,” inthe Proceedings of Interna- tional Conference on Intelligent Transportation Systems, pp. 951–958, 2023

2023
[53]

Urban Driver: Learning to drive from real-world demonstrations using policy gradients,

O. Scheel, L. Bergamini, M. Wolczyk, B. Osiński, and P. Ondruska, “Urban Driver: Learning to drive from real-world demonstrations using policy gradients,” in the Proceedings of Conference on Robot Learning, pp. 718–728, 2022

2022
[54]

PlanAgent: A multi-modal large language agent for closed-loop vehicle motion planning,

Y. Zheng, Z. Xing, Q. Zhang, B. Jin, P. Li, Y. Zheng, Z. Xia, K. Zhan, X. Lang, Y. Chen, et al., “PlanAgent: A multi-modal large language agent for closed-loop vehicle motion planning,” arXiv preprint arXiv:2406.01587, 2024

arXiv 2024
[55]

DeepSeek-V3 technical report,

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, et al., “DeepSeek-V3 technical report,” arXiv preprint arXiv:2412.19437, 2024

Pith/arXiv arXiv 2024
[56]

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,

D. Guo, D. Yang, H. Zhang, J. Song, P. Wang, Q. Zhu, R. Xu, R. Zhang, S. Ma, X. Bi, et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

Pith/arXiv arXiv 2025

[1]

Planning and decision-making for autonomous vehicles,

W. Schwarting, J. Alonso-Mora, and D. Rus, “Planning and decision-making for autonomous vehicles,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 187–210, 2018

2018

[2]

An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors,

P. Hang, C. Lv, C. Huang, J. Cai, Z. Hu, and Y. Xing, “An integrated framework of decision making and motion planning for autonomous vehicles considering social behaviors,” IEEE Transactions on Vehicular Technology, vol. 69, no. 12, pp. 14458–14469, 2020

2020

[3]

Senna: Bridging large vision-language models and end-to-end autonomous driving,

B. Jiang, S. Chen, B. Liao, X. Zhang, W. Yin, Q. Zhang, C. Huang, W. Liu, and X. Wang, “Senna: Bridging large vision-language models and end-to-end autonomous driving,” arXiv preprint arXiv:2410.22313, 2024

Pith/arXiv arXiv 2024

[4]

Bilevel multi-armed bandit-based hierarchical reinforcement learning for interaction-aware self-driving at unsignalized intersections,

Z. Peng, Y. Wang, L. Zheng, and J. Ma, “Bilevel multi-armed bandit-based hierarchical reinforcement learning for interaction-aware self-driving at unsignalized intersections,” IEEE Transactions on Vehicular Technology, vol. 74, no. 6, pp. 8824–8838, 2025

2025

[5]

End-to-end autonomous driving: Challenges and frontiers,

L. Chen, P. Wu, K. Chitta, B. Jaeger, A. Geiger, and H. Li, “End-to-end autonomous driving: Challenges and frontiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10164–10183, 2024

2024

[6]

Language models are few-shot learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” in the Proceedings of Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020

2020

[7]

Qwen3 technical report,

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al., “Qwen3 technical report,” arXiv preprint arXiv:2505.09388, 2025

Pith/arXiv arXiv 2025

[8]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou, et al., “Chain-of-thought prompting elicits reasoning in large language models,” the Proceedings of Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022

2022

[9]

GPT-Driver: Learning to drive with GPT,

J. Mao, Y. Qian, J. Ye, H. Zhao, and Y. Wang, “GPT-Driver: Learning to drive with GPT,” arXiv preprint arXiv:2310.01415, 2023

Pith/arXiv arXiv 2023

[10]

LMDrive: Closed-loop end-to-end driving with large language models,

H. Shao, Y. Hu, L. Wang, G. Song, S. L. Waslander, Y. Liu, and H. Li, “LMDrive: Closed-loop end-to-end driving with large language models,” in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15120–15130, 2024

2024

[11]

DriveGPT4: Interpretable end-to-end autonomous driving via large language model,

Z. Xu, Y. Zhang, E. Xie, Z. Zhao, Y. Guo, K.-Y. K. Wong, Z. Li, and H. Zhao, “DriveGPT4: Interpretable end-to-end autonomous driving via large language model,” IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8186–8193, 2024

2024

[12]

DriveLM: Driving with graph visual question answering,

C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, and H. Li, “DriveLM: Driving with graph visual question answering,” in the Proceedings of European Conference on Computer Vision, pp. 256–274, 2024

2024

[13]

Self-consistency improves chain of thought reasoning in language models,

X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in the Proceedings of International Conference on Learning Representations, 2023

2023

[14]

Tree of thoughts: Deliberate problem solving with large language models,

S. Yao, D. Yu, J. Zhao, I. Shafran, T. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” in the Proceedings of Advances in Neural Information Processing Systems, vol. 36, pp. 11809–11822, 2023

2023

[15]

Forking paths in neural text generation,

E. J. Bigelow, A. Holtzman, H. Tanaka, and T. Ullman, “Forking paths in neural text generation,” in the Proceedings of International Conference on Learning Representations, 2025

2025

[16]

A unified framework integrating decision making and trajectory planning based on spatio-temporal voxels for highway autonomous driving,

T. Zhang, W. Song, M. Fu, Y. Yang, X. Tian, and M. Wang, “A unified framework integrating decision making and trajectory planning based on spatio-temporal voxels for highway autonomous driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 10365–10379, 2021

2021

[17]

CALMM-Drive: Confidence-aware autonomous driving with large multimodal model,

R. Yao, Y. Wang, H. Liu, R. Yang, Z. Peng, L. Zhu, and J. Ma, “CALMM-Drive: Confidence-aware autonomous driving with large multimodal model,” arXiv preprint arXiv:2412.04209, 2024

arXiv 2024

[18]

GameFormer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,

Z. Huang, H. Liu, and C. Lv, “GameFormer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving,” in the Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3903–3913, 2023

2023

[19]

PLUTO: Pushing the limit of imitation learning-based planning for autonomous driving,

J. Cheng, Y. Chen, and Q. Chen, “PLUTO: Pushing the limit of imitation learning-based planning for autonomous driving,” arXiv preprint arXiv:2404.14327, 2024

arXiv 2024

[20]

Hierarchical prediction uncertainty-aware motion planning for autonomous driving in lane-changing scenarios,

R. Yao and X. Sun, “Hierarchical prediction uncertainty-aware motion planning for autonomous driving in lane-changing scenarios,” Transportation Research Part C: Emerging Technologies, vol. 171, p. 104962, 2025

2025

[21]

GenAD: Generative end-to-end autonomous driving,

W. Zheng, R. Song, X. Guo, C. Zhang, and L. Chen, “GenAD: Generative end-to-end autonomous driving,” in the Proceedings of European Conference on Computer Vision, pp. 87–104, 2024

2024

[22]

Diffusion-based planning for autonomous driving with flexible guidance,

Y. Zheng, R. Liang, K. Zheng, J. Zheng, L. Mao, J. Li, W. Gu, R. Ai, S. E. Li, X. Zhan, et al., “Diffusion-based planning for autonomous driving with flexible guidance,” in the Proceedings of International Conference on Learning Representations, 2025

2025

[23]

HE-Drive: Human-like end-to-end driving with vision language models,

J. Wang, X. Zhang, Z. Xing, S. Gu, X. Guo, Y. Hu, Z. Song, Q. Zhang, X. Long, and W. Yin, “HE-Drive: Human-like end-to-end driving with vision language models,” arXiv preprint arXiv:2410.05051, 2024

arXiv 2024

[24]

Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,

H. Fu, D. Zhang, Z. Zhao, J. Cui, D. Liang, C. Zhang, D. Zhang, H. Xie, B. Wang, and X. Bai, “Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation,” arXiv preprint arXiv:2503.19755, 2025

Pith/arXiv arXiv 2025

[25]

Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving,

Y. Li, K. Xiong, X. Guo, F. Li, S. Yan, G. Xu, L. Zhou, L. Chen, H. Sun, B. Wang, et al., “Recogdrive: A reinforced cognitive framework for end-to-end autonomous driving,” arXiv preprint arXiv:2506.08052, 2025

Pith/arXiv arXiv 2025

[26]

DriveVLA-W0: World models amplify data scaling law in autonomous driving,

Y. Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y. Wang, Y. Chen, X. Wang, Y. An, C. Tang, et al., “DriveVLA-W0: World models amplify data scaling law in autonomous driving,” arXiv preprint arXiv:2510.12796, 2025

Pith/arXiv arXiv 2025

[27]

Scaling laws for neural language models,

J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020

Pith/arXiv arXiv 2020

[28]

Distilling the knowledge in a neural network,

G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015

Pith/arXiv arXiv 2015

[29]

A survey on knowledge distillation of large language models,

X. Xu, M. Li, C. Tao, T. Shen, R. Cheng, J. Li, C. Xu, D. Tao, and T. Zhou, “A survey on knowledge distillation of large language models,” arXiv preprint arXiv:2402.13116, 2024

Pith/arXiv arXiv 2024

[30]

Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation,

Z. Li, K. Li, S. Wang, S. Lan, Z. Yu, Y. Ji, Z. Li, Z. Zhu, J. Kautz, Z. Wu, et al., “Hydra-MDP: End-to-end multimodal planning with multi-target hydra-distillation,” arXiv preprint arXiv:2406.06978, 2024

Pith/arXiv arXiv 2024

[31]

DistillDrive: End-to-end multi-mode autonomous driving distillation by isomorphic hetero-source planning model,

R. Yu, X. Zhang, R. Zhao, H. Yan, and M. Wang, “DistillDrive: End-to-end multi-mode autonomous driving distillation by isomorphic hetero-source planning model,” in the Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 26188–26197, 2025

2025

[32]

Continuously learning, adapting, and improving: A dual-process approach to autonomous driving,

J. Mei, Y. Ma, X. Yang, L. Wen, X. Cai, X. Li, D. Fu, B. Zhang, P. Cai, M. Dou, et al., “Continuously learning, adapting, and improving: A dual-process approach to autonomous driving,” Advances in Neural Information Processing Systems, vol. 37, pp. 123261–123290, 2024

2024

[33]

DSDrive: Distilling large language model for lightweight end-to-end autonomous driving with unified reasoning and planning,

W. Liu, P. Liu, and J. Ma, “DSDrive: Distilling large language model for lightweight end-to-end autonomous driving with unified reasoning and planning,” arXiv preprint arXiv:2505.05360, 2025

arXiv 2025

[34]

DiLu: A knowledge-driven approach to autonomous driving with large language models,

L. Wen, D. Fu, X. Li, X. Cai, M. Tao, P. Cai, M. Dou, B. Shi, L. He, and Y. Qiao, “DiLu: A knowledge-driven approach to autonomous driving with large language models,” in the Proceedings of International Conference on Learning Representations, 2024

2024

[35]

LeapVAD: A leap in autonomous driving via cognitive perception and dual-process thinking,

Y. Ma, T. Wei, N. Zhong, J. Mei, T. Hu, L. Wen, X. Yang, B. Shi, and Y. Liu, “LeapVAD: A leap in autonomous driving via cognitive perception and dual-process thinking,” arXiv preprint arXiv:2501.08168, 2025

arXiv 2025

[36]

Decision-making with lightweight confidence-aware language model for autonomous driving,

R. Yao, R. Zhong, P. Liu, M. Peng, R. Yang, and J. Ma, “Decision-making with lightweight confidence-aware language model for autonomous driving,” arXiv preprint arXiv:2605.25393, 2026

Pith/arXiv arXiv 2026

[37]

nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles,

H. Caesar, J. Kabzan, K. S. Tan, W. K. Fong, E. Wolff, A. Lang, L. Fletcher, O. Beijbom, and S. Omari, “nuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles,” arXiv preprint arXiv:2106.11810, 2021

Pith/arXiv arXiv 2021

[38]

A behavioral planning framework for autonomous driving,

J. Wei, J. M. Snider, T. Gu, J. M. Dolan, and B. Litkouhi, “A behavioral planning framework for autonomous driving,” in the Proceedings of IEEE Intelligent Vehicles Symposium, pp. 458–464, 2014

2014

[39]

A universal cooperative decision-making framework for connected autonomous vehicles with generic road topologies,

Z. Huang, W. Liu, S. Shen, and J. Ma, “A universal cooperative decision-making framework for connected autonomous vehicles with generic road topologies,” IEEE Transactions on Vehicular Technology, vol. 74, no. 4, pp. 5414–5429, 2024

2024

[40]

A safe hierarchical planning framework for complex driving scenarios based on reinforcement learning,

J. Li, L. Sun, J. Chen, M. Tomizuka, and W. Zhan, “A safe hierarchical planning framework for complex driving scenarios based on reinforcement learning,” in the Proceedings of IEEE International Conference on Robotics and Automation, pp. 2660–2666, 2021

2021

[41]

Conditional predictive behavior planning with inverse reinforcement learning for human-like autonomous driving,

Z. Huang, H. Liu, J. Wu, and C. Lv, “Conditional predictive behavior planning with inverse reinforcement learning for human-like autonomous driving,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 7, pp. 7244–7258, 2023

2023

[42]

Is ego status all you need for open-loop end-to-end autonomous driving?,

Z. Li, Z. Yu, S. Lan, J. Li, J. Kautz, T. Lu, and J. M. Alvarez, “Is ego status all you need for open-loop end-to-end autonomous driving?,” in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14864–14873, 2024

2024

[43]

OpenDriveVLA: Towards end-to-end autonomous driving with large vision language action model,

X. Zhou, X. Han, F. Yang, Y. Ma, V. Tresp, and A. Knoll, “OpenDriveVLA: Towards end-to-end autonomous driving with large vision language action model,” arXiv preprint arXiv:2503.23463, 2025

arXiv 2025

[44]

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs,

M. Xiong, Z. Hu, X. Lu, Y. Li, J. Fu, J. He, and B. Hooi, “Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs,” in the Proceedings of International Conference on Learning Representations, 2024

2024

[45]

Conformity in large language models,

X. Zhu, C. Zhang, T. Stafford, N. Collier, and A. Vlachos, “Conformity in large language models,” in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3854–3872, 2025

2025
