
arxiv: 2606.08470 · v1 · pith:G3SKBPT6new · submitted 2026-06-07 · 💻 cs.RO

LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

Pith reviewed 2026-06-27 18:38 UTC · model grok-4.3

classification 💻 cs.RO
keywords autonomous driving · language models · lifelong learning · uncertainty awareness · lightweight models · decision making · nuPlan benchmark · multi-agent system

The pith

LUNA-AD distills multi-agent reasoning into a lightweight dual-head model that learns continuously for better autonomous driving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models can overcome their computational cost and static learning in safety-critical driving through a tri-system design. A multi-agent setup creates diverse, uncertainty-aware decision demonstrations; these are distilled into an efficient dual-head model that quickly infers both actions and explanations; and a reflection mechanism enables ongoing refinement through closed-loop feedback. A sympathetic reader would care because this could make reasoned, adaptable decision-making feasible in real-time vehicle control without the delays of full-scale models. If correct, knowledge-driven autonomous driving systems could achieve higher benchmark success while running much faster than prior approaches.

Core claim

LUNA-AD features a tri-system architecture that reconciles complex multimodal behavioral reasoning, efficient deployment, and continual refinement: a multi-agent analytical system generates uncertainty-aware decision-making demonstrations through diverse hypothesis exploration, a dual-head lightweight heuristic model is distilled to unify the inference of decision distributions and textual explanations, and a reflection-driven lifelong learning mechanism operates on multimodal decision outputs to preserve strategic diversity and refine candidate decisions and rationales via closed-loop feedback.

What carries the argument

The tri-system architecture of multi-agent demonstration generation, dual-head model distillation, and reflection-driven lifelong learning.
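
As a rough illustration of the distillation step for the decision head, the sketch below uses the soft-target objective of Hinton et al. (KL divergence between teacher and temperature-scaled student distributions). This is an assumption: the page does not state which loss LUNA-AD uses, and the function names, the three-way decision set, and the temperature are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same decisions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def decision_distillation_loss(teacher_probs, student_logits, temperature=2.0):
    """Soft-target loss: how far the student's decision distribution is
    from the teacher's. The temperature value is an illustrative choice."""
    student_probs = softmax(student_logits, temperature)
    return kl_divergence(teacher_probs, student_probs)

# Hypothetical teacher distribution over three candidate decisions.
teacher = [0.6, 0.3, 0.1]
close = decision_distillation_loss(teacher, [1.8, 1.1, 0.0])
far = decision_distillation_loss(teacher, [0.0, 0.0, 3.0])
print(f"loss (similar student): {close:.4f}")
print(f"loss (dissimilar student): {far:.4f}")
```

A student whose distribution tracks the teacher's yields a smaller loss than one that concentrates on a different decision. The explanation head would need its own text-generation loss, which is not sketched here.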

If this is right

  • LUNA-AD delivers state-of-the-art success rates on nuPlan in both non-reactive and reactive modes.
  • Inference latency drops substantially compared to other knowledge-driven autonomous driving frameworks.
  • The lifelong learning component supports ongoing refinement of decisions and rationales, so the model does not stay fixed after training.
  • Uncertainty awareness in the distilled model helps maintain strategic diversity in driving choices.
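
One common way to quantify how much uncertainty a decision distribution carries is its Shannon entropy. The sketch below is a generic illustration, not the paper's metric: the decision labels and probabilities are invented, and the page does not say how LUNA-AD measures uncertainty or strategic diversity.

```python
import math

def decision_entropy(probs):
    """Shannon entropy (in nats) of a discrete decision distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalized_entropy(probs):
    """Entropy scaled to [0, 1] by the maximum possible value log(K)."""
    k = len(probs)
    if k < 2:
        return 0.0
    return decision_entropy(probs) / math.log(k)

# Hypothetical decision distributions; labels are illustrative only.
confident = {"keep_lane": 0.97, "change_left": 0.02, "change_right": 0.01}
ambiguous = {"keep_lane": 0.40, "change_left": 0.35, "change_right": 0.25}

for name, dist in (("confident", confident), ("ambiguous", ambiguous)):
    score = normalized_entropy(list(dist.values()))
    print(f"{name}: normalized entropy = {score:.3f}")
```

A value near 0 means the model has committed to one decision; a value near 1 means several candidates remain plausible, which is the situation in which preserving multiple strategies matters.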

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The distillation step could extend the same lightweight reasoning pattern to other real-time control tasks that need both speed and explanatory output.
  • Closed-loop reflection might allow faster adaptation to novel environments than retraining on fixed datasets alone.
  • If the uncertainty modeling transfers well, similar pipelines could address overconfidence problems in other deployed decision systems.

Load-bearing premise

The multi-agent analytical system generates high-quality uncertainty-aware decision-making demonstrations that can be distilled into a lightweight dual-head model while preserving performance and enabling effective lifelong refinement via closed-loop feedback.

What would settle it

Running the model on the nuPlan benchmarks would settle it: success rates below those of existing knowledge-driven frameworks, or inference latency that is not substantially lower, would show that the architecture does not deliver the claimed benefits.
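
The check described above amounts to two comparisons, sketched here under stated assumptions. The data layout, the function names, and the `min_speedup` threshold are hypothetical (the page gives no number for "drastically reduced"), and the values are made up for illustration, not taken from the paper.

```python
from statistics import mean

def success_rate(outcomes):
    """Fraction of scenarios completed successfully (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)

def evaluate_claim(candidate, baseline, min_speedup=2.0):
    """Check both falsifiable claims: higher success rate and lower latency.

    `min_speedup` is an assumed threshold, not a figure from the source.
    """
    sr_delta = success_rate(candidate["outcomes"]) - success_rate(baseline["outcomes"])
    speedup = mean(baseline["latencies_s"]) / mean(candidate["latencies_s"])
    return {
        "success_rate_delta": sr_delta,
        "speedup": speedup,
        "claim_supported": sr_delta > 0 and speedup >= min_speedup,
    }

# Made-up numbers for illustration only; these are not results from the paper.
candidate = {"outcomes": [True, True, True, False],
             "latencies_s": [0.2, 0.3, 0.25, 0.25]}
baseline = {"outcomes": [True, False, True, False],
            "latencies_s": [2.0, 2.5, 1.5, 2.0]}

report = evaluate_claim(candidate, baseline)
print(report)
```

In practice the comparison would be run separately for the non-reactive and reactive modes, since the claim covers both.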

Figures

Figures reproduced from arXiv: 2606.08470 by Jun Ma, Mingxing Peng, Pei Liu, Ruiguo Zhong, Rui Yang, Ruoyu Yao.

Figure 1. Our methodological architecture: a knowledge-driven tri-system decision-making framework, combined with a prompt manager and a decision-guided …
Figure 2. Generation of uncertainty-aware decision-making demonstrations by …
Figure 3. An illustration of the dual-head lightweight architecture for decision …
Figure 4. An illustration of the reflection mechanism. Upon the execution of a high-level decision in the closed-loop environment, the progress monitor performs …
Figure 5. The camera system of the data collection vehicle. Six RGB cameras …
Figure 6. Qualitative demonstrations of our approach in four representative scenarios. For better visualization, the scores of different planning trajectories are …
Figure 7. Effects of test-time RAG few-shot count on the open-loop performance of the heuristic system, which includes the fitness of decision probability …
Figure 8. Representative cases from DriveLM-NuScenes for demonstrating the analytical system’s ability to perform multimodal decision reasoning with multi …
Figure 9. Representative cases from our in-house campus dataset for demonstrating the analytical system’s ability to perform multimodal decision reasoning …

Original abstract

While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving (AD). LUNA-AD features a tri-system architecture that reconciles complex multimodal behavioral reasoning, efficient deployment, and continual refinement. We design a multi-agent analytical system to generate uncertainty-aware decision-making demonstrations through diverse hypothesis exploration. A dual-head lightweight heuristic model is distilled to unify the inference of decision distributions and textual explanations while enabling efficient deployment. Furthermore, a reflection-driven lifelong learning mechanism operates on multimodal decision outputs and preserves strategic diversity, allowing for the refinement of candidate decisions and rationales via closed-loop feedback to enhance driving robustness. Extensive experiments on nuPlan benchmarks demonstrate that LUNA-AD achieves state-of-the-art success rates under both non-reactive and reactive modes, with drastically reduced inference latency compared to existing knowledge-driven AD frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving. It features a tri-system architecture: a multi-agent analytical system to generate uncertainty-aware decision-making demonstrations through diverse hypothesis exploration, a dual-head lightweight heuristic model distilled from it to unify inference of decision distributions and textual explanations, and a reflection-driven lifelong learning mechanism that refines decisions and rationales via closed-loop feedback on multimodal outputs. The paper claims that extensive experiments on nuPlan benchmarks demonstrate state-of-the-art success rates under both non-reactive and reactive modes, along with drastically reduced inference latency compared to existing knowledge-driven AD frameworks.

Significance. If the performance claims hold with proper substantiation, the work could meaningfully advance LLM integration into safety-critical autonomous driving by addressing computational overhead, limited reasoning diversity, and static learning through efficient distillation and continual refinement. The tri-system design offers a practical path toward deployable, uncertainty-aware systems that maintain strategic diversity.

major comments (2)
  1. [Abstract] The central claim of state-of-the-art success rates on nuPlan (non-reactive and reactive modes) with reduced latency is presented without any quantitative metrics, baseline comparisons, error bars, dataset details, or ablation studies, preventing verification of the performance improvements against the paper's own evidence.
  2. [Distillation process (dual-head heuristic model)] The load-bearing assumption that high-quality uncertainty-aware demonstrations from the multi-agent analytical system survive distillation into the lightweight dual-head model (preserving both performance and uncertainty awareness) lacks any quantitative verification, such as success-rate deltas, calibration error, or explanation fidelity comparisons between teacher outputs and student model on identical scenarios.
minor comments (1)
  1. [Abstract] The phrase 'extensive experiments' is not backed by any numerical results or key statistics; including them would allow immediate assessment of the claims.
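
The calibration error requested in major comment 2 is often reported as expected calibration error (ECE). The following is a minimal sketch of the standard equal-width binned estimator; what counts as a "correct" decision (for example, agreement with a reference decision) is an assumption here, and the inputs are invented.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between confidence and accuracy.

    confidences: predicted probability of the chosen decision, in [0, 1]
    correct:     booleans, whether that decision matched the reference
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece

# Invented example: a model that reports 0.9 confidence but is right half the time.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(f"ECE = {overconfident:.2f}")
```

Computing this for both the multi-agent teacher and the distilled student on the same scenarios would show whether uncertainty awareness survives distillation.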

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the distillation process. We address each major comment below and indicate the planned revisions.

  1. Referee: [Abstract] The central claim of state-of-the-art success rates on nuPlan (non-reactive and reactive modes) with reduced latency is presented without any quantitative metrics, baseline comparisons, error bars, dataset details, or ablation studies, preventing verification of the performance improvements against the paper's own evidence.

    Authors: We agree that the abstract would benefit from including key quantitative results to make the performance claims more immediately verifiable. The experimental section of the manuscript reports the specific success rates, baseline comparisons, and latency figures on nuPlan under both modes. We will revise the abstract to incorporate representative metrics (e.g., success-rate improvements and latency reductions) along with a brief reference to the evaluation protocol.

    revision: yes

  2. Referee: [Distillation process (dual-head heuristic model)] The load-bearing assumption that high-quality uncertainty-aware demonstrations from the multi-agent analytical system survive distillation into the lightweight dual-head model (preserving both performance and uncertainty awareness) lacks any quantitative verification, such as success-rate deltas, calibration error, or explanation fidelity comparisons between teacher outputs and student model on identical scenarios.

    Authors: This observation is correct. The manuscript describes the distillation procedure and its design goals but does not provide direct quantitative comparisons (such as success-rate deltas, calibration metrics, or explanation fidelity) between the teacher multi-agent outputs and the distilled dual-head student model on matched scenarios. We will add an ablation study with these comparisons in the revised manuscript to substantiate the preservation of performance and uncertainty awareness.

    revision: yes
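
A matched-scenario comparison of the kind promised in the second response could be organized as in the sketch below. The record layout, scenario IDs, decision labels, and values are hypothetical; only scenarios evaluated by both teacher and student are compared, and explanation fidelity is not covered.

```python
def top1(dist):
    """Return the decision label with the highest probability."""
    return max(dist, key=dist.get)

def compare_matched(teacher, student):
    """Compare teacher and student only on scenarios both were run on."""
    shared = sorted(set(teacher) & set(student))
    if not shared:
        raise ValueError("no scenarios in common")
    n = len(shared)
    agreement = sum(
        top1(teacher[s]["dist"]) == top1(student[s]["dist"]) for s in shared
    ) / n
    teacher_sr = sum(teacher[s]["success"] for s in shared) / n
    student_sr = sum(student[s]["success"] for s in shared) / n
    return {
        "n": n,
        "top1_agreement": agreement,
        "success_rate_delta": student_sr - teacher_sr,
    }

# Invented records keyed by a hypothetical scenario ID.
teacher = {
    "s1": {"dist": {"keep_lane": 0.7, "change_left": 0.3}, "success": True},
    "s2": {"dist": {"keep_lane": 0.4, "change_left": 0.6}, "success": True},
    "s3": {"dist": {"keep_lane": 0.5, "change_left": 0.5}, "success": False},
}
student = {
    "s1": {"dist": {"keep_lane": 0.8, "change_left": 0.2}, "success": True},
    "s2": {"dist": {"keep_lane": 0.55, "change_left": 0.45}, "success": False},
}

result = compare_matched(teacher, student)
print(result)
```

Restricting the comparison to shared scenario IDs keeps the success-rate delta from being skewed by scenarios that only one of the two systems attempted.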

Circularity Check

0 steps flagged

No circularity: claims rest on external benchmark experiments


The paper presents an architecture (multi-agent demonstration generator, distilled dual-head model, reflection-driven lifelong learning) whose performance is asserted via nuPlan success-rate and latency measurements. No equations, fitted parameters, or self-citations are shown to reduce any claimed result to its own inputs by construction; the distillation step is described as a design choice whose fidelity is left to empirical verification rather than enforced by definition. The derivation chain therefore remains self-contained against the stated external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no information on free parameters, axioms, or invented entities is provided.

pith-pipeline@v0.9.1-grok · 5720 in / 1107 out tokens · 26471 ms · 2026-06-27T18:38:38.350345+00:00 · methodology

