Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations
Pith reviewed 2026-05-20 10:08 UTC · model grok-4.3
The pith
Deployment perturbations such as frame drops and GPS noise substantially degrade closed-loop autonomous driving performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bench2Drive-Robust evaluates representative end-to-end driving methods under deployment-oriented perturbations from camera-stream failures, ego-state estimation errors, and compute-induced control delays. The results demonstrate that these perturbations substantially degrade closed-loop driving performance in ways not captured by conventional image-level corruption evaluations.
What carries the argument
Bench2Drive-Robust benchmark applying systematic deployment perturbations to closed-loop end-to-end autonomous driving evaluation.
Load-bearing premise
The primary deployment imperfections for closed-loop autonomous driving are camera-stream failures, ego-state estimation errors, and compute-induced control delays.
What would settle it
Demonstrating that closed-loop performance does not degrade significantly under high-severity versions of these three perturbations, or that image corruptions account for similar levels of failure.
Original abstract
Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop E2E-AD evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and demonstrating the substantial impact of these deployment-oriented perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.
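The three perturbation sources named in the abstract can be pictured as thin wrappers around an agent's inputs and outputs. The following is a minimal sketch under stated assumptions, not the benchmark's implementation: every function name, parameter, and severity knob here (`drop_prob`, `fraction`, `sigma_m`, `scale`, `delay_steps`) is hypothetical, and frames are modeled as plain nested lists rather than simulator images.

```python
import random
from collections import deque


def drop_frame(frame, last_frame, drop_prob, rng):
    """Camera-stream failure: with probability drop_prob, repeat the stale frame."""
    if last_frame is not None and rng.random() < drop_prob:
        return last_frame
    return frame


def mask_partial(frame, fraction):
    """Partial observation: zero out the leftmost `fraction` of each row."""
    width = len(frame[0])
    cut = int(width * fraction)
    return [[0] * cut + row[cut:] for row in frame]


def noisy_gps(xy, sigma_m, rng):
    """Ego-state error: add zero-mean Gaussian noise (in metres) to a position."""
    x, y = xy
    return (x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))


def biased_speed(speed, scale):
    """Ego-state error: multiplicative speed or odometry miscalibration."""
    return speed * scale


class DelayedControl:
    """Compute-induced delay: apply the command issued `delay_steps` ticks ago."""

    def __init__(self, delay_steps, idle_command=0.0):
        # Pre-fill with an idle command so the first ticks have something to apply.
        self.queue = deque([idle_command] * delay_steps)

    def step(self, command):
        self.queue.append(command)
        return self.queue.popleft()
```

In a closed-loop run, wrappers like these would sit between the simulator and the driving model at every tick, so their effects feed back into the next observation rather than being scored once per frame.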
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Bench2Drive-Robust, which the authors describe as, to their knowledge, the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving (E2E-AD). It evaluates representative E2E methods under perturbations from three sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, speed or odometry errors), and compute-induced control delay (inference latency), each across varying severities. The results indicate substantial closed-loop performance degradation, and the central claim is that these deployment issues expose robustness challenges not fully captured by conventional image-level corruption evaluations.
Significance. If the empirical results hold under the stated protocol, the benchmark could usefully shift focus from perception-only corruptions to system-level deployment imperfections that accumulate in closed-loop feedback. This would define concrete, practical robustness problems for E2E-AD and support development of deployment-aware methods.
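To see why a delay can matter more inside a feedback loop than in an open-loop score, a toy scalar model is enough. The sketch below is an illustration only and does not come from the paper: it assumes a lateral error corrected by a proportional controller that acts on a measurement `delay_steps` ticks old, with a hypothetical `gain` of 0.6.

```python
def simulate_lateral_error(gain, delay_steps, steps=100, initial_error=1.0):
    """Iterate e[t+1] = e[t] - gain * e[t - delay_steps] and return e[0..steps].

    The controller corrects the current error using a stale measurement,
    which is the simplest stand-in for compute-induced control delay.
    """
    # Seed enough past values so the delayed lookup is defined at t = 0.
    history = [initial_error] * (delay_steps + 1)
    for _ in range(steps):
        delayed_measurement = history[-1 - delay_steps]
        history.append(history[-1] - gain * delayed_measurement)
    return history[delay_steps:]


no_delay = simulate_lateral_error(gain=0.6, delay_steps=0)
three_tick_delay = simulate_lateral_error(gain=0.6, delay_steps=3)

# Compare the final error magnitude with the largest swing near the end of the run.
print(abs(no_delay[-1]), max(abs(e) for e in three_tick_delay[-20:]))
```

With this gain the undelayed loop contracts the error by a constant factor each tick, while the three-tick delay produces oscillations of growing amplitude. The same controller is stable or unstable depending only on the delay, which is the kind of effect a per-frame open-loop metric cannot show.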
major comments (1)
- [Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.
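The comparison the referee asks for amounts to scoring both perturbation families against one clean baseline under one protocol. A minimal sketch of that bookkeeping follows, assuming a single scalar closed-loop score where higher is better; the function names and the family labels are hypothetical, and no numbers from the paper are used.

```python
def relative_drop(clean_score, perturbed_score):
    """Fraction of the clean closed-loop score lost under a perturbation."""
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - perturbed_score) / clean_score


def compare_families(clean_score, scores_by_family):
    """Mean relative drop per perturbation family.

    scores_by_family maps a family label (for example "deployment" or
    "image_level") to the scores obtained with the same models, routes,
    and metric as clean_score.
    """
    return {
        family: sum(relative_drop(clean_score, s) for s in scores) / len(scores)
        for family, scores in scores_by_family.items()
    }
```

Only if both families are run through the same simulator, models, and metric does the gap between the two means say anything about whether deployment perturbations are distinctively harmful or whether closed-loop feedback amplifies any corruption.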
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment below.
-
Referee: [Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.
Authors: We appreciate the referee's observation that a direct side-by-side comparison would provide stronger evidence for the distinction. Our perturbations arise from camera-stream failures (frame drops and partial observations), ego-state estimation errors (GPS noise, speed or odometry inaccuracies), and compute-induced control delays. These are system-level deployment imperfections that introduce temporal inconsistencies and feedback instabilities in the closed loop. In contrast, conventional image-level corruptions (e.g., weather effects or pixel noise) primarily degrade input perception in open-loop or perception-focused settings. Because the perturbation sources and evaluation protocol differ categorically, the robustness challenges we identify are not equivalent to those tested by image corruption benchmarks. Nevertheless, to address the concern, we will revise the abstract to qualify the claim as exposing 'distinct robustness challenges arising from deployment imperfections' and expand the introduction and related work to explicitly contrast our device-centric, closed-loop protocol with prior image-level studies. This textual clarification will be incorporated in the revision.
Revision: partial
Circularity Check
Empirical benchmark with direct simulation results; no derivation or self-referential reduction
The paper presents Bench2Drive-Robust as a new closed-loop evaluation protocol and reports performance degradation from direct application of three classes of deployment perturbations (camera-stream failures, ego-state errors, and inference delays) inside a simulator. No equations, fitted parameters, or mathematical derivations appear in the provided text; results are obtained from independent simulation runs rather than any reduction to prior fitted quantities or self-cited uniqueness theorems. The central claim that these perturbations reveal challenges 'not fully captured by conventional image-level corruption evaluations' is an empirical observation from the new benchmark, not a quantity derived by construction from the inputs or from load-bearing self-citations. The work is therefore self-contained against external benchmarks and receives a score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Perturbations arising from camera-stream failures, ego-state estimation errors, and compute-induced control delay accumulate through the closed-loop feedback and destabilize control in ways not captured by image-level tests.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear. Paper passage:
Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.