arxiv: 2605.18059 · v1 · pith:S5EPFUJ2new · submitted 2026-05-18 · 💻 cs.RO

Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

Pith reviewed 2026-05-20 10:08 UTC · model grok-4.3

classification 💻 cs.RO
keywords autonomous driving · robustness · closed-loop evaluation · deployment perturbations · end-to-end driving · benchmark

The pith

Deployment perturbations such as frame drops and GPS noise substantially degrade closed-loop autonomous driving performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Bench2Drive-Robust, a benchmark designed to test closed-loop end-to-end autonomous driving under realistic deployment issues. It focuses on three types of perturbations: camera stream problems like dropped frames, errors in estimating the vehicle's state such as noisy GPS, and delays in control caused by model computation time. These issues can build up in the driving loop and cause instability, unlike the image corruptions studied before. Showing that current methods suffer under these conditions highlights the need for more practical robustness testing in autonomous driving research.

Core claim

Bench2Drive-Robust evaluates representative end-to-end driving methods under deployment-oriented perturbations from camera-stream failures, ego-state estimation errors, and compute-induced control delays. The results demonstrate that these perturbations substantially degrade closed-loop driving performance in ways not captured by conventional image-level corruption evaluations.

What carries the argument

Bench2Drive-Robust benchmark applying systematic deployment perturbations to closed-loop end-to-end autonomous driving evaluation.

Load-bearing premise

The primary deployment imperfections for closed-loop autonomous driving are camera-stream failures, ego-state estimation errors, and compute-induced control delays.

What would settle it

Demonstrating that closed-loop performance does not degrade significantly under high-severity versions of these three perturbations, or that image corruptions account for similar levels of failure.

Figures

Figures reproduced from arXiv: 2605.18059 by Haoran Liu, Junchi Yan, Shaofeng Zhang, Xianda Guo, Xiaosong Jia, Xingjun Ma, Yanlun Peng, Yu-Gang Jiang, Zhenghao Jin, Zhiyuan Zhang, Zuxuan Wu.

Figure 1
Overview of Bench2Drive-Robust. We evaluate three categories of deployment-side failures for E2E-AD (camera-stream failures, ego-state estimation errors, and compute-induced control delay) under closed-loop driving. This differs from existing image- or perception-centric robustness benchmarks that mainly target external appearance changes.
However, existing closed-loop evaluations typically assess driving capab…
Figure 2
Overview of Bench2Drive-Robust. We evaluate end-to-end autonomous driving models in closed-loop simulation by injecting deployment-relevant perturbations into the sensing, ego-state, and action pipelines while keeping the evaluated policy unchanged. The benchmark covers temporal delays, observation integrity perturbations, and ego-state estimation errors, enabling controlled robustness evaluation under con…
Figure 3
Latency modes supported in our framework: immediate execution, dynamic real-time scheduling, and fixed-delay FIFO buffering.
3.3 Robustness Taxonomy
We organize deployment-oriented perturbations by where they enter the closed-loop driving stack: camera-stream perturbations, ego-state perturbations, and compute-control perturbations. This taxonomy separates failures in visual data delivery, vehicle-state se…
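The fixed-delay FIFO buffering mode named in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: the class name `FixedDelayActionBuffer`, the tick-based delay, the assumed 20 Hz control rate, and the neutral fallback action are all assumptions made for the example.

```python
from collections import deque


class FixedDelayActionBuffer:
    """Hypothetical FIFO buffer that releases each action `delay_steps` ticks late."""

    def __init__(self, delay_steps, neutral_action):
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        self.delay_steps = delay_steps
        # Pre-fill with neutral actions so the first `delay_steps` ticks
        # have something to execute (an assumption of this sketch).
        self.queue = deque([neutral_action] * delay_steps)

    def step(self, new_action):
        """Enqueue the freshly computed action and return the one to execute now."""
        self.queue.append(new_action)
        return self.queue.popleft()


# Assumed setup: at a 20 Hz control rate, 100 ms of latency is 2 ticks.
buffer = FixedDelayActionBuffer(delay_steps=2, neutral_action="brake")
executed = [buffer.step(a) for a in ["a0", "a1", "a2", "a3"]]
print(executed)
```

With a delay of zero ticks the buffer degenerates to immediate execution, so the same wrapper can cover the clean baseline.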
Figure 4
Illustration of ego-state input perturbations. (a) The model receives the clean GPS reading $g_t$. (b) GPS input noise adds Gaussian perturbations to GPS readings. (c) Speed noise independently samples a multiplicative factor $\eta_t$ at each timestep and feeds $\tilde{v}_t = \eta_t v_t$ to the policy.
… be affected by sensor and ego-motion reliability issues [81]. Motivated by these observations, we evaluate whether E2E-AD policies…
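The two ego-state perturbations described in the Figure 4 caption can be sketched in a few lines. This is a hedged illustration only: the function names, the per-axis independent GPS noise, and the numeric values are assumptions, and the benchmark's actual coordinate frame and parameters may differ.

```python
import random


def perturb_gps(gps_xy, sigma_m, rng):
    """Add independent zero-mean Gaussian noise (std `sigma_m`) to each coordinate."""
    x, y = gps_xy
    return (x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))


def perturb_speed(speed, eta_mean, eta_std, rng):
    """Scale the speed by a factor eta that is resampled on every call."""
    eta = rng.gauss(eta_mean, eta_std)
    return eta * speed


# Illustrative values only (assumed, not taken from the benchmark configuration).
rng = random.Random(0)
noisy_gps = perturb_gps((10.0, -3.0), sigma_m=15.0, rng=rng)
noisy_speed = perturb_speed(8.0, eta_mean=1.0, eta_std=0.1, rng=rng)
print(noisy_gps, noisy_speed)
```

Because the policy itself is untouched, both functions act only on the values handed to it at each timestep.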
Figure 5
Illustration of our burst frame drop implementation. (a) Camera crash or frame-lost perturbations typically replace failed views with empty or invalid images. (b) In our implementation, a failed camera stream returns the most recent valid cached frame, so the simulator timestamp advances while the visual content is temporally frozen. (c) The timeline shows an illustrative example of independently sampled b…
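The frozen-frame behaviour in panel (b) of the Figure 5 caption can be sketched as a small camera wrapper. The class name `FrozenFrameCamera`, the per-tick burst start probability, and the fixed burst length are assumptions for illustration; the caption does not specify how bursts are sampled.

```python
import random


class FrozenFrameCamera:
    """Hypothetical wrapper: during a drop burst, return the last valid frame."""

    def __init__(self, burst_start_prob, burst_length, rng):
        self.burst_start_prob = burst_start_prob
        self.burst_length = burst_length
        self.rng = rng
        self.remaining_drop = 0
        self.cached_frame = None

    def read(self, fresh_frame):
        # Start a new burst only when none is active and a frame is cached.
        if (self.remaining_drop == 0 and self.cached_frame is not None
                and self.rng.random() < self.burst_start_prob):
            self.remaining_drop = self.burst_length
        if self.remaining_drop > 0:
            self.remaining_drop -= 1
            # Stale content is returned while simulator time keeps advancing.
            return self.cached_frame
        self.cached_frame = fresh_frame
        return fresh_frame


camera = FrozenFrameCamera(burst_start_prob=0.0, burst_length=3, rng=random.Random(0))
clean_stream = [camera.read(f) for f in ["f0", "f1", "f2"]]
```

One such wrapper per camera, each with its own random generator, would give independently sampled bursts across views.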
Figure 6
Illustration of partial observation perturbation. A gray rectangular mask removes part of the camera observation while leaving the external scene unchanged. The mask location is resampled over time, and the severity is controlled by the mask ratio r.
3.4 Closed-loop Evaluation Protocol
We evaluate each driving model in closed-loop simulation under both clean and perturbed conditions. For each route and sce…
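The gray-mask perturbation from the Figure 6 caption can be sketched as below. Several details are assumptions rather than facts from the paper: the mask ratio r is interpreted as the fraction of image area covered, the rectangle keeps the image aspect ratio, the gray level is 128, and the image is a plain list of pixel rows.

```python
import math
import random


def apply_gray_mask(image, mask_ratio, rng, gray=128):
    """Return a copy of `image` (list of pixel rows) with a gray rectangle.

    The rectangle covers roughly `mask_ratio` of the area and its top-left
    corner is sampled uniformly, so calling this per frame resamples the location.
    """
    height, width = len(image), len(image[0])
    scale = math.sqrt(mask_ratio)
    mask_h, mask_w = round(scale * height), round(scale * width)
    top = rng.randint(0, height - mask_h)
    left = rng.randint(0, width - mask_w)
    masked = [row[:] for row in image]  # leave the input frame untouched
    for y in range(top, top + mask_h):
        for x in range(left, left + mask_w):
            masked[y][x] = gray
    return masked


rng = random.Random(0)
frame = [[0] * 8 for _ in range(4)]
masked_frame = apply_gray_mask(frame, mask_ratio=0.25, rng=rng)
```

Only the observation passed to the policy is altered; the simulated scene is unaffected, which matches the caption's description.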
Figure 7
Main robustness analysis. Deployment-side perturbations induce heterogeneous degradation patterns across models, and inference latency reveals strong closed-loop synchronization vulnerability.
Ego-state sensitivity. GPS and speed perturbations show that robustness failures are not limited to camera inputs. The GPS case in …
Figure 8
Perturbation injection architecture. Bench2Drive-Robust injects observation-side perturbations before policy inference and action-side latency before command execution, while keeping the evaluated model unchanged.
Figure 9
Overall robustness overview across perturbation types and models. The heatmap reports relative Driving Score degradation with respect to each model's clean baseline, where larger values indicate stronger degradation. The radar plots provide a complementary per-model view by showing Driving Score retention, defined as the ratio between perturbed Driving Score and clean baseline Driving Score. Together, thes…
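The two quantities in the Figure 9 caption can be written out explicitly. Retention follows the caption's definition (perturbed score divided by clean score); treating relative degradation as one minus retention is an assumption of this sketch, and the scores used below are made-up numbers, not results from the paper.

```python
def driving_score_retention(clean_ds, perturbed_ds):
    """Ratio of perturbed Driving Score to the clean baseline Driving Score."""
    if clean_ds <= 0:
        raise ValueError("clean baseline Driving Score must be positive")
    return perturbed_ds / clean_ds


def relative_degradation(clean_ds, perturbed_ds):
    """Assumed form: fraction of the clean score that is lost (larger is worse)."""
    return 1.0 - driving_score_retention(clean_ds, perturbed_ds)


# Hypothetical scores for illustration only.
retention = driving_score_retention(clean_ds=80.0, perturbed_ds=60.0)
degradation = relative_degradation(clean_ds=80.0, perturbed_ds=60.0)
print(retention, degradation)
```

Normalizing by each model's own clean baseline lets models with different absolute scores be compared on the same scale.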
Figure 10
Detailed analysis of inference latency robustness under delayed control execution. The first two plots show absolute Driving Score and relative degradation across latency settings of 0 ms, 100 ms, 200 ms, and 500 ms. The third plot compares each model's clean baseline Driving Score with its average Driving Score under completed latency settings, with annotations showing the average relative degradation. Toget…
Figure 11
Figure 11 shows an occlusion case study for SimLingo on RouteScenario_23910_rep0, which belongs to the InterurbanActorFlow scenario in Bench2Drive. In this scenario, the ego vehicle leaves an interurban road by making a left turn while crossing a fast traffic flow [15]. This route requires several coupled driving abilities: recognizing the road geometry and intended left-turn path, perceiving fast-moving surroundin…
Figure 12
Qualitative inference-latency case study on SimLingo. We compare the same ParkingExit route under: (a) clean execution, (b) 100 ms inference latency, and (c) 500 ms inference latency. The route requires the ego vehicle to exit a parallel parking bay and merge into traffic with timely steering and acceleration. Blue points denote navigation target points, red points denote predicted path waypoints, and gree…
Figure 13
Qualitative GPS localization-noise case study on TCP-traj. We compare the same RouteScenario_3869_rep0 VanillaSignalizedTurnEncounterRedLight route under two settings: (a) clean baseline and (b) severe GPS localization noise with $\sigma_{\mathrm{GPS}} = 15$ m. The route requires the ego vehicle to approach a signalized intersection, respect the red-light constraint, and execute the intended turn with accurate route alignme…
Figure 14
Qualitative speed-noise case study on SimLingo. We compare SimLingo on RouteScenario_11381_rep0, a VehicleTurningRoutePedestrian route, under: (a) clean baseline and (b) multiplicative speed noise with $\eta \sim \mathcal{N}(0.2, 0.2^2)$. In the clean setting, the ego vehicle stops behind the leading vehicle at the red light, proceeds after the light turns green, and completes the left turn while yielding to the pedestr…
Original abstract

Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop E2E-AD evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and demonstrating the substantial impact of these deployment-oriented perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces Bench2Drive-Robust as the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving. It evaluates representative E2E methods under perturbations from three sources—camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, speed/odometry errors), and compute-induced control delay (inference latency)—across varying severities. Results indicate substantial closed-loop performance degradation, with the central claim that these deployment issues expose robustness challenges not fully captured by conventional image-level corruption evaluations.

Significance. If the empirical results hold under the stated protocol, the benchmark could usefully shift focus from perception-only corruptions to system-level deployment imperfections that accumulate in closed-loop feedback. This would define concrete, practical robustness problems for E2E-AD and support development of deployment-aware methods.

major comments (1)
  1. [Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.

    Authors: We appreciate the referee's observation that a direct side-by-side comparison would provide stronger evidence for the distinction. Our perturbations arise from camera-stream failures (frame drops and partial observations), ego-state estimation errors (GPS noise, speed/odometry inaccuracies), and compute-induced control delays. These are system-level deployment imperfections that introduce temporal inconsistencies and feedback instabilities in the closed loop. In contrast, conventional image-level corruptions (e.g., weather effects or pixel noise) primarily degrade input perception in open-loop or perception-focused settings. Because the perturbation sources and evaluation protocol differ categorically, the robustness challenges we identify are not equivalent to those tested by image corruption benchmarks. Nevertheless, to address the concern, we will revise the abstract to qualify the claim as exposing 'distinct robustness challenges arising from deployment imperfections' and expand the introduction and related work to explicitly contrast our device-centric, closed-loop protocol with prior image-level studies. This textual clarification will be incorporated in the revision.
    Revision: partial

Circularity Check

0 steps flagged

Empirical benchmark with direct simulation results; no derivation or self-referential reduction

full rationale

The paper presents Bench2Drive-Robust as a new closed-loop evaluation protocol and reports performance degradation from direct application of three classes of deployment perturbations (camera-stream failures, ego-state errors, and inference delays) inside a simulator. No equations, fitted parameters, or mathematical derivations appear in the provided text; results are obtained from independent simulation runs rather than any reduction to prior fitted quantities or self-cited uniqueness theorems. The central claim that these perturbations reveal challenges 'not fully captured by conventional image-level corruption evaluations' is an empirical observation from the new benchmark, not a quantity derived by construction from the inputs or from load-bearing self-citations. The work is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The benchmark rests on domain assumptions about which deployment imperfections are most relevant; no explicit free parameters or new invented entities are described in the abstract.

axioms (1)
  • Domain assumption: Perturbations arising from camera-stream failures, ego-state estimation errors, and compute-induced control delay accumulate through the closed-loop feedback and destabilize control in ways not captured by image-level tests.
    Abstract states these three sources as the major deployment imperfections studied.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 7 internal anchors

  1. [1] Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, and Yu Qiao. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Advances in Neural Information Processing Systems, 35:6119–6132, 2022.

  2. [2] Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17853–17862, 2023.

  3. [3] Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8350, 2023.

  4. [4] Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE transactions on pattern analysis and machine intelligence, 45(11):12878–12895, 2022.

  5. [5] Katrin Renz, Long Chen, Elahe Arani, and Oleg Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11993–12003, 2025.

  6. [6] Xiaosong Jia, Junqi You, Zhiyuan Zhang, and Junchi Yan. Drivetransformer: Unified transformer for scalable end-to-end autonomous driving. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=M42KR4W9P5.

  7. [7] Yinpeng Dong, Caixin Kang, Jinlai Zhang, Zijian Zhu, Yikai Wang, Xiao Yang, Hang Su, Xingxing Wei, and Jun Zhu. Benchmarking robustness of 3d object detection to common corruptions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1022–1032, 2023.

  8. [8] Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Robo3d: Towards robust and reliable 3d perception against corruptions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19994–20006, 2023.

  9. [9] Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit Cottereau, and Wei Tsang Ooi. Robodepth: Robust out-of-distribution depth estimation under corruptions. Advances in Neural Information Processing Systems, 36:21298–21342, 2023.

  10. [10] Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Benchmarking and improving bird's eye view perception robustness in autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3878–3894, 2025.

  11. [11] Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, and Jing Zhang. Is your hd map constructor reliable under sensor corruptions? Advances in Neural Information Processing Systems, 37:22441–22482, 2024.

  12. [12] Wei Jiang, Lu Wang, Tianyuan Zhang, Yuwei Chen, Jian Dong, Wei Bao, Zichao Zhang, and Qiang Fu. Robuste2e: Exploring the robustness of end-to-end autonomous driving. Electronics, 13(16):3299, 2024.

  13. [13] Dacheng Liao, Mengshi Qi, Peng Shu, Zhining Zhang, Yuxin Lin, Liang Liu, and Huadong Ma. Robodrivevlm: A novel benchmark and baseline towards robust vision-language models for autonomous driving. arXiv preprint arXiv:2512.01300, 2025.

  14. [14] Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, and Jingdong Wang. Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes. arXiv preprint arXiv:2305.10430, 2023.

  15. [15] Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Advances in Neural Information Processing Systems, 37:819–844, 2024.

  16. [16] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

  17. [17] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA), pages 4693–4700. IEEE, 2018.

  18. [18] Dian Chen, Brady Zhou, Vladlen Koltun, and Philipp Krähenbühl. Learning by cheating. In Conference on robot learning, pages 66–75. PMLR, 2020.

  19. [19] Aditya Prakash, Kashyap Chitta, and Andreas Geiger. Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7077–7087, 2021.

  20. [20] Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, and Andreas Geiger. Plant: Explainable planning transformers via object-level representations. In Conference on Robotic Learning (CoRL), 2022.

  21. [21] Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, and Yu Liu. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Conference on Robot Learning, pages 726–737. PMLR, 2023.

  22. [22] Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, and Sifa Zheng. Sparsedrive: End-to-end autonomous driving via sparse scene representation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 8795–8801. IEEE, 2025.

  23. [23] Runwen Zhu, Jianbo Zhao, Diankun Zhang, Guoan Wang, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, et al. Sparsead: Sparse query-centric paradigm for efficient end-to-end autonomous driving. IEEE Transactions on Artificial Intelligence, 2025.

  24. [24] Jonah Philion and Sanja Fidler. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In European conference on computer vision, pages 194–210. Springer, 2020.

  25. [25] Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird's-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):2020–2036, 2024.

  26. [26] Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, and Dalong Du. Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv preprint arXiv:2112.11790, 2021.

  27. [27] Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 1477–1485, 2023.

  28. [28] Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, and Zhi Tang. Bevfusion: A simple and robust lidar-camera fusion framework. Advances in neural information processing systems, 35:10421–10434, 2022.

  29. [29] Yue Wang, Vitor Campagnolo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, and Justin Solomon. Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In Conference on robot learning, pages 180–191. PMLR, 2022.

  30. [30] Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. Petr: Position embedding transformation for multi-view 3d object detection. In European conference on computer vision, pages 531–548. Springer, 2022.

  31. [31] Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, and Xiangyu Zhang. Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3621–3631, 2023.

  32. [32] Yunpeng Zhang, Zheng Zhu, and Dalong Du. Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9433–9443, 2023.

  33. [33] Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, and Jiwen Lu. Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21729–21740, 2023.

  34. [34] Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15449–15458, 2024.

  35. [35] Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12037–12047, 2025.

  36. [36] Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1602–1611, 2025.

  37. [37] Lin Liu, Caiyan Jia, Guanyi Yu, Ziying Song, JunQiao Li, Feiyang Jia, Peiliang Wu, Xiaoshuai Hao, and Yadan Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving. arXiv preprint arXiv:2511.18729, 2025.

  38. [38] Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024.

  39. [39] Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, and Yadan Luo. Don't shake the wheel: Momentum-aware planning in end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22432–22441, 2025.

  40. [40] Wenhao Yao, Zhenxin Li, Shiyi Lan, Zi Wang, Xinglong Sun, Jose M Alvarez, and Zuxuan Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 11910–11918, 2026.

  41. [41]

    Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2).arXiv preprint arXiv:2505.16394, 2025

    Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, and Junchi Yan. Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2).arXiv preprint arXiv:2505.16394, 2025. 3

  42. [42]

    Lmdrive: Closed-loop end-to-end driving with large language models

    Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hongsheng Li. Lmdrive: Closed-loop end-to-end driving with large language models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15120–15130,

  43. [43]

    Drivelm: Driving with graph visual question answering

    Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In European conference on computer vision, pages 256–274. Springer, 2024

  44. [44]

    Dolphins: Multimodal language model for driving

    Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, and Chaowei Xiao. Dolphins: Multimodal language model for driving. In European Conference on Computer Vision, pages 403–420. Springer, 2024

  45. [45]

    Drivegpt4: Interpretable end-to-end autonomous driving via large language model

    Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 9(10):8186–8193, 2024

  46. [46]

    Dilu: A knowledge-driven approach to autonomous driving with large language models

    Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, and Yu Qiao. Dilu: A knowledge-driven approach to autonomous driving with large language models. arXiv preprint arXiv:2309.16292, 2023

  47. [47]

    Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation

    Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025

  48. [48]

    Opendrivevla: Towards end-to-end autonomous driving with large vision language action model

    Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Volker Tresp, and Alois Knoll. Opendrivevla: Towards end-to-end autonomous driving with large vision language action model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 13782–13790, 2026

  49. [49]

    Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. Advances in Neural Information Processing Systems, 2025

  50. [50]

    DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

    Zhenjie Yang, Yilin Chai, Xiaosong Jia, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, and Junchi Yan. Drivemoe: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278, 2025

  51. [51]

    A survey on vision-language-action models for autonomous driving

    Sicong Jiang, Zilin Huang, Kangan Qian, Ziang Luo, Tianze Zhu, Yang Zhong, Yihong Tang, Menglin Kong, Yunlong Wang, Siwen Jiao, et al. A survey on vision-language-action models for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4524–4536, 2025

  52. [52]

    World4drive: End-to-end autonomous driving via intention-aware physical latent world model

    Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28632–28642, 2025

  53. [53]

    Real-ad: Towards human-like reasoning in end-to-end autonomous driving

    Yuhang Lu, Jiadong Tu, Yuexin Ma, and Xinge Zhu. Real-ad: Towards human-like reasoning in end-to-end autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27783–27793, 2025

  54. [54]

    The robodrive challenge: Drive anytime anywhere in any condition

    Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, et al. The robodrive challenge: Drive anytime anywhere in any condition. arXiv preprint arXiv:2405.08816, 2024

  55. [55]

    Fail2Drive: Benchmarking Closed-Loop Driving Generalization

    Simon Gerstenecker, Andreas Geiger, and Katrin Renz. Fail2drive: Benchmarking closed-loop driving generalization. arXiv preprint arXiv:2604.08535, 2026

  56. [56]

    Are we ready for autonomous driving? the kitti vision benchmark suite

    Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012

  57. [57]

    The cityscapes dataset for semantic urban scene understanding

    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016

  58. [58]

    The apolloscape dataset for autonomous driving

    Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The apolloscape dataset for autonomous driving. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 954–960, 2018

  59. [59]

    Argoverse: 3d tracking and forecasting with rich maps

    Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8748–8757, 2019

  60. [60]

    nuscenes: A multimodal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020

  61. [61]

    Scalability in perception for autonomous driving: Waymo open dataset

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020

  62. [62]

    Bdd100k: A diverse driving dataset for heterogeneous multitask learning

    Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2636–2645, 2020

  63. [63]

    One million scenes for autonomous driving: Once dataset

    Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, et al. One million scenes for autonomous driving: Once dataset. arXiv preprint arXiv:2106.11037, 2021

  64. [64]

    Pandaset: Advanced sensor suite dataset for autonomous driving

    Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE international intelligent transportation systems conference (ITSC), pages 3095–3101. IEEE, 2021

  65. [65]

    Parting with misconceptions about learning-based vehicle motion planning

    Daniel Dauner, Marcel Hallgarten, Andreas Geiger, and Kashyap Chitta. Parting with misconceptions about learning-based vehicle motion planning. In Conference on Robot Learning, pages 1268–1281. PMLR, 2023

  66. [66]

    Carla: An open urban driving simulator

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017

  67. [67]

    NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

    Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021

  68. [68]

    Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking

    Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems, 37:28706–28719, 2024

  69. [69]

    Drivearena: A closed-loop generative simulation platform for autonomous driving

    Xuemeng Yang, Licheng Wen, Tiantian Wei, Yukai Ma, Jianbiao Mei, Xin Li, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, et al. Drivearena: A closed-loop generative simulation platform for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26933–26943, 2025

  70. [70]

    Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving

    Zhiyu Huang, Haochen Liu, and Chen Lv. Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3903–3913, 2023

  71. [71]

    Can users specify driving speed? bench2drive-speed: Benchmark and baselines for desired-speed conditioned autonomous driving

    Yuqian Shao, Xiaosong Jia, Langechuan Liu, and Junchi Yan. Can users specify driving speed? bench2drive-speed: Benchmark and baselines for desired-speed conditioned autonomous driving. arXiv preprint arXiv:2603.25672, 2026

  72. [72]

    Hydra-next: Robust closed-loop driving with open-loop training

    Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, and Jose M Alvarez. Hydra-next: Robust closed-loop driving with open-loop training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27305–27314, 2025

  73. [73]

    Benchmarking neural network robustness to common corruptions and perturbations

    Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HJz6tiCqYm

  74. [74]

    Benchmarking robustness in object detection: Autonomous driving when winter is coming

    Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484, 2019

  75. [75]

    Dynamic deadlines in motion planning for autonomous driving systems

    Edward Fang. Dynamic deadlines in motion planning for autonomous driving systems. UC Berkeley, 2020

  76. [76]

    Lead: The llm enhanced planning system converged with end-to-end autonomous driving

    Yuhang Zhang, Jiaqi Liu, Chengkai Xu, Peng Hang, and Jian Sun. Lead: The llm enhanced planning system converged with end-to-end autonomous driving. arXiv preprint arXiv:2507.05754,

  77. [77]

    Autonomous vehicle positioning with gps in urban canyon environments

    Youjing Cui and Shuzhi Sam Ge. Autonomous vehicle positioning with gps in urban canyon environments. IEEE transactions on robotics and automation, 19(1):15–25, 2003

  78. [78]

    Accurate visual localization for automotive applications

    Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, and Trevor Darrell. Accurate visual localization for automotive applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019

  79. [79]

    Evaluating localization accuracy of automated driving systems

    Karl Rehrl, Stefan Göttlich, Klaus Krainz, and Andreas Graser. Evaluating localization accuracy of automated driving systems. Sensors, 21(17):5855, 2021. doi: 10.3390/s21175855

  80. [80]

    Reliable urban vehicle localization under faulty satellite navigation signals

    Shubh Gupta and Grace Gao. Reliable urban vehicle localization under faulty satellite navigation signals. EURASIP Journal on Advances in Signal Processing, 2024(1):32, 2024. doi: 10.1186/s13634-024-01150-2
