Bench2Drive-Robust: Benchmarking Closed-Loop Autonomous Driving under Deployment Perturbations

Haoran Liu; Junchi Yan; Shaofeng Zhang; Xianda Guo; Xiaosong Jia; Xingjun Ma; Yanlun Peng; Yu-Gang Jiang; Zhenghao Jin; Zhiyuan Zhang

arxiv: 2605.18059 · v1 · pith:S5EPFUJ2new · submitted 2026-05-18 · 💻 cs.RO

Zhiyuan Zhang, Zhenghao Jin, Yanlun Peng, Xianda Guo, Haoran Liu, Shaofeng Zhang, Xingjun Ma, Zuxuan Wu, Junchi Yan, Xiaosong Jia, Yu-Gang Jiang

Pith reviewed 2026-05-20 10:08 UTC · model grok-4.3

classification 💻 cs.RO

keywords autonomous driving, robustness, closed-loop evaluation, deployment perturbations, end-to-end driving, benchmark

The pith

Deployment perturbations such as frame drops and GPS noise substantially degrade closed-loop autonomous driving performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Bench2Drive-Robust, a benchmark designed to test closed-loop end-to-end autonomous driving under realistic deployment issues. It focuses on three types of perturbations: camera stream problems like dropped frames, errors in estimating the vehicle's state such as noisy GPS, and delays in control caused by model computation time. These issues can build up in the driving loop and cause instability, unlike the image corruptions studied before. Showing that current methods suffer under these conditions highlights the need for more practical robustness testing in autonomous driving research.

Core claim

Bench2Drive-Robust evaluates representative end-to-end driving methods under deployment-oriented perturbations from camera-stream failures, ego-state estimation errors, and compute-induced control delays. The results demonstrate that these perturbations substantially degrade closed-loop driving performance in ways not captured by conventional image-level corruption evaluations.

What carries the argument

Bench2Drive-Robust benchmark applying systematic deployment perturbations to closed-loop end-to-end autonomous driving evaluation.

Load-bearing premise

The primary deployment imperfections for closed-loop autonomous driving are camera-stream failures, ego-state estimation errors, and compute-induced control delays.

What would settle it

Demonstrating that closed-loop performance does not degrade significantly under high-severity versions of these three perturbations, or that image corruptions account for similar levels of failure.
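
One way to make "degrade significantly" measurable is to normalize each perturbed score by the model's clean baseline. The Figure 9 caption defines Driving Score retention as the ratio of perturbed to clean Driving Score. The sketch below additionally assumes that relative degradation is the complement of retention, which this page does not state explicitly. The function names and the example scores are hypothetical and are not results from the paper.

```python
def retention(clean_score: float, perturbed_score: float) -> float:
    """Driving Score retention: perturbed score divided by the clean baseline."""
    if clean_score <= 0:
        raise ValueError("clean baseline score must be positive")
    return perturbed_score / clean_score


def relative_degradation(clean_score: float, perturbed_score: float) -> float:
    """Assumed definition: fraction of the clean score lost under perturbation."""
    return 1.0 - retention(clean_score, perturbed_score)


# Hypothetical scores for illustration only (not taken from the paper).
clean, perturbed = 80.0, 60.0
print(retention(clean, perturbed))             # 0.75
print(relative_degradation(clean, perturbed))  # 0.25
```

Under this reading, the falsification test amounts to checking whether relative degradation stays near zero at the highest severity of each perturbation.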

Figures

Figures reproduced from arXiv: 2605.18059 by Haoran Liu, Junchi Yan, Shaofeng Zhang, Xianda Guo, Xiaosong Jia, Xingjun Ma, Yanlun Peng, Yu-Gang Jiang, Zhenghao Jin, Zhiyuan Zhang, Zuxuan Wu.

**Figure 1.** Overview of Bench2Drive-Robust. We evaluate three categories of deployment-side failures for E2E-AD (camera-stream failures, ego-state estimation errors, and compute-induced control delay) under closed-loop driving. This differs from existing image- or perception-centric robustness benchmarks that mainly target external appearance changes.

**Figure 2.** Overview of Bench2Drive-Robust. We evaluate end-to-end autonomous driving models in closed-loop simulation by injecting deployment-relevant perturbations into the sensing, ego-state, and action pipelines while keeping the evaluated policy unchanged. The benchmark covers temporal delays, observation integrity perturbations, and ego-state estimation errors, enabling controlled robustness evaluation under con…

**Figure 3.** Latency modes supported in our framework: immediate execution, dynamic real-time scheduling, and fixed-delay FIFO buffering.

3.3 Robustness Taxonomy. We organize deployment-oriented perturbations by where they enter the closed-loop driving stack: camera-stream perturbations, ego-state perturbations, and compute-control perturbations. This taxonomy separates failures in visual data delivery, vehicle-state se…

**Figure 4.** Illustration of ego-state input perturbations. (a) The model receives the clean GPS reading $g_t$. (b) GPS input noise adds Gaussian perturbations to GPS readings. (c) Speed noise independently samples a multiplicative factor $\eta_t$ at each timestep and feeds $\tilde{v}_t = \eta_t v_t$ to the policy.

…be affected by sensor and ego-motion reliability issues [81]. Motivated by these observations, we evaluate whether E2E-AD policies…

**Figure 5.** Illustration of our burst frame drop implementation. (a) Camera crash or frame-lost perturbations typically replace failed views with empty or invalid images. (b) In our implementation, a failed camera stream returns the most recent valid cached frame, so the simulator timestamp advances while the visual content is temporally frozen. (c) The timeline shows an illustrative example of independently sampled b…

**Figure 6.** Illustration of partial observation perturbation. A gray rectangular mask removes part of the camera observation while leaving the external scene unchanged. The mask location is resampled over time, and the severity is controlled by the mask ratio r.

3.4 Closed-loop Evaluation Protocol. We evaluate each driving model in closed-loop simulation under both clean and perturbed conditions. For each route and sce…

**Figure 7.** Main robustness analysis. Deployment-side perturbations induce heterogeneous degradation patterns across models, and inference latency reveals strong closed-loop synchronization vulnerability.

Ego-state sensitivity. GPS and speed perturbations show that robustness failures are not limited to camera inputs. The GPS case in…

**Figure 8.** Perturbation injection architecture. Bench2Drive-Robust injects observation-side perturbations before policy inference and action-side latency before command execution, while keeping the evaluated model unchanged.

**Figure 9.** Overall robustness overview across perturbation types and models. The heatmap reports relative Driving Score degradation with respect to each model’s clean baseline, where larger values indicate stronger degradation. The radar plots provide a complementary per-model view by showing Driving Score retention, defined as the ratio between perturbed Driving Score and clean baseline Driving Score. Together, thes…

**Figure 10.** Detailed analysis of inference latency robustness under delayed control execution. The first two plots show absolute Driving Score and relative degradation across latency settings of 0 ms, 100 ms, 200 ms, and 500 ms. The third plot compares each model’s clean baseline Driving Score with its average Driving Score under completed latency settings, with annotations showing the average relative degradation. Toget…

**Figure 11.** An occlusion case study for SimLingo on RouteScenario_23910_rep0, which belongs to the InterurbanActorFlow scenario in Bench2Drive. In this scenario, the ego vehicle leaves an interurban road by making a left turn while crossing a fast traffic flow [15]. This route requires several coupled driving abilities: recognizing the road geometry and intended left-turn path, perceiving fast-moving surroundin…

**Figure 12.** Qualitative inference-latency case study on SimLingo. We compare the same ParkingExit route under: (a) clean execution, (b) 100 ms inference latency, and (c) 500 ms inference latency. The route requires the ego vehicle to exit a parallel parking bay and merge into traffic with timely steering and acceleration. Blue points denote navigation target points, red points denote predicted path waypoints, and gree…

**Figure 13.** Qualitative GPS localization-noise case study on TCP-traj. We compare the same RouteScenario_3869_rep0 VanillaSignalizedTurnEncounterRedLight route under two settings: (a) clean baseline and (b) severe GPS localization noise with $\sigma_{\mathrm{GPS}} = 15$ m. The route requires the ego vehicle to approach a signalized intersection, respect the red-light constraint, and execute the intended turn with accurate route alignme…

**Figure 14.** Qualitative speed-noise case study on SimLingo. We compare SimLingo on RouteScenario_11381_rep0, a VehicleTurningRoutePedestrian route, under: (a) clean baseline and (b) multiplicative speed noise with $\eta \sim \mathcal{N}(0.2, 0.2^2)$. In the clean setting, the ego vehicle stops behind the leading vehicle at the red light, proceeds after the light turns green, and completes the left turn while yielding to the pedestr…
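
The observation-side perturbations described in the captions of Figures 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the names `perturb_gps`, `perturb_speed`, `FrozenFrameCamera`, and `mask_frame` are hypothetical, frames are toy nested lists rather than real camera images, the burst model (a fixed burst length triggered with a per-step probability) is assumed, and the mask is assumed to keep the image's aspect ratio.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[int]]  # toy grayscale image: rows of pixel values


def perturb_gps(gps_xy: Tuple[float, float], sigma_m: float,
                rng: random.Random) -> Tuple[float, float]:
    """Add independent zero-mean Gaussian noise (std sigma_m) to each coordinate."""
    x, y = gps_xy
    return x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m)


def perturb_speed(speed: float, mean: float, std: float,
                  rng: random.Random) -> float:
    """Scale the speed by a factor eta ~ N(mean, std^2), resampled on every call."""
    eta = rng.gauss(mean, std)
    return eta * speed


class FrozenFrameCamera:
    """Returns the last valid cached frame while a (hypothetical) burst drop is active."""

    def __init__(self, drop_prob: float, burst_len: int, rng: random.Random) -> None:
        self.drop_prob = drop_prob
        self.burst_len = burst_len
        self.rng = rng
        self._cache: Optional[Frame] = None
        self._remaining = 0  # dropped frames left in the current burst

    def read(self, frame: Frame) -> Frame:
        # A burst can only start once a valid frame has been cached.
        can_start = self._remaining == 0 and self._cache is not None
        if can_start and self.rng.random() < self.drop_prob:
            self._remaining = self.burst_len
        if self._remaining > 0:
            self._remaining -= 1
            return self._cache  # time advances, visual content stays frozen
        self._cache = frame
        return frame


def mask_frame(frame: Frame, ratio: float, rng: random.Random,
               fill: int = 128) -> Frame:
    """Cover roughly `ratio` of the image area with a gray rectangle at a random spot."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    height, width = len(frame), len(frame[0])
    side = ratio ** 0.5  # same scale on both axes, so area scales by ratio
    mask_h, mask_w = round(height * side), round(width * side)
    top = rng.randint(0, height - mask_h)
    left = rng.randint(0, width - mask_w)
    out = [row[:] for row in frame]  # leave the input frame untouched
    for r in range(top, top + mask_h):
        for c in range(left, left + mask_w):
            out[r][c] = fill
    return out
```

Each function only rewrites what the policy receives and never touches the simulated world, which matches the captions' point that the external scene and the evaluated model stay unchanged.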

Abstract

Robustness is a critical requirement for deploying autonomous driving systems in the real world. Existing robustness benchmarks for autonomous driving have made important progress in studying the effects of image-level corruptions, such as adverse weather or camera degradation, on perception modules and open-loop planning outputs. However, deployment can also involve system-level imperfections, such as inference latency and ego-state estimation errors, which remain less studied in closed-loop E2E-AD evaluation. These imperfections can accumulate through the feedback loop and destabilize control. In this work, we present Bench2Drive-Robust, to our knowledge the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving under realistic deployment perturbations. We systematically evaluate deployment-oriented perturbations arising from three major sources: camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, and speed or odometry errors), and compute-induced control delay (model inference delay). We evaluate representative end-to-end driving methods and analyze their robustness under different perturbation severities. Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations. By establishing a closed-loop evaluation protocol and demonstrating the substantial impact of these deployment-oriented perturbations, Bench2Drive-Robust defines practical robustness problems for end-to-end autonomous driving and encourages further research on deployment-aware robust driving systems.
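
To illustrate how a compute-induced control delay can be injected without modifying the policy, the following Python sketch implements a fixed-delay FIFO action buffer, one of the latency modes named in the Figure 3 caption. It is a hedged illustration rather than the benchmark's code: the class name `FixedDelayActionBuffer`, the helper `delay_in_steps`, the 50 ms simulator step, and the `"brake"` default action executed while the queue warms up are all assumptions.

```python
import math
from collections import deque
from typing import Deque, Generic, TypeVar

A = TypeVar("A")  # action type (for example a steer/throttle/brake tuple)


def delay_in_steps(latency_ms: float, step_ms: float) -> int:
    """Round a latency up to a whole number of simulator steps."""
    if latency_ms < 0 or step_ms <= 0:
        raise ValueError("latency must be >= 0 and step length must be > 0")
    return math.ceil(latency_ms / step_ms)


class FixedDelayActionBuffer(Generic[A]):
    """Executes each action `delay_steps` simulator steps after it was computed."""

    def __init__(self, delay_steps: int, default_action: A) -> None:
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        # Pre-fill so that the first `delay_steps` executed actions are the default.
        self._queue: Deque[A] = deque([default_action] * delay_steps)

    def step(self, new_action: A) -> A:
        """Enqueue the freshly computed action and return the one due now."""
        self._queue.append(new_action)
        return self._queue.popleft()


# Hypothetical setup: 100 ms latency with an assumed 50 ms simulator step.
buffer = FixedDelayActionBuffer(delay_in_steps(100, 50), default_action="brake")
executed = [buffer.step(a) for a in ["a0", "a1", "a2", "a3"]]
print(executed)  # ['brake', 'brake', 'a0', 'a1']
```

Because the vehicle keeps moving while stale commands are executed, every delayed action is applied to a state the policy never observed, which is one way such delays can accumulate through the feedback loop.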

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Bench2Drive-Robust sets up a closed-loop testbed for deployment issues like delays and sensor errors, but the claim that these are distinct from image corruptions lacks side-by-side evidence.

The main thing to know about this paper is that it introduces Bench2Drive-Robust, a benchmark focused on closed-loop end-to-end autonomous driving under deployment perturbations like camera stream failures, ego-state estimation errors, and compute delays. It reports that these cause notable degradation in driving performance.

What the work does well is shifting attention to system-level issues that occur in actual vehicle deployment, rather than sticking to image corruptions like weather effects. They set up a systematic evaluation of representative E2E methods under varying perturbation levels, which provides concrete data on how these imperfections accumulate in the feedback loop. This is useful because closed-loop testing better reflects real-world dynamics than open-loop planning outputs.

On the soft spots, the central claim that these deployment perturbations reveal robustness challenges not fully captured by conventional image-level evaluations needs more support. The abstract makes this point, but if the experiments do not include a direct side-by-side comparison (applying image corruptions within the same Bench2Drive-Robust closed-loop simulator, using the same models and metrics), then the uniqueness of the deployment issues remains asserted rather than shown. The stress-test concern is on target here; without that comparison, it's unclear if closed-loop feedback just amplifies any kind of perturbation or if these specific ones are distinct. The paper's soundness is moderate given the abstract-level description, and full details on metrics and controls would help.

This paper is aimed at the autonomous driving research community, particularly those working on robust end-to-end systems and deployment-aware designs. A reader focused on benchmarks and real-world applicability will find value in the protocol and the emphasis on practical imperfections. It is worth bringing to a reading group for discussion on evaluation methods.

I would recommend sending it to peer review. The benchmark idea is solid and addresses a real gap, though the authors should be asked to add comparative experiments to back up the differential claim.

Referee Report

1 major / 0 minor

Summary. The paper introduces Bench2Drive-Robust as the first device-centric robustness benchmark for closed-loop end-to-end autonomous driving. It evaluates representative E2E methods under perturbations from three sources—camera-stream failures (frame drop, partial observation), ego-state estimation errors (GPS noise, speed/odometry errors), and compute-induced control delay (inference latency)—across varying severities. Results indicate substantial closed-loop performance degradation, with the central claim that these deployment issues expose robustness challenges not fully captured by conventional image-level corruption evaluations.

Significance. If the empirical results hold under the stated protocol, the benchmark could usefully shift focus from perception-only corruptions to system-level deployment imperfections that accumulate in closed-loop feedback. This would define concrete, practical robustness problems for E2E-AD and support development of deployment-aware methods.

major comments (1)

[Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below.

Referee: [Abstract] Abstract (paragraph on the three major sources and final results sentence): The claim that the observed degradations reveal 'robustness challenges that are not fully captured by conventional image-level corruption evaluations' lacks direct support. The experiments apply only the three deployment perturbations without a side-by-side re-evaluation of standard image-level corruptions inside the same Bench2Drive-Robust closed-loop simulator, protocol, models, and metrics. Without this comparison it remains unclear whether closed-loop feedback simply amplifies any corruption or whether the deployment issues are distinctively problematic.

Authors: We appreciate the referee's observation that a direct side-by-side comparison would provide stronger evidence for the distinction. Our perturbations arise from camera-stream failures (frame drops and partial observations), ego-state estimation errors (GPS noise, speed/odometry inaccuracies), and compute-induced control delays. These are system-level deployment imperfections that introduce temporal inconsistencies and feedback instabilities in the closed loop. In contrast, conventional image-level corruptions (e.g., weather effects or pixel noise) primarily degrade input perception in open-loop or perception-focused settings. Because the perturbation sources and evaluation protocol differ categorically, the robustness challenges we identify are not equivalent to those tested by image corruption benchmarks. Nevertheless, to address the concern, we will revise the abstract to qualify the claim as exposing 'distinct robustness challenges arising from deployment imperfections' and expand the introduction and related work to explicitly contrast our device-centric, closed-loop protocol with prior image-level studies. This textual clarification will be incorporated in the revision.

revision: partial

Circularity Check

0 steps flagged

Empirical benchmark with direct simulation results; no derivation or self-referential reduction

The paper presents Bench2Drive-Robust as a new closed-loop evaluation protocol and reports performance degradation from direct application of three classes of deployment perturbations (camera-stream failures, ego-state errors, and inference delays) inside a simulator. No equations, fitted parameters, or mathematical derivations appear in the provided text; results are obtained from independent simulation runs rather than any reduction to prior fitted quantities or self-cited uniqueness theorems. The central claim that these perturbations reveal challenges 'not fully captured by conventional image-level corruption evaluations' is an empirical observation from the new benchmark, not a quantity derived by construction from the inputs or from load-bearing self-citations. The work is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The benchmark rests on domain assumptions about which deployment imperfections are most relevant; no explicit free parameters or new invented entities are described in the abstract.

axioms (1)

domain assumption: Perturbations arising from camera-stream failures, ego-state estimation errors, and compute-induced control delay accumulate through the closed-loop feedback and destabilize control in ways not captured by image-level tests.
Abstract states these three sources as the major deployment imperfections studied.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

Our results show that these deployment-related perturbations can substantially degrade closed-loop driving performance, revealing robustness challenges that are not fully captured by conventional image-level corruption evaluations.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

81 extracted references · 81 canonical work pages · 7 internal anchors

[1] Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, and Yu Qiao. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Advances in Neural Information Processing Systems, 35:6119–6132, 2022.
[2] Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17853–17862, 2023.
[3] Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. Vad: Vectorized scene representation for efficient autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8350, 2023.
[4] Kashyap Chitta, Aditya Prakash, Bernhard Jaeger, Zehao Yu, Katrin Renz, and Andreas Geiger. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE transactions on pattern analysis and machine intelligence, 45(11):12878–12895, 2022.
[5] Katrin Renz, Long Chen, Elahe Arani, and Oleg Sinavski. Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11993–12003, 2025.
[6] Xiaosong Jia, Junqi You, Zhiyuan Zhang, and Junchi Yan. Drivetransformer: Unified transformer for scalable end-to-end autonomous driving. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=M42KR4W9P5.
[7] Yinpeng Dong, Caixin Kang, Jinlai Zhang, Zijian Zhu, Yikai Wang, Xiao Yang, Hang Su, Xingxing Wei, and Jun Zhu. Benchmarking robustness of 3d object detection to common corruptions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1022–1032, 2023.
[8] Lingdong Kong, Youquan Liu, Xin Li, Runnan Chen, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Robo3d: Towards robust and reliable 3d perception against corruptions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19994–20006, 2023.
[9] Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit Cottereau, and Wei Tsang Ooi. Robodepth: Robust out-of-distribution depth estimation under corruptions. Advances in Neural Information Processing Systems, 36:21298–21342, 2023.
[10] Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, and Ziwei Liu. Benchmarking and improving bird’s eye view perception robustness in autonomous driving. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5):3878–3894, 2025.
[11] Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, and Jing Zhang. Is your hd map constructor reliable under sensor corruptions? Advances in Neural Information Processing Systems, 37:22441–22482, 2024.
[12] Wei Jiang, Lu Wang, Tianyuan Zhang, Yuwei Chen, Jian Dong, Wei Bao, Zichao Zhang, and Qiang Fu. Robuste2e: Exploring the robustness of end-to-end autonomous driving. Electronics, 13(16):3299, 2024.
[13] Dacheng Liao, Mengshi Qi, Peng Shu, Zhining Zhang, Yuxin Lin, Liang Liu, and Huadong Ma. Robodrivevlm: A novel benchmark and baseline towards robust vision-language models for autonomous driving. arXiv preprint arXiv:2512.01300, 2025.
[14] Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, and Jingdong Wang. Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes. arXiv preprint arXiv:2305.10430, 2023.
[15] Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Advances in Neural Information Processing Systems, 37:819–844, 2024.
[16] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
[17] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and Alexey Dosovitskiy. End-to-end driving via conditional imitation learning. In 2018 IEEE international conference on robotics and automation (ICRA), pages 4693–4700. IEEE, 2018.
[18] Dian Chen, Brady Zhou, Vladlen Koltun, and Philipp Krähenbühl. Learning by cheating. In Conference on robot learning, pages 66–75. PMLR, 2020.
[19] Aditya Prakash, Kashyap Chitta, and Andreas Geiger. Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7077–7087, 2021.
[20] Katrin Renz, Kashyap Chitta, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata, and Andreas Geiger. Plant: Explainable planning transformers via object-level representations. In Conference on Robotic Learning (CoRL), 2022.
[21] Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, and Yu Liu. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Conference on Robot Learning, pages 726–737. PMLR, 2023.
[22] Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, and Sifa Zheng. Sparsedrive: End-to-end autonomous driving via sparse scene representation. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 8795–8801. IEEE, 2025.
[23] Runwen Zhu, Jianbo Zhao, Diankun Zhang, Guoan Wang, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, et al. Sparsead: Sparse query-centric paradigm for efficient end-to-end autonomous driving. IEEE Transactions on Artificial Intelligence, 2025.
[24] Jonah Philion and Sanja Fidler. Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In European conference on computer vision, pages 194–210. Springer, 2020.
[25] Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. Bevformer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):2020–2036, 2024.
[26] Junjie Huang, Guan Huang, Zheng Zhu, Yun Ye, and Dalong Du. Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv preprint arXiv:2112.11790, 2021.
[27] Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 1477–1485, 2023.
[28] Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, and Zhi Tang. Bevfusion: A simple and robust lidar-camera fusion framework. Advances in neural information processing systems, 35:10421–10434, 2022.
[29] Yue Wang, Vitor Campagnolo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, and Justin Solomon. Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In Conference on robot learning, pages 180–191. PMLR, 2022.
[30] Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. Petr: Position embedding transformation for multi-view 3d object detection. In European conference on computer vision, pages 531–548. Springer, 2022.
[31] Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, and Xiangyu Zhang. Exploring object-centric temporal modeling for efficient multi-view 3d object detection. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3621–3631, 2023.
[32] Yunpeng Zhang, Zheng Zhu, and Dalong Du. Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9433–9443, 2023.
[33] Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, and Jiwen Lu. Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 21729–21740, 2023.
[34]

Para-drive: Parallelized architecture for real-time autonomous driving

Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. Para-drive: Parallelized architecture for real-time autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15449–15458, 2024. 3, 5
[35]

Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving

Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12037–12047, 2025. 3
[36]

Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving

Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. Goalflow: Goal-driven flow matching for multimodal trajectories generation in end-to-end autonomous driving. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 1602–1611, 2025
[37]

Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving

Lin Liu, Caiyan Jia, Guanyi Yu, Ziying Song, JunQiao Li, Feiyang Jia, Peiliang Wu, Xiaoshuai Hao, and Yadan Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving. arXiv preprint arXiv:2511.18729, 2025
[38]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation. arXiv preprint arXiv:2406.06978, 2024. 3
[39]

Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving

Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, and Yadan Luo. Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22432–22441, 2025. 3
[40]

Drivesuprim: Towards precise trajectory selection for end-to-end planning

Wenhao Yao, Zhenxin Li, Shiyi Lan, Zi Wang, Xinglong Sun, Jose M Alvarez, and Zuxuan Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 11910–11918, 2026
[41]

Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2)

Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, and Junchi Yan. Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2). arXiv preprint arXiv:2505.16394, 2025. 3
[42]

Lmdrive: Closed-loop end-to-end driving with large language models

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hongsheng Li. Lmdrive: Closed-loop end-to-end driving with large language models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15120–15130,
[43]

Drivelm: Driving with graph visual question answering

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In European conference on computer vision, pages 256–274. Springer, 2024
[44]

Dolphins: Multimodal language model for driving

Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, and Chaowei Xiao. Dolphins: Multimodal language model for driving. In European Conference on Computer Vision, pages 403–420. Springer, 2024
[45]

Drivegpt4: Interpretable end-to-end autonomous driving via large language model

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 9(10):8186–8193, 2024
[46]

Dilu: A knowledge-driven approach to autonomous driving with large language models

Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, and Yu Qiao. Dilu: A knowledge-driven approach to autonomous driving with large language models. arXiv preprint arXiv:2309.16292, 2023
[47]

Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation

Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025
[48]

Opendrivevla: Towards end-to-end autonomous driving with large vision language action model

Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Volker Tresp, and Alois Knoll. Opendrivevla: Towards end-to-end autonomous driving with large vision language action model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 13782–13790, 2026
[49]

Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. Advances in Neural Information Processing Systems, 2025
[50]

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving

Zhenjie Yang, Yilin Chai, Xiaosong Jia, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, and Junchi Yan. Drivemoe: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278, 2025
[51]

A survey on vision-language-action models for autonomous driving

Sicong Jiang, Zilin Huang, Kangan Qian, Ziang Luo, Tianze Zhu, Yang Zhong, Yihong Tang, Menglin Kong, Yunlong Wang, Siwen Jiao, et al. A survey on vision-language-action models for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4524–4536, 2025
[52]

World4drive: End-to-end autonomous driving via intention-aware physical latent world model

Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28632–28642, 2025
[53]

Real-ad: Towards human-like reasoning in end-to-end autonomous driving

Yuhang Lu, Jiadong Tu, Yuexin Ma, and Xinge Zhu. Real-ad: Towards human-like reasoning in end-to-end autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27783–27793, 2025. 3
[54]

The robodrive challenge: Drive anytime anywhere in any condition

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, et al. The robodrive challenge: Drive anytime anywhere in any condition. arXiv preprint arXiv:2405.08816, 2024. 3, 6
[55]

Fail2Drive: Benchmarking Closed-Loop Driving Generalization

Simon Gerstenecker, Andreas Geiger, and Katrin Renz. Fail2drive: Benchmarking closed-loop driving generalization. arXiv preprint arXiv:2604.08535, 2026. 3
[56]

Are we ready for autonomous driving? the kitti vision benchmark suite

Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012. 3
[57]

The cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016
[58]

The apolloscape dataset for autonomous driving

Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The apolloscape dataset for autonomous driving. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 954–960, 2018
[59]

Argoverse: 3d tracking and forecasting with rich maps

Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8748–8757, 2019
[60]

nuscenes: A multimodal dataset for autonomous driving

Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020
[61]

Scalability in perception for autonomous driving: Waymo open dataset

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020
[62]

Bdd100k: A diverse driving dataset for heterogeneous multitask learning

Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2636–2645, 2020
[63]

One million scenes for autonomous driving: Once dataset

Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, et al. One million scenes for autonomous driving: Once dataset. arXiv preprint arXiv:2106.11037, 2021
[64]

Pandaset: Advanced sensor suite dataset for autonomous driving

Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE international intelligent transportation systems conference (ITSC), pages 3095–3101. IEEE, 2021. 3
[65]

Parting with misconceptions about learning-based vehicle motion planning

Daniel Dauner, Marcel Hallgarten, Andreas Geiger, and Kashyap Chitta. Parting with misconceptions about learning-based vehicle motion planning. In Conference on Robot Learning, pages 1268–1281. PMLR, 2023. 3
[66]

Carla: An open urban driving simulator

Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017. 3
[67]

NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021
[68]

Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking

Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems, 37:28706–28719, 2024
[69]

Drivearena: A closed-loop generative simulation platform for autonomous driving

Xuemeng Yang, Licheng Wen, Tiantian Wei, Yukai Ma, Jianbiao Mei, Xin Li, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, et al. Drivearena: A closed-loop generative simulation platform for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26933–26943, 2025. 3
[70]

Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving

Zhiyu Huang, Haochen Liu, and Chen Lv. Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3903–3913, 2023. 3
[71]

Can users specify driving speed? bench2drive-speed: Benchmark and baselines for desired-speed conditioned autonomous driving

Yuqian Shao, Xiaosong Jia, Langechuan Liu, and Junchi Yan. Can users specify driving speed? bench2drive-speed: Benchmark and baselines for desired-speed conditioned autonomous driving. arXiv preprint arXiv:2603.25672, 2026
[72]

Hydra-next: Robust closed-loop driving with open-loop training

Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, and Jose M Alvarez. Hydra-next: Robust closed-loop driving with open-loop training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27305–27314, 2025. 3
[73]

Benchmarking neural network robustness to common corruptions and perturbations

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HJz6tiCqYm. 3, 6
[74]

Benchmarking robustness in object detection: Autonomous driving when winter is coming

Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484, 2019. 3
[75]

Dynamic deadlines in motion planning for autonomous driving systems

Edward Fang. Dynamic deadlines in motion planning for autonomous driving systems. UC Berkeley, 2020. 5
[76]

Lead: The llm enhanced planning system converged with end-to-end autonomous driving

Yuhang Zhang, Jiaqi Liu, Chengkai Xu, Peng Hang, and Jian Sun. Lead: The llm enhanced planning system converged with end-to-end autonomous driving. arXiv preprint arXiv:2507.05754,
[77]

Autonomous vehicle positioning with gps in urban canyon environments

Youjing Cui and Shuzhi Sam Ge. Autonomous vehicle positioning with gps in urban canyon environments. IEEE transactions on robotics and automation, 19(1):15–25, 2003. 5
[78]

Accurate visual localization for automotive applications

Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, and Trevor Darrell. Accurate visual localization for automotive applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019
[79]

Evaluating localization accuracy of automated driving systems

Karl Rehrl, Stefan Göttlich, Klaus Krainz, and Andreas Graser. Evaluating localization accuracy of automated driving systems. Sensors, 21(17):5855, 2021. doi: 10.3390/s21175855
[80]

Reliable urban vehicle localization under faulty satellite navigation signals

Shubh Gupta and Grace Gao. Reliable urban vehicle localization under faulty satellite navigation signals. EURASIP Journal on Advances in Signal Processing, 2024(1):32, 2024. doi: 10.1186/s13634-024-01150-2. 5



[28] [28]

Bevfusion: A simple and robust lidar-camera fusion framework

Tingting Liang, Hongwei Xie, Kaicheng Yu, Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Tao Tang, Bing Wang, and Zhi Tang. Bevfusion: A simple and robust lidar-camera fusion framework. Advances in neural information processing systems, 35:10421–10434, 2022

work page 2022

[29] [29]

Detr3d: 3d object detection from multi-view images via 3d-to-2d queries

Yue Wang, Vitor Campagnolo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, and Justin Solomon. Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In Conference on robot learning, pages 180–191. PMLR, 2022

work page 2022

[30] [30]

Petr: Position embedding transfor- mation for multi-view 3d object detection

Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. Petr: Position embedding transfor- mation for multi-view 3d object detection. InEuropean conference on computer vision, pages 531–548. Springer, 2022

work page 2022

[31] [31]

Exploring object- centric temporal modeling for efficient multi-view 3d object detection

Shihao Wang, Yingfei Liu, Tiancai Wang, Ying Li, and Xiangyu Zhang. Exploring object- centric temporal modeling for efficient multi-view 3d object detection. InProceedings of the IEEE/CVF international conference on computer vision, pages 3621–3631, 2023

work page 2023

[32] [32]

Occformer: Dual-path transformer for vision- based 3d semantic occupancy prediction

Yunpeng Zhang, Zheng Zhu, and Dalong Du. Occformer: Dual-path transformer for vision- based 3d semantic occupancy prediction. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 9433–9443, 2023

work page 2023

[33] [33]

Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving

Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, and Jiwen Lu. Surroundocc: Multi-camera 3d occupancy prediction for autonomous driving. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 21729–21740, 2023

work page 2023

[34] [34]

Para-drive: Par- allelized architecture for real-time autonomous driving

Xinshuo Weng, Boris Ivanovic, Yan Wang, Yue Wang, and Marco Pavone. Para-drive: Par- allelized architecture for real-time autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15449–15458, 2024. 3, 5

work page 2024

[35] [35]

Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving

Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 12037–12047, 2025. 3

work page 2025

[36] [36]

Goalflow: Goal-driven flow matching for multimodal trajectories generation in end- to-end autonomous driving

Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, and Wei Yin. Goalflow: Goal-driven flow matching for multimodal trajectories generation in end- to-end autonomous driving. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 1602–1611, 2025

work page 2025

[37] [37]

Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving.arXiv preprint arXiv:2511.18729, 2025

Lin Liu, Caiyan Jia, Guanyi Yu, Ziying Song, JunQiao Li, Feiyang Jia, Peiliang Wu, Xiaoshuai Hao, and Yadan Luo. Guideflow: Constraint-guided flow matching for planning in end-to-end autonomous driving.arXiv preprint arXiv:2511.18729, 2025

work page arXiv 2025

[38] [38]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, et al. Hydra-mdp: End-to-end multimodal planning with multi-target hydra-distillation.arXiv preprint arXiv:2406.06978, 2024. 3

work page internal anchor Pith review Pith/arXiv arXiv 2024

[39] [39]

Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving

Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, and Yadan Luo. Don’t shake the wheel: Momentum-aware planning in end-to-end autonomous driving. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22432–22441, 2025. 3

work page 2025

[40]

Wenhao Yao, Zhenxin Li, Shiyi Lan, Zi Wang, Xinglong Sun, Jose M Alvarez, and Zuxuan Wu. Drivesuprim: Towards precise trajectory selection for end-to-end planning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 11910–11918, 2026.

[41]

Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, and Junchi Yan. Raw2drive: Reinforcement learning with aligned world models for end-to-end autonomous driving (in carla v2). arXiv preprint arXiv:2505.16394, 2025.

[42]

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hongsheng Li. Lmdrive: Closed-loop end-to-end driving with large language models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15120–15130,

[43]

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Jens Beißwenger, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual question answering. In European conference on computer vision, pages 256–274. Springer, 2024.

[44]

Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, and Chaowei Xiao. Dolphins: Multimodal language model for driving. In European Conference on Computer Vision, pages 403–420. Springer, 2024.

[45]

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 9(10):8186–8193, 2024.

[46]

Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, and Yu Qiao. Dilu: A knowledge-driven approach to autonomous driving with large language models. arXiv preprint arXiv:2309.16292, 2023.

[47]

Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 24823–24834, 2025.

[48]

Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Volker Tresp, and Alois Knoll. Opendrivevla: Towards end-to-end autonomous driving with large vision language action model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 13782–13790, 2026.

[49]

Zewei Zhou, Tianhui Cai, Seth Z. Zhao, Yun Zhang, Zhiyu Huang, Bolei Zhou, and Jiaqi Ma. Autovla: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning. Advances in Neural Information Processing Systems, 2025.

[50]

Zhenjie Yang, Yilin Chai, Xiaosong Jia, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, and Junchi Yan. Drivemoe: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278, 2025.

[51]

Sicong Jiang, Zilin Huang, Kangan Qian, Ziang Luo, Tianze Zhu, Yang Zhong, Yihong Tang, Menglin Kong, Yunlong Wang, Siwen Jiao, et al. A survey on vision-language-action models for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4524–4536, 2025.

[52]

Yupeng Zheng, Pengxuan Yang, Zebin Xing, Qichao Zhang, Yuhang Zheng, Yinfeng Gao, Pengfei Li, Teng Zhang, Zhongpu Xia, Peng Jia, et al. World4drive: End-to-end autonomous driving via intention-aware physical latent world model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 28632–28642, 2025.

[53]

Yuhang Lu, Jiadong Tu, Yuexin Ma, and Xinge Zhu. Real-ad: Towards human-like reasoning in end-to-end autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27783–27793, 2025.

[54]

Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, et al. The robodrive challenge: Drive anytime anywhere in any condition. arXiv preprint arXiv:2405.08816, 2024.

[55]

Simon Gerstenecker, Andreas Geiger, and Katrin Renz. Fail2drive: Benchmarking closed-loop driving generalization. arXiv preprint arXiv:2604.08535, 2026.

[56]

Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012.

[57]

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.

[58]

Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and Ruigang Yang. The apolloscape dataset for autonomous driving. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 954–960, 2018.

[59]

Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, et al. Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8748–8757, 2019.

[60]

Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.

[61]

Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2446–2454, 2020.

[62]

Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2636–2645, 2020.

[63]

Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, et al. One million scenes for autonomous driving: Once dataset. arXiv preprint arXiv:2106.11037, 2021.

[64]

Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In 2021 IEEE international intelligent transportation systems conference (ITSC), pages 3095–3101. IEEE, 2021.

[65]

Daniel Dauner, Marcel Hallgarten, Andreas Geiger, and Kashyap Chitta. Parting with misconceptions about learning-based vehicle motion planning. In Conference on Robot Learning, pages 1268–1281. PMLR, 2023.

[66]

Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, pages 1–16. PMLR, 2017.

[67]

Holger Caesar, Juraj Kabzan, Kok Seang Tan, Whye Kit Fong, Eric Wolff, Alex Lang, Luke Fletcher, Oscar Beijbom, and Sammy Omari. nuplan: A closed-loop ml-based planning benchmark for autonomous vehicles. arXiv preprint arXiv:2106.11810, 2021.

[68]

Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. Navsim: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Advances in Neural Information Processing Systems, 37:28706–28719, 2024.

[69]

Xuemeng Yang, Licheng Wen, Tiantian Wei, Yukai Ma, Jianbiao Mei, Xin Li, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, et al. Drivearena: A closed-loop generative simulation platform for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 26933–26943, 2025.

[70]

Zhiyu Huang, Haochen Liu, and Chen Lv. Gameformer: Game-theoretic modeling and learning of transformer-based interactive prediction and planning for autonomous driving. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3903–3913, 2023.

[71]

Yuqian Shao, Xiaosong Jia, Langechuan Liu, and Junchi Yan. Can users specify driving speed? bench2drive-speed: Benchmark and baselines for desired-speed conditioned autonomous driving. arXiv preprint arXiv:2603.25672, 2026.

[72]

Zhenxin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Zuxuan Wu, and Jose M Alvarez. Hydra-next: Robust closed-loop driving with open-loop training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27305–27314, 2025.

[73]

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HJz6tiCqYm.

[74]

Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484, 2019.

[75]

Edward Fang. Dynamic deadlines in motion planning for autonomous driving systems. UC Berkeley, 2020.

[76]

Yuhang Zhang, Jiaqi Liu, Chengkai Xu, Peng Hang, and Jian Sun. Lead: The llm enhanced planning system converged with end-to-end autonomous driving. arXiv preprint arXiv:2507.05754,

[77]

Youjing Cui and Shuzhi Sam Ge. Autonomous vehicle positioning with gps in urban canyon environments. IEEE transactions on robotics and automation, 19(1):15–25, 2003.

[78]

Eli Brosh, Matan Friedmann, Ilan Kadar, Lev Yitzhak Lavy, Elad Levi, Shmuel Rippa, Yair Lempert, Bruno Fernandez-Ruiz, Roei Herzig, and Trevor Darrell. Accurate visual localization for automotive applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019.

[79]

Karl Rehrl, Stefan Göttlich, Klaus Krainz, and Andreas Graser. Evaluating localization accuracy of automated driving systems. Sensors, 21(17):5855, 2021. doi: 10.3390/s21175855.

[80]

Shubh Gupta and Grace Gao. Reliable urban vehicle localization under faulty satellite navigation signals. EURASIP Journal on Advances in Signal Processing, 2024(1):32, 2024. doi: 10.1186/s13634-024-01150-2.