pith. machine review for the scientific record.

arxiv: 2605.00531 · v1 · submitted 2026-05-01 · 💻 cs.SE

Recognition: unknown

From Research to Practice: An Interactive Rapid Review of Autonomous Driving System Testing in Industry


Pith reviewed 2026-05-09 18:49 UTC · model grok-4.3

classification 💻 cs.SE
keywords: autonomous driving systems, testing challenges, end-to-end ADS, practitioner perspectives, research-practice gap, scenario generation, rapid review, industry applicability

The pith

An interactive review with industry practitioners reveals that research on testing end-to-end autonomous driving systems often overlooks practical constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Twenty-one practitioners from a leading automotive company identified twelve key challenges in testing autonomous driving systems. They ranked approaches to and completeness of testing for end-to-end systems as the top priorities. A review of seventeen relevant research papers shows that most efforts target generating critical testing scenarios, yet these methods frequently lack the context and applicability needed in real industrial settings. The work demonstrates a clear disconnect between academic advances and what practitioners require, pointing to the necessity of more tailored solutions to make testing research useful in practice.

Core claim

Through an interactive rapid review involving 21 practitioners, the study identifies 12 challenges in ADS testing and prioritizes two related to end-to-end systems. Analysis of 17 papers reveals that while research emphasizes generating critical scenarios, these approaches often fail to account for practical constraints like regulatory requirements, legacy systems, and specific operational contexts in industry. The core finding is a persistent disconnect between research and practice, which calls for more industry-relevant research.

What carries the argument

The interactive rapid review process, which integrates practitioner input to identify challenges and to evaluate the applicability of ADS testing research.

Load-bearing premise

The assumption that the views of twenty-one practitioners from one automotive company and the selection of seventeen studies adequately represent the broader industry's testing challenges and the full research landscape.

What would settle it

A survey of practitioners from additional companies revealing different top priorities for ADS testing challenges, or a larger review finding that many of the seventeen studies are already adapted for industrial use.

Figures

Figures reproduced from arXiv: 2605.00531 by Ali Nouri, Federica Sarro, Håkan Sivencrona, Qunying Song.

Figure 1: An overview of our research method, including the five key steps along with their main elements and outputs.
Original abstract

Autonomous driving systems (ADS) are increasingly deployed in real traffic, yet testing remains fundamentally challenging due to open environments, complex scenarios, and the lack of established processes and metrics. Despite extensive research, a gap persists between academic advances and their applicability in industrial practice. To address this, we conduct an interactive rapid review in collaboration with 21 practitioners from a leading automotive company. Practitioners identified 12 key challenges in ADS testing, and prioritised two as the most critical issues, namely approaches to and completeness of testing for End-to-End (E2E) ADS. We analyzed 17 research studies relevant to these two challenges, most of which focus on generating critical testing scenarios, and subsequently assessed their relevance and applicability in practice. Our study provides the first practitioner-driven review and evaluation of current ADS testing research, reveals practical challenges in ADS testing, offers rapid insights for practitioners, and highlights the need for more context-aware, industry-relevant solutions to bridge the gap between research and practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts an interactive rapid review of autonomous driving system (ADS) testing in collaboration with 21 practitioners from one leading automotive company. Practitioners identified 12 challenges and prioritized two concerning approaches to and completeness of testing for End-to-End (E2E) ADS; the authors then analyzed 17 relevant research studies (mostly on scenario generation) and assessed their practical relevance and applicability, claiming to provide the first practitioner-driven evaluation that reveals industry challenges and highlights the need for more context-aware solutions.

Significance. If the central synthesis holds once the scope limitations are addressed, the work offers a valuable practitioner perspective on the research-practice gap in safety-critical ADS testing, a contribution well suited to software engineering venues focused on empirical methods and industry collaboration. The interactive approach with practitioners is a strength, but the narrow sample and opaque study-selection process limit its broader utility as a generalizable review.

major comments (2)
  1. [Methods (practitioner collaboration and literature analysis)] The methods description (practitioner collaboration and literature analysis sections) provides no details on the search strategy, databases, inclusion/exclusion criteria, or screening process used to identify and select the 17 relevant studies. This directly weakens the validity of the relevance/applicability assessments and the synthesis of findings on E2E ADS testing challenges.
  2. [Practitioner input and results sections] The practitioner sample is restricted to 21 individuals from a single automotive company. This assumption of representativeness underpins the identification of the 12 challenges, the prioritization of E2E ADS testing issues, and the claims of revealing 'practical challenges in ADS testing' and offering 'rapid insights for practitioners'; automotive firms vary substantially in architectures, standards, and testing contexts, so the gap-bridging conclusions rest on untested external validity.
minor comments (2)
  1. [Abstract] The abstract's claim of being 'the first practitioner-driven review' would benefit from a short qualification or reference to prior ADS testing reviews to avoid overstatement.
  2. [Throughout the manuscript] Ensure all acronyms (ADS, E2E) are defined on first use and that the applicability assessment criteria are explicitly listed for reader evaluation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to enhance transparency and appropriately scope our claims.

Point-by-point responses
  1. Referee: The methods description (practitioner collaboration and literature analysis sections) provides no details on the search strategy, databases, inclusion/exclusion criteria, or screening process used to identify and select the 17 relevant studies. This directly weakens the validity of the relevance/applicability assessments and the synthesis of findings on E2E ADS testing challenges.

    Authors: We agree that the current manuscript lacks sufficient detail on the literature selection process. As this is a rapid review driven by the two practitioner-prioritized challenges rather than a comprehensive systematic review, the 17 studies were identified through targeted searches for studies relevant to E2E ADS testing approaches and testing completeness. We will add a new subsection to the Methods section explicitly describing the search strategy (including keywords such as 'end-to-end autonomous driving testing' and 'scenario generation for ADS'), databases (IEEE Xplore, ACM Digital Library, Google Scholar), inclusion/exclusion criteria (e.g., peer-reviewed studies from 2018 onward focusing on E2E systems, excluding simulation-only works without testing implications), and the two-stage screening process. This will allow readers to assess the validity of our relevance and applicability evaluations. revision: yes

  2. Referee: The practitioner sample is restricted to 21 individuals from a single automotive company. This assumption of representativeness underpins the identification of the 12 challenges, the prioritization of E2E ADS testing issues, and the claims of revealing 'practical challenges in ADS testing' and offering 'rapid insights for practitioners'; automotive firms vary substantially in architectures, standards, and testing contexts, so the gap-bridging conclusions rest on untested external validity.

    Authors: We accept that the single-company sample limits generalizability and do not assert that the identified challenges or their prioritization apply universally across the automotive sector. The study is framed as an in-depth interactive rapid review with one leading company, which provides unique access to industrial perspectives often unavailable in public literature. We will revise the manuscript by adding an explicit Limitations section (or expanding Threats to Validity) that discusses the single-company scope, rephrases broader claims (e.g., changing 'reveals practical challenges in ADS testing' to 'reveals practical challenges in ADS testing within the context of the collaborating company'), and positions the work as a foundation for future multi-company studies rather than a definitive industry-wide synthesis. revision: partial

Circularity Check

0 steps flagged

No circularity: qualitative synthesis of practitioner input and literature

Full rationale

The paper performs an interactive rapid review: practitioners from one firm identify 12 challenges and prioritize two, after which the authors select and assess 17 studies for relevance. No equations, parameters, predictions, or derivations exist. No self-citations are invoked as load-bearing premises, and the central claims (revealing challenges, assessing applicability) are direct outputs of the described process rather than reductions to prior self-referential results. The single-company sample raises external-validity concerns but does not create circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper without mathematical models or new postulates. It draws on standard rapid-review methodology and direct practitioner input rather than introducing free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5481 in / 1082 out tokens · 37681 ms · 2026-05-09T18:49:46.325911+00:00 · methodology

