Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance

Alireza Abbaspour; B Ravi Kiran; Russel Mohr; Senthil Yogamani; Tejaskumar Balgonda Patil

arxiv: 2511.08439 · v2 · submitted 2025-11-11 · 💻 cs.AI

Pith reviewed 2026-05-17 23:29 UTC · model grok-4.3

classification 💻 cs.AI

keywords: dataset safety · autonomous driving · AI perception · ISO/PAS 8800 · data lifecycle · safety analysis · verification and validation · hazard identification

The pith

The paper proposes a framework for developing safe autonomous-driving datasets by aligning with ISO/PAS 8800 and managing the full data lifecycle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a structured approach to building datasets that support safe AI systems in autonomous vehicles. It centers on perception systems and introduces the AI Data Flywheel as a way to handle the complete dataset process from initial collection through annotation, curation, and ongoing maintenance. A reader would care because flawed data directly creates hazards that can compromise vehicle safety and reliability. The work adds safety analyses to find risks from insufficient data, defines how to set dataset safety requirements, and outlines verification steps to meet standards. It also surveys recent research to highlight current issues and possible next steps in the field.

Core claim

The paper claims that a structured framework aligned with ISO/PAS 8800 guidelines can produce safe datasets for AI-based perception in autonomous driving. It does so by introducing the AI Data Flywheel and a dataset lifecycle that covers collection, annotation, curation, and maintenance; by adding safety analyses that identify hazards arising from dataset insufficiencies; by defining processes for establishing dataset safety requirements; and by proposing verification and validation strategies to check compliance.

What carries the argument

The AI Data Flywheel, a cyclic process that drives continuous data collection, annotation, curation, and maintenance to reduce risks from dataset problems.
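The cyclic structure described above can be illustrated with a minimal sketch. Only the four stage names come from the paper; the `FlywheelState` record, the issue labels, and the `plan` input are hypothetical and exist purely to show how repeated passes through the stages could shrink a list of open dataset issues. It is not the authors' implementation.

```python
from dataclasses import dataclass, field

# Stage names follow the lifecycle named in the paper; everything else is hypothetical.
STAGES = ("collection", "annotation", "curation", "maintenance")


@dataclass
class FlywheelState:
    """Hypothetical record of open dataset issues and per-stage artifacts."""
    open_issues: set = field(default_factory=set)
    artifacts: list = field(default_factory=list)
    iteration: int = 0


def run_iteration(state, fixes_by_stage):
    """Run one pass through all four stages.

    fixes_by_stage maps a stage name to the set of issues that stage
    resolves in this pass (illustrative input, not from the paper).
    """
    state.iteration += 1
    for stage in STAGES:
        resolved = state.open_issues & fixes_by_stage.get(stage, set())
        state.open_issues -= resolved
        # Log what each stage did so later audits can trace the change.
        state.artifacts.append((state.iteration, stage, sorted(resolved)))
    return state


def run_flywheel(state, plan, max_iterations=10):
    """Repeat passes until no issues remain or the plan or budget runs out."""
    for fixes_by_stage in plan[:max_iterations]:
        if not state.open_issues:
            break
        run_iteration(state, fixes_by_stage)
    return state


state = FlywheelState(open_issues={"missing_night_scenes", "mislabeled_cyclists"})
plan = [
    {"collection": {"missing_night_scenes"}},
    {"annotation": {"mislabeled_cyclists"}},
    {"curation": set()},
]
run_flywheel(state, plan)
print(state.iteration, sorted(state.open_issues))
```

In this toy run the loop stops after two passes because no open issues remain; the third planned pass is never executed. A real pipeline would derive the issue list from safety analyses rather than from a hand-written plan.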

If this is right

Hazards from dataset insufficiencies can be spotted early through the included safety analyses.
Clear dataset safety requirements can be set to match ISO/PAS 8800 guidelines.
Verification and validation steps can confirm that datasets meet safety standards before use.
Insights from reviewed research can point to practical challenges in maintaining safe data over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be adapted to improve data quality in other AI applications that rely on perception, such as robotics or medical imaging.
Widespread adoption might encourage shared industry standards for dataset audits in autonomous systems.
Testing the lifecycle stages on real driving data collections would show whether the proposed steps actually prevent specific failure modes.

Load-bearing premise

The safety analyses, dataset requirements, and verification strategies will successfully reduce risks from poor data and achieve standard compliance even though no empirical tests or case studies are shown.

What would settle it

An autonomous vehicle system built with datasets following the framework that still suffers a safety incident caused by dataset insufficiencies such as missing edge cases or annotation errors.

Figures

Figures reproduced from arXiv: 2511.08439 by Alireza Abbaspour, B Ravi Kiran, Russel Mohr, Senthil Yogamani, Tejaskumar Balgonda Patil.

**Figure 2.** Data flywheel from collection, Data quality and diversification, model training, automated labeling model based

**Figure 3.** Automated data or file selection pipeline with various configurations to retrieve files that satisfy requirements, metadata

**Figure 4.** Automated annotation quality check model is a semantic segmentation pipeline based on SAM and OpenClip.

**Figure 5.** Dataset Lifecycle recommended by ISO/PAS 8800 [

Original abstract

Dataset integrity is fundamental to the safety and reliability of AI systems, especially in autonomous driving. This paper presents a structured framework for developing safe datasets aligned with ISO/PAS 8800 guidelines. Using AI-based perception systems as the primary use case, it introduces the AI Data Flywheel and the dataset lifecycle, covering data collection, annotation, curation, and maintenance. The framework incorporates rigorous safety analyses to identify hazards and mitigate risks caused by dataset insufficiencies. It also defines processes for establishing dataset safety requirements and proposes verification and validation strategies to ensure compliance with safety standards. In addition to outlining best practices, the paper reviews recent research and emerging trends in dataset safety and autonomous vehicle development, providing insights into current challenges and future directions. By integrating these perspectives, the paper aims to advance robust, safety-assured AI systems for autonomous driving applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper proposes an AI Data Flywheel and dataset lifecycle framework for autonomous driving safety aligned with ISO/PAS 8800, but it offers only process descriptions without data or validation.

Hi, the main thing to know is that this paper proposes a framework for safe datasets in autonomous driving perception systems. It introduces the AI Data Flywheel concept and maps a lifecycle from collection through annotation, curation, and maintenance, all tied to ISO/PAS 8800 guidelines with hazard analyses and verification steps. They also review recent research and trends in the area. That part gives a clear overview of current challenges and could work as a reference for teams needing to align with safety standards. The soft spot is the complete lack of evidence. The claims about risk mitigation and assurance come from process outlines alone, with no case studies, experiments, or results to show whether any of it actually works in practice. This leaves the central argument at the level of a suggestion rather than a demonstrated approach. The paper is aimed at engineers and safety people working on autonomous vehicle datasets who want practical guidance on compliance. A reader focused on industry standards might find some usable structure here. It deserves a serious referee because the topic is important for real systems and review could push for added examples or checks to make the framework more solid. I would send it to peer review.

Referee Report

2 major / 3 minor

Summary. The paper presents a structured framework for developing safe datasets in autonomous driving, focused on AI-based perception systems and aligned with ISO/PAS 8800. It introduces the AI Data Flywheel concept and outlines a dataset lifecycle covering collection, annotation, curation, and maintenance. The framework includes safety analyses to identify hazards from dataset insufficiencies, processes for defining dataset safety requirements, and verification/validation strategies, accompanied by a review of recent research and trends in dataset safety.

Significance. If the framework holds as a coherent and practical proposal, it offers a useful synthesis of best practices for dataset safety assurance in safety-critical AI applications. By explicitly linking data lifecycle stages to ISO/PAS 8800 compliance and hazard mitigation, the work could help standardize approaches in autonomous vehicle development where dataset quality directly affects perception reliability. The literature review component adds value by contextualizing emerging challenges.

major comments (2)

[Safety Analyses and Framework Overview] Abstract and Section on Safety Analyses: the assertion that the framework 'incorporates rigorous safety analyses to identify hazards and mitigate risks' is load-bearing for the assurance claims in the title, yet the provided descriptions remain at the level of process outlines without specifying concrete hazard identification techniques, risk metrics, or failure mode examples tied to perception datasets.
[Verification and Validation Strategies] Section on Verification and Validation Strategies: the proposed V&V strategies reference external ISO/PAS 8800 guidelines as grounding but do not demonstrate how the AI Data Flywheel outputs feed into specific compliance checks or traceability requirements, leaving the central claim of ensured compliance without an internal mechanism for evaluation.

minor comments (3)

[AI Data Flywheel] The AI Data Flywheel is introduced as a novel construct but its operational definition (e.g., feedback loops between stages) could be clarified with a diagram or pseudocode to distinguish it from standard iterative data pipelines.
[Related Work and Trends] Literature review section would benefit from explicit mapping of cited works to specific framework components (e.g., which papers address annotation risks) to strengthen traceability.
[Dataset Safety Requirements] Notation for dataset safety requirements could be made more consistent; terms like 'insufficiencies' and 'hazards' are used interchangeably in places without a glossary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. The comments identify opportunities to strengthen the concreteness of the safety analyses and the traceability of compliance claims. We address each point below and will incorporate targeted revisions to improve clarity while preserving the framework's conceptual scope.

Referee: [Safety Analyses and Framework Overview] Abstract and Section on Safety Analyses: the assertion that the framework 'incorporates rigorous safety analyses to identify hazards and mitigate risks' is load-bearing for the assurance claims in the title, yet the provided descriptions remain at the level of process outlines without specifying concrete hazard identification techniques, risk metrics, or failure mode examples tied to perception datasets.

Authors: We agree that the current descriptions emphasize process structure over concrete instantiation. The manuscript positions the safety analyses as an integrated component of the dataset lifecycle aligned with ISO/PAS 8800, drawing on the literature review for context. To address the concern, we will expand the relevant section with explicit examples: hazard identification via adapted FMEA for dataset issues, illustrative risk metrics such as coverage ratios for edge cases, and perception-specific failure modes (e.g., annotation errors in low-light conditions or distributional shifts in sensor data). These additions will reference the reviewed research trends without altering the high-level framework.

Revision: yes

Referee: [Verification and Validation Strategies] Section on Verification and Validation Strategies: the proposed V&V strategies reference external ISO/PAS 8800 guidelines as grounding but do not demonstrate how the AI Data Flywheel outputs feed into specific compliance checks or traceability requirements, leaving the central claim of ensured compliance without an internal mechanism for evaluation.

Authors: We acknowledge the value of making the linkage between Flywheel outputs and compliance mechanisms more explicit. The manuscript already describes the Flywheel as generating stage-specific artifacts (collection logs, annotation quality metrics, curation reports) intended to support traceability. In revision we will add a dedicated workflow diagram and accompanying text that maps these outputs to example ISO/PAS 8800 checks, including traceability requirements and verification criteria. This will illustrate the internal evaluation path while continuing to reference the standard for detailed normative requirements.

Revision: yes
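The responses mention coverage ratios for edge cases and a mapping from dataset artifacts to traceable checks. A minimal sketch of how those two ideas could fit together is given here. The requirement IDs (`DSR-001`, `DSR-002`), the scenario tags, and the thresholds are all hypothetical; none of them are taken from the paper or from ISO/PAS 8800.

```python
from collections import Counter

# Hypothetical requirement table: ID -> (scenario tag, minimum share of samples).
REQUIREMENTS = {
    "DSR-001": ("night", 0.15),
    "DSR-002": ("rain", 0.25),
}


def coverage_ratios(sample_tags):
    """Return the fraction of samples that carry each scenario tag."""
    total = len(sample_tags)
    if total == 0:
        return {}
    # set(tags) ensures a tag repeated within one sample is counted once.
    counts = Counter(tag for tags in sample_tags for tag in set(tags))
    return {tag: n / total for tag, n in counts.items()}


def check_requirements(sample_tags, requirements):
    """Produce one traceable record per requirement: ID, measured ratio, verdict."""
    ratios = coverage_ratios(sample_tags)
    report = []
    for req_id, (tag, minimum) in sorted(requirements.items()):
        measured = ratios.get(tag, 0.0)
        report.append({
            "requirement": req_id,
            "tag": tag,
            "measured": measured,
            "minimum": minimum,
            "passed": measured >= minimum,
        })
    return report


# Ten illustrative samples: two tagged night, one of which is also tagged rain.
samples = [["day"]] * 8 + [["night"], ["night", "rain"]]
report = check_requirements(samples, REQUIREMENTS)
for row in report:
    print(row)
```

Each report row keeps the requirement ID next to the measured value, which is one simple way to make a verification result traceable back to the requirement it checks. In this toy data the night requirement passes and the rain requirement fails.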

Circularity Check

0 steps flagged

No significant circularity in framework proposal

The paper presents a structured framework for dataset safety in autonomous driving aligned with external ISO/PAS 8800 guidelines. It introduces the AI Data Flywheel and dataset lifecycle (covering collection, annotation, curation, maintenance) along with safety analyses, requirements processes, and V&V strategies. These elements are proposed as best practices grounded in external standards and a literature review of recent research, with no mathematical derivations, equations, fitted parameters, or predictions that reduce by construction to the paper's own inputs. No self-citation chains, uniqueness theorems, or ansatzes from prior author work serve as load-bearing justifications. The contribution is self-contained as a process-oriented proposal against external benchmarks rather than an internally derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that ISO/PAS 8800 guidelines are appropriate for dataset safety in AI perception and that structured processes can identify and mitigate dataset hazards without additional empirical proof.

axioms (1)

domain assumption: ISO/PAS 8800 provides suitable guidelines for safety in AI systems for autonomous driving.
The entire framework is presented as aligned with these guidelines.

invented entities (1)

AI Data Flywheel (no independent evidence)
purpose: to represent the continuous cycle of data collection, annotation, curation, and maintenance for safety assurance.
New concept introduced to organize the dataset lifecycle in the framework.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

presents a structured framework for developing safe datasets aligned with ISO/PAS 8800 guidelines... AI Data Flywheel and the dataset lifecycle

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 1 internal anchor

[1]

The safety risks of ai- driven solutions in autonomous road vehicles,

F. Mirzarazi, S. Danishvar, and A. Mousavi, “The safety risks of ai- driven solutions in autonomous road vehicles,”World Electric Vehicle Journal, vol. 15, no. 10, p. 438, 2024

work page 2024
[2]

Neurall: Towards a unified visual perception model for automated driving,

G. Sistu, I. Leang, S. Chennupati, S. Yogamani, C. Hughes, S. Milz, and S. Rawashdeh, “Neurall: Towards a unified visual perception model for automated driving,” in2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019, pp. 796–803

work page 2019
[3]

Near-field depth estimation using monocular fisheye camera: A semi-supervised learning approach using sparse lidar data,

V . R. Kumar, S. Milz, C. Witt, M. Simon, K. Amende, J. Petzold, S. Yogamani, and T. Pech, “Near-field depth estimation using monocular fisheye camera: A semi-supervised learning approach using sparse lidar data,” inCVPR Workshop, vol. 7, 2018, p. 2

work page 2018
[4]

Overview and empirical analysis of ISP parameter tuning for visual perception in autonomous driving,

L. Yahiaoui, J. Horgan, B. Deeganet al., “Overview and empirical analysis of ISP parameter tuning for visual perception in autonomous driving,”Journal of Imaging, vol. 5, no. 10, p. 78, 2019

work page 2019
[5]

AuxNet: Auxiliary Tasks Enhanced Semantic Segmentation for Automated Driving,

S. Chennupati, G. Sistu, S. Yogamaniet al., “AuxNet: Auxiliary Tasks Enhanced Semantic Segmentation for Automated Driving,” inProceed- ings of the International Conference on Computer Vision Theory and Applications, 2019, pp. 645–652

work page 2019
[6]

Collaborative perception datasets for autonomous driving: A review,

N. Wang, D. Shang, Y . Gong, X. Hu, Z. Song, L. Yang, Y . Huang, X. Wang, and J. Lu, “Collaborative perception datasets for autonomous driving: A review,”arXiv preprint arXiv:2504.12696, 2025

work page arXiv 2025
[7]

A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook,

M. Liu, E. Yurtsever, J. Fossaert, X. Zhou, W. Zimmer, Y . Cui, B. L. Zagar, and A. C. Knoll, “A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook,”IEEE Transactions on Intelligent Vehicles, 2024

work page 2024
[8]

Computer vision for autonomous vehicles: Problems, datasets and state of the art,

J. Janai, F. G ¨uney, A. Behl, A. Geigeret al., “Computer vision for autonomous vehicles: Problems, datasets and state of the art,”Founda- tions and trends® in computer graphics and vision, vol. 12, no. 1–3, pp. 1–308, 2020

work page 2020
[9]

Are we hungry for 3d lidar data for semantic segmentation? a survey of datasets and methods,

B. Gao, Y . Pan, C. Li, S. Geng, and H. Zhao, “Are we hungry for 3d lidar data for semantic segmentation? a survey of datasets and methods,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 6063–6081, 2021

work page 2021
[10]

Joseph and A

L. Joseph and A. K. Mondal,Autonomous driving and advanced driver- assistance systems (ADAS): applications, development, legal issues, and testing. CRC Press, 2021

work page 2021
[11]

X-align: Cross-modal cross-view alignment for bird’s-eye-view segmentation,

S. Borse, M. Klingner, V . R. Kumar, H. Cai, A. Almuzairee, S. Yoga- mani, and F. Porikli, “X-align: Cross-modal cross-view alignment for bird’s-eye-view segmentation,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 3287–3297

work page 2023
[12]

X3kd: Knowledge distillation across modalities, tasks and stages for multi-camera 3d object detection,

M. Klingner, S. Borse, V . R. Kumar, B. Rezaei, V . Narayanan, S. Yoga- mani, and F. Porikli, “X3kd: Knowledge distillation across modalities, tasks and stages for multi-camera 3d object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 343–13 353

work page 2023
[13]

FisheyeYOLO: Object Detec- tion on Fisheye Cameras for Autonomous Driving,

H. Rashed, E. Mohamed, G. Sistuet al., “FisheyeYOLO: Object Detec- tion on Fisheye Cameras for Autonomous Driving,”Machine Learning for Autonomous Driving NeurIPSW, 2020

work page 2020
[14]

Challenges in de- signing datasets and validation for autonomous driving,

M. U ˇriˇc´aˇr., D. Hurych., P. Kˇr´ıˇzek, and S. Yogamani., “Challenges in de- signing datasets and validation for autonomous driving,” inProceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5 VISAPP: VISAPP ,. SciTePress, 2019, pp. 653–659

work page 2019
[15]

Artificial intelligence in automated driving: an analysis of safety and cybersecurity challenges,

R. HAMON, H. JUNKLEWITZ, M. J. I. SANCHEZ, L. D. FERNAN- DEZ, G. E. GOMEZ, A. A. HERRERA, A. KRISTONet al., “Artificial intelligence in automated driving: an analysis of safety and cybersecurity challenges,” 2022

work page 2022
[16]

A survey on autonomous driving datasets,

W. Liu, Q. Dong, P. Wang, G. Yang, L. Meng, Y . Song, Y . Shi, and Y . Xue, “A survey on autonomous driving datasets,” in2021 8th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2021, pp. 399–407

work page 2021
[17]

Open-sourced data ecosystem in autonomous driving: the present and future,

H. Li, Y . Li, H. Wang, J. Zeng, H. Xu, P. Cai, L. Chen, J. Yan, F. Xu, L. Xionget al., “Open-sourced data ecosystem in autonomous driving: the present and future,”arXiv preprint arXiv:2312.03408, 2023

work page arXiv 2023
[18]

Synthetic datasets for autonomous driving: A survey,

Z. Song, Z. He, X. Li, Q. Ma, R. Ming, Z. Mao, H. Pei, L. Peng, J. Hu, D. Yaoet al., “Synthetic datasets for autonomous driving: A survey,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 1847–1864, 2023

work page 2023
[19]

Perception datasets for anomaly detection in autonomous driving: A survey,

D. Bogdoll, S. Uhlemeyer, K. Kowol, and J. M. Z ¨ollner, “Perception datasets for anomaly detection in autonomous driving: A survey,” in 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023, pp. 1–8

work page 2023
[20]

A survey on datasets for the decision making of autonomous vehicles,

Y . Wang, Z. Han, Y . Xing, S. Xu, and J. Wang, “A survey on datasets for the decision making of autonomous vehicles,”IEEE Intelligent Transportation Systems Magazine, vol. 16, no. 2, pp. 23–40, 2024

work page 2024
[21]

Aide: An automatic data engine for object detection in autonomous driving,

M. Liang, J.-C. Su, S. Schulter, S. Garg, S. Zhao, Y . Wu, and M. Chandraker, “Aide: An automatic data engine for object detection in autonomous driving,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14 695–14 706

work page 2024
[22]

Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies,

L. Li, W. Shao, W. Dong, Y . Tian, Q. Zhang, K. Yang, and W. Zhang, “Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies,”arXiv preprint arXiv:2401.12888, 2024. 14 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

work page arXiv 2024
[23]

Tesla ai day 2022,

Tesla, “Tesla ai day 2022,” 2022. [Online]. Available: https: //www.youtube.com/watch?v=ODSJsviD SU

work page 2022
[24]

Upgrading your fleet into an av data engine - scale ai,

S. AI, “Upgrading your fleet into an av data engine - scale ai,” 2023. [Online]. Available: https://www.youtube.com/watch?v=lbOoXI1EeEs

work page 2023
[25]

The aurora data engine—advancing the aurora driver through valuable data that drives machine learning,

Aurora, “The aurora data engine—advancing the aurora driver through valuable data that drives machine learning,” 2021. [Online]. Available: https://www.youtube.com/watch?v=Xe8YtdkMkS8

work page 2021
[26]

Momenta at cvpr 2023: How data-driven flywheel enables scalable path to full autonomy,

Momenta, “Momenta at cvpr 2023: How data-driven flywheel enables scalable path to full autonomy,” 2023. [Online]. Available: https://www.youtube.com/watch?v=tNpEeIyuiJs

work page 2023
[27]

Maptr: Structured modeling and learning for online vectorized hd map construction

B. Liao, S. Chen, X. Wang, T. Cheng, Q. Zhang, W. Liu, and C. Huang, “Maptr: Structured modeling and learning for online vectorized hd map construction,”arXiv preprint arXiv:2208.14437, 2022

work page arXiv 2022
[28]

Planning-oriented autonomous driving,

Y . Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wanget al., “Planning-oriented autonomous driving,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 17 853–17 862

work page 2023
[29]

Vad: Vectorized scene representation for efficient autonomous driving,

B. Jiang, S. Chen, Q. Xu, B. Liao, J. Chen, H. Zhou, Q. Zhang, W. Liu, C. Huang, and X. Wang, “Vad: Vectorized scene representation for efficient autonomous driving,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8340–8350

work page 2023
[30]

Para- drive: Parallelized architecture for real-time autonomous driving,

X. Weng, B. Ivanovic, Y . Wang, Y . Wang, and M. Pavone, “Para- drive: Parallelized architecture for real-time autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 449–15 458

work page 2024
[31]

Lingo-2: Driving with natural language,

W. R. Teamet al., “Lingo-2: Driving with natural language,” 2024

work page 2024
[32]

Navigation-guided sparse scene representation for end-to-end autonomous driving,

P. Li and D. Cui, “Navigation-guided sparse scene representation for end-to-end autonomous driving,” inThe Thirteenth International Con- ference on Learning Representations

work page
[33]

Li, Adrien Bardes, Suzanne Petryk, Oscar Ma ˜nas, et al

F. Bordes, R. Y . Pang, A. Ajay, A. C. Li, A. Bardes, S. Petryk, O. Ma˜nas, Z. Lin, A. Mahmoud, B. Jayaramanet al., “An introduction to vision- language modeling,”arXiv preprint arXiv:2405.17247, 2024

work page arXiv 2024
[34]

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

X. Tian, J. Gu, B. Li, Y . Liu, Y . Wang, Z. Zhao, K. Zhan, P. Jia, X. Lang, and H. Zhao, “Drivevlm: The convergence of autonomous driving and large vision-language models,”arXiv preprint arXiv:2402.12289, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

Covla: Comprehensive vision-language-action dataset for autonomous driving,

H. Arai, K. Miwa, K. Sasaki, K. Watanabe, Y . Yamaguchi, S. Aoki, and I. Yamamoto, “Covla: Comprehensive vision-language-action dataset for autonomous driving,” in2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 1933–1943

work page 2025
[36]

A survey on sensor selection and placement for connected and automated mobility,

M. Kiraz, F. Sivrikaya, and S. Albayrak, “A survey on sensor selection and placement for connected and automated mobility,”IEEE Open Journal of Intelligent Transportation Systems, 2024

work page 2024
[37]

A concept for requirements-driven identification and mitigation of dataset gaps for perception tasks in automated driving

M. S. Moustafa, M. Bieshaar, A. Albrecht, and B. Sick, “A concept for requirements-driven identification and mitigation of dataset gaps for perception tasks in automated driving.”

work page
[38]

Semantic-aware video compres- sion for automotive cameras,

Y . Wang, P. H. Chan, and V . Donzella, “Semantic-aware video compres- sion for automotive cameras,”IEEE Transactions on Intelligent Vehicles, vol. 8, no. 6, pp. 3712–3722, 2023

work page 2023
[39]

A survey on data compression techniques for automotive lidar point clouds,

R. Roriz, H. Silva, F. Dias, and T. Gomes, “A survey on data compression techniques for automotive lidar point clouds,”Sensors, vol. 24, no. 10, p. 3185, 2024

work page 2024
[40]

Navya3dseg- navya 3d semantic segmentation dataset design & split generation for autonomous vehicles,

A. Almin, L. Lemari ´e, A. Duong, and B. R. Kiran, “Navya3dseg- navya 3d semantic segmentation dataset design & split generation for autonomous vehicles,”IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5584–5591, 2023

work page 2023
[41]

Leakage in data mining: Formulation, detection, and avoidance,

S. Kaufman, S. Rosset, C. Perlich, and O. Stitelman, “Leakage in data mining: Formulation, detection, and avoidance,”ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 6, no. 4, pp. 1–21, 2012

work page 2012
[42]

D-lede: A data leakage detection method for automotive perception systems

M. A. A. Babu, S. K. Pandey, D. Durisic, A. C. Koppisetty, and M. Staron, “D-lede: A data leakage detection method for automotive perception systems.”

work page
[43]

Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it,

A. Lilja, J. Fu, E. Stenborg, and L. Hammarstrand, “Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 22 150–22 159

work page 2024
[44]

Zopp: A framework of zero-shot offboard panoptic perception for autonomous driving,

T. Ma, H. Zhou, Q. Huang, X. Yang, J. Guo, B. Zhang, M. Dou, Y . Qiao, B. Shi, and H. Li, “Zopp: A framework of zero-shot offboard panoptic perception for autonomous driving,”Advances in Neural Information Processing Systems, vol. 37, pp. 140 266–140 291, 2024

work page 2024
[45]

Run-time introspection of 2d object detection in automated driving systems using learning representations,

H. Y . Yatbaz, M. Dianati, K. Koufos, and R. Woodman, “Run-time introspection of 2d object detection in automated driving systems using learning representations,”IEEE Transactions on Intelligent Vehicles, vol. 9, no. 6, pp. 5033–5046, 2024

work page 2024
[46]

Objectlab: Automated diagnosis of mislabeled images in object detection data,

U. Tkachenko, A. Thyagarajan, and J. Mueller, “Objectlab: Automated diagnosis of mislabeled images in object detection data,”arXiv preprint arXiv:2309.00832, 2023

work page arXiv 2023
[47]

Delving into localization errors for monocular 3d object detection,

X. Ma, Y . Zhang, D. Xu, D. Zhou, S. Yi, H. Li, and W. Ouyang, “Delving into localization errors for monocular 3d object detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4721–4730

work page 2021
[48]

Key safety design overview in ai-driven autonomous vehicles,

V . Vyas and Z. Xu, “Key safety design overview in ai-driven autonomous vehicles,”arXiv preprint arXiv:2412.08862, 2024

work page arXiv 2024
[49]

A comprehensive review on traffic datasets and simulators for autonomous vehicles,

S. Sarker, B. Maples, I. Islam, M. Fan, C. Papadopoulos, and W. Li, “A comprehensive review on traffic datasets and simulators for autonomous vehicles,”arXiv preprint arXiv:2412.14207, 2024

work page arXiv 2024
[50]

A systematic digital engineering approach to verification & validation of autonomous ground vehicles in off-road environments,

T. Vilas Samak, C. Vilas Samak, J. Brault, C. Harber, K. McCane, J. Smereka, M. Brudnak, D. Gorsich, and V . Krovi, “A systematic digital engineering approach to verification & validation of autonomous ground vehicles in off-road environments,”arXiv e-prints, pp. arXiv– 2503, 2025

work page 2025
[51]

Road vehicles – safety and artificial intelligence,

I. O. for Standardization, “Road vehicles – safety and artificial intelligence,” ISO/PAS Standard No. 8800:2024, 2024. [Online]. Available: https://www.iso.org/standard/83303.html

work page 2024
[52]

Are we ready for autonomous driving? the kitti vision benchmark suite,

A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361

work page 2012
[53]

A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “Carla: An open urban driving simulator,” in Conference on robot learning. PMLR, 2017, pp. 1–16
[54]

G. Rong, B. H. Shin, H. Tabatabaee, Q. Lu, S. Lemke, M. Možeiko, E. Boise, G. Uhm, M. Gerow, S. Mehta et al., “Lgsvl simulator: A high fidelity simulator for autonomous driving,” in 2020 IEEE 23rd International conference on intelligent transportation systems (ITSC). IEEE, 2020, pp. 1–6
[55]

C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of big data, vol. 6, no. 1, pp. 1–48, 2019
[56]

F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “Bdd100k: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2636–2645
[57]

H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631
[58]

L. Van der Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. 11, 2008
[59]

M. Hintze and K. El Emam, “Comparing the benefits of pseudonymisation and anonymisation under the gdpr,” Journal of Data Protection & Privacy, vol. 2, no. 2, pp. 145–158, 2018
[60]

J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in classification,” Pattern recognition, vol. 45, no. 1, pp. 521–530, 2012
[61]

I. Goodfellow, “Deep learning,” 2016
[62]

P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454
[63]

X. Hu, Z. Zheng, D. Chen, X. Zhang, and J. Sun, “Processing, assessing, and enhancing the waymo autonomous vehicle open dataset for driving behavior research,” Transportation Research Part C: Emerging Technologies, vol. 134, p. 103490, 2022
[64]

W. H. Lee, K. Jung, C. Kang, and H. S. Chang, “Semi-automatic framework for traffic landmark annotation,” IEEE Open Journal of Intelligent Transportation Systems, vol. 2, pp. 1–12, 2021
[65]

R. Underwood, J. C. Calhoun, S. Di, and F. Cappello, “Understanding the effectiveness of lossy compression in machine learning training sets,” arXiv preprint arXiv:2403.15953, 2024
[66]

S. Di and F. Cappello, “Fast error-bounded lossy hpc data compression with sz,” in 2016 ieee international parallel and distributed processing symposium (ipdps). IEEE, 2016, pp. 730–739
[67]

Y. Jiang, T. Zhang, Y. Li, G. Chen, and F. Chen, “Iterative compression towards in-distribution features in domain generalization,” Neurocomputing, vol. 638, p. 130011, 2025
[68]

Y. Zhou, J. Sollmann, and J. Chen, “Deep-learning-based image compression for microscopy images: An empirical study,” Biological Imaging, vol. 4, p. e16, 2024
[69]

H. Lawley, “Operability studies and hazard analysis,” Chem. Eng. Prog., vol. 70, no. 4, pp. 45–56, 1974
[70]

Y. Qi, P. R. Conmy, W. Huang, X. Zhao, and X. Huang, “A hierarchical hazop-like safety analysis for learning-enabled systems,” arXiv preprint arXiv:2206.10216, 2022
[71]

T. Aoki, D. Kawakami, N. Chida, and T. Tomita, “Dataset fault tree analysis for systematic evaluation of machine learning systems,” in 2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 2020, pp. 100–109
[72]

P. Schmitt, H. B. Seifert, M. Bijelic, K. Pennar, J. Lopez, and F. Heide, “Introducing the ml fmea,” SAE Technical Paper, Tech. Rep., 2025
[73]

Y. Qi, Y. Dong, S. Khastgir, P. Jennings, X. Zhao, and X. Huang, “Stpa for learning-enabled systems: a survey and a new practice,” in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2023, pp. 1381–1388
[74]

Y. Han, H. Zhang, H. Li, Y. Jin, C. Lang, and Y. Li, “Collaborative perception in autonomous driving: Methods, datasets, and challenges,” IEEE Intelligent Transportation Systems Magazine, vol. 15, no. 6, pp. 131–151, 2023
