Data Leakage in Automotive Perception: Practitioners' Insights
Pith reviewed 2026-05-10 17:08 UTC · model grok-4.3
The pith
Data leakage in automotive perception systems is managed as a socio-technical coordination issue spread across engineering roles and workflows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reflexive thematic analysis of the interviews shows that practitioners' understanding of data leakage fragments along role lines, detection arises reactively from anomalies, and mitigation relies on experiential and collaborative practices. The resulting insight is that leakage control constitutes a socio-technical coordination problem distributed across roles and workflows in automotive perception development.
What carries the argument
Reflexive thematic analysis of semi-structured interviews with ten automotive perception engineers, used to map role-specific conceptualizations and mitigation approaches to data leakage.
Load-bearing premise
The ten interviewed engineers represent typical industry views on data leakage and the thematic analysis captures their perceptions without major distortion from selection or researcher interpretation.
What would settle it
A larger-scale survey or observational study across multiple automotive companies showing uniform technical understanding of data leakage and reliance on standardized detection tools across all roles would contradict the reported fragmentation and experiential patterns.
Original abstract
Data leakage is the inadvertent transfer of information between training and evaluation datasets that poses a subtle, yet critical, risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive perception. While leakage is widely recognized in research, little is known about how industrial practitioners actually perceive and manage it in practice. This study investigates practitioners' knowledge, experiences, and mitigation strategies around data leakage through ten semi-structured interviews with system design, development, and verification engineers working on automotive perception functions development. Using reflexive thematic analysis, we identify that knowledge of data leakage is widespread and fragmented along role boundaries: ML engineers conceptualize it as a data-splitting or validation issue, whereas design and verification roles interpret it in terms of representativeness and scenario coverage. Detection commonly arises through generic considerations and observed performance anomalies rather than implying specific tools. However, data leakage prevention is more commonly practiced, which depends mostly on experience and knowledge sharing. These findings suggest that leakage control is a socio-technical coordination problem distributed across roles and workflows. We discuss implications for ML reliability engineering, highlighting the need for shared definitions, traceable data practices, and continuous cross-role communication to institutionalize data leakage awareness within automotive ML development.
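The data-splitting framing attributed to ML engineers in the abstract can be made concrete with a small sketch. It assumes, purely for illustration, that perception frames carry a recording identifier (the hypothetical field `sequence_id`) and that frames from one recording are similar enough to leak information if they land on both sides of a split. Neither the field names nor the helper functions come from the paper.

```python
import random
from collections import defaultdict


def split_by_group(samples, group_key, test_fraction=0.2, seed=0):
    """Split samples so that every group lands entirely in train or in test."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[group_key]].append(sample)

    # Shuffle whole groups, not individual frames, with a fixed seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [s for g in group_ids if g not in test_ids for s in groups[g]]
    test = [s for g in group_ids if g in test_ids for s in groups[g]]
    return train, test


def shared_groups(train, test, group_key):
    """Return the group identifiers present on both sides of a split."""
    return {s[group_key] for s in train} & {s[group_key] for s in test}


# Hypothetical data: 5 recordings with 4 frames each (invented for this sketch).
frames = [
    {"frame_id": f"seq{q}_f{i}", "sequence_id": f"seq{q}"}
    for q in range(5)
    for i in range(4)
]

train, test = split_by_group(frames, "sequence_id")
print(len(train), len(test))
print(shared_groups(train, test, "sequence_id"))
```

An empty result from `shared_groups` only rules out this one overlap pattern. It says nothing about the representativeness and scenario-coverage concerns that the abstract associates with design and verification roles.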
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from ten semi-structured interviews with system design, development, and verification engineers working on automotive perception functions. Using reflexive thematic analysis, it identifies fragmented knowledge of data leakage across roles (ML engineers framing it as a data-splitting/validation issue; design/verification roles framing it as representativeness and scenario coverage), ad-hoc detection via performance anomalies rather than dedicated tools, and prevention practices that rely primarily on experience and knowledge sharing. The central claim is that data leakage control constitutes a socio-technical coordination problem distributed across roles and workflows, with implications for ML reliability engineering including needs for shared definitions, traceable data practices, and cross-role communication.
Significance. If the identified themes hold, the work supplies practitioner-grounded evidence that data leakage in safety-critical automotive ML is not solely a technical data-management issue but one requiring organizational coordination. This aligns with and extends existing calls for socio-technical approaches in ML reliability. The use of reflexive thematic analysis is a strength, as it explicitly incorporates researcher positionality in interpreting perceptions. However, the small, non-random sample limits how far the findings can be generalized across the industry.
major comments (2)
- [§3] §3 (Methodology): The description of participant recruitment, sampling strategy (e.g., convenience vs. purposive), inclusion/exclusion criteria, role distribution among the ten interviewees, and any saturation or validation procedures (member checking, audit trails) is insufficiently detailed. This is load-bearing for the central claim because the suggestion that leakage control is a 'socio-technical coordination problem distributed across roles and workflows' depends on the themes reflecting genuine industry fragmentation rather than selection or interpretive artifacts from a small convenience sample.
- [§4, §5] §4 (Findings) and §5 (Discussion): The claim that 'knowledge of data leakage is widespread and fragmented along role boundaries' is presented without quantitative indicators of theme prevalence (e.g., number of participants per role endorsing each sub-theme) or direct quotes tied to specific roles. This weakens the evidential link between the interview data and the distributed-coordination conclusion.
minor comments (2)
- [Abstract] Abstract: The abstract states the sample size and method but omits even high-level demographics (e.g., years of experience, company types) that would help readers gauge scope; adding one sentence would improve transparency without materially lengthening the abstract.
- [§2] §2 (Related Work): The positioning against prior leakage literature is clear, but a brief reference to existing socio-technical studies in automotive ML (e.g., on safety case development) would strengthen the novelty claim.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help us improve the transparency and evidential grounding of the manuscript. We address each major comment below and will incorporate revisions to strengthen the methodology and findings sections.
Point-by-point responses
- Referee: [§3] §3 (Methodology): The description of participant recruitment, sampling strategy (e.g., convenience vs. purposive), inclusion/exclusion criteria, role distribution among the ten interviewees, and any saturation or validation procedures (member checking, audit trails) is insufficiently detailed. This is load-bearing for the central claim because the suggestion that leakage control is a 'socio-technical coordination problem distributed across roles and workflows' depends on the themes reflecting genuine industry fragmentation rather than selection or interpretive artifacts from a small convenience sample.
  Authors: We agree that the current description of the methodology is insufficiently detailed and that greater transparency is needed to support the central claim. In the revised manuscript we will expand §3 to specify the recruitment process, the sampling strategy (purposive sampling of practitioners with direct experience in automotive perception), the inclusion criteria (current or recent involvement in ML-based perception functions), exclusion criteria (no direct exposure to perception development), the role distribution across the ten participants, and the reflexive thematic analysis procedures including how thematic saturation was assessed through iterative coding and reflexive positionality checks. These additions will clarify that the observed role-based fragmentation derives from the interview data rather than sampling artifacts, while retaining our existing acknowledgment of the small, non-random sample as a limitation.
  revision: yes
- Referee: [§4, §5] §4 (Findings) and §5 (Discussion): The claim that 'knowledge of data leakage is widespread and fragmented along role boundaries' is presented without quantitative indicators of theme prevalence (e.g., number of participants per role endorsing each sub-theme) or direct quotes tied to specific roles. This weakens the evidential link between the interview data and the distributed-coordination conclusion.
  Authors: We accept that the evidential link can be made more explicit. In the revision we will add direct quotes from participants, explicitly attributing each quote to the interviewee's role (ML engineer, design engineer, or verification engineer) to illustrate the differing framings of data leakage. We will also indicate theme prevalence by role where the data support it (e.g., noting that data-splitting conceptualizations were endorsed by all ML engineers while scenario-coverage framings appeared primarily among design and verification roles). As a reflexive thematic analysis rather than a quantitative survey, we will avoid overstating numerical counts but will use these additions to strengthen the connection between the empirical material and the socio-technical coordination conclusion.
  revision: yes
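The per-role prevalence indicator discussed in this exchange could be tabulated along the following lines. This is a minimal sketch under stated assumptions: the participant IDs, theme labels, and codings are invented for illustration and are not data from the paper, and `prevalence_by_role` is a hypothetical helper, not the authors' procedure.

```python
from collections import defaultdict

# Hypothetical coded excerpts: (participant, role, theme). Invented for
# illustration only; these are NOT the paper's interview data.
coded = [
    ("P1", "ML engineer", "data_splitting"),
    ("P2", "ML engineer", "data_splitting"),
    ("P2", "ML engineer", "scenario_coverage"),
    ("P3", "design engineer", "scenario_coverage"),
    ("P4", "verification engineer", "scenario_coverage"),
]


def prevalence_by_role(coded):
    """Map (role, theme) to (participants endorsing it, participants in role)."""
    participants = defaultdict(set)  # role -> participant IDs
    endorsers = defaultdict(set)     # (role, theme) -> participant IDs
    for pid, role, theme in coded:
        participants[role].add(pid)
        endorsers[(role, theme)].add(pid)
    return {
        key: (len(pids), len(participants[key[0]]))
        for key, pids in endorsers.items()
    }


table = prevalence_by_role(coded)
for (role, theme), (n, total) in sorted(table.items()):
    print(f"{role:22s} {theme:18s} {n}/{total}")
```

Counting distinct participants rather than coded excerpts keeps one talkative interviewee from inflating a theme. With ten interviews, such counts are best read as descriptive context for the quotes, in line with the authors' stated caution about numerical claims.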
Circularity Check
No significant circularity; findings are direct outputs of interview analysis
The paper contains no mathematical derivations, equations, fitted parameters, or predictive models. Its central claim—that data leakage control is a socio-technical coordination problem—arises solely from reflexive thematic analysis of the ten interview transcripts. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the result; the derivation chain is self-contained as empirical interpretation of primary data without reduction to prior inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Reflexive thematic analysis produces valid representations of practitioners' knowledge and experiences.