arxiv: 2604.06899 · v1 · submitted 2026-04-08 · 💻 cs.CR · cs.LG · cs.SE

Data Leakage in Automotive Perception: Practitioners' Insights

Pith reviewed 2026-05-10 17:08 UTC · model grok-4.3

classification 💻 cs.CR · cs.LG · cs.SE
keywords data leakage · automotive perception · machine learning · practitioner interviews · socio-technical systems · thematic analysis · safety-critical systems · role coordination

The pith

Data leakage in automotive perception systems is managed as a socio-technical coordination issue spread across engineering roles and workflows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how industrial practitioners perceive and address data leakage in automotive machine learning through ten semi-structured interviews with engineers in system design, development, and verification. Knowledge of leakage is found to be widespread but interpreted differently by role, with ML engineers viewing it primarily as a data-splitting or validation concern and design or verification roles focusing on representativeness and scenario coverage. Detection typically occurs through generic checks or observed performance problems rather than dedicated tools, while prevention depends on experience and informal knowledge sharing. These patterns lead to the central conclusion that effective leakage control requires cross-role coordination and institutional practices rather than isolated technical steps.
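To make the "data-splitting" framing concrete, here is a minimal sketch of a group-aware split, in which all frames from one recording stay on the same side of the train/evaluation boundary. It is an illustration only and not a method from the paper: the function name `split_by_group`, the `drive_id` grouping key, and the toy frames are all assumptions.

```python
import random
from collections import defaultdict

def split_by_group(samples, group_of, eval_fraction=0.2, seed=0):
    """Split samples so that no group lands in both train and evaluation."""
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    # Shuffle group ids (not individual samples) with a fixed seed.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Reserve at least one whole group for evaluation.
    n_eval = max(1, round(len(group_ids) * eval_fraction))
    eval_ids = set(group_ids[:n_eval])

    train = [s for g in group_ids if g not in eval_ids for s in groups[g]]
    evaluation = [s for g in group_ids if g in eval_ids for s in groups[g]]
    return train, evaluation

# Hypothetical data: (drive_id, frame_index) pairs from five recordings.
frames = [(f"drive_{d}", i) for d in range(5) for i in range(4)]
train, evaluation = split_by_group(frames, group_of=lambda f: f[0])

train_drives = {f[0] for f in train}
eval_drives = {f[0] for f in evaluation}
print(train_drives.isdisjoint(eval_drives))
```

Splitting by recording rather than by frame is one common way to keep near-identical consecutive frames from appearing on both sides; whether a recording is the right grouping unit depends on how the data was collected.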

Core claim

Reflexive thematic analysis of the interviews shows that practitioners' understanding of data leakage fragments along role lines, detection arises reactively from anomalies, and mitigation relies on experiential and collaborative practices. The resulting insight is that leakage control constitutes a socio-technical coordination problem distributed across roles and workflows in automotive perception development.

What carries the argument

Reflexive thematic analysis of semi-structured interviews with ten automotive perception engineers, used to map role-specific conceptualizations and mitigation approaches to data leakage.

Load-bearing premise

The ten interviewed engineers represent typical industry views on data leakage and the thematic analysis captures their perceptions without major distortion from selection or researcher interpretation.

What would settle it

A larger-scale survey or observational study across multiple automotive companies showing uniform technical understanding of data leakage and reliance on standardized detection tools across all roles would contradict the reported fragmentation and experiential patterns.

Figures

Figures reproduced from arXiv: 2604.06899 by Andras Balint, Darko Durisic, Md Abu Ahammed Babu, Miroslaw Staron, Sushant Kumar Pandey.

Figure 1: A graphical overview of the study.
Figure 2: A mindmap of the main themes and sub-themes produced from the participants' responses and also motivated by the ...
Original abstract

Data leakage is the inadvertent transfer of information between training and evaluation datasets that poses a subtle, yet critical, risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive perception. While leakage is widely recognized in research, little is known about how industrial practitioners actually perceive and manage it in practice. This study investigates practitioners' knowledge, experiences, and mitigation strategies around data leakage through ten semi-structured interviews with system design, development, and verification engineers working on automotive perception functions development. Using reflexive thematic analysis, we identify that knowledge of data leakage is widespread and fragmented along role boundaries: ML engineers conceptualize it as a data-splitting or validation issue, whereas design and verification roles interpret it in terms of representativeness and scenario coverage. Detection commonly arises through generic considerations and observed performance anomalies rather than implying specific tools. However, data leakage prevention is more commonly practiced, which depends mostly on experience and knowledge sharing. These findings suggest that leakage control is a socio-technical coordination problem distributed across roles and workflows. We discuss implications for ML reliability engineering, highlighting the need for shared definitions, traceable data practices, and continuous cross-role communication to institutionalize data leakage awareness within automotive ML development.
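The abstract notes that detection rarely relies on specific tools. As a hedged illustration of what a very simple dedicated check could look like, the sketch below flags evaluation items whose bytes also occur in the training set. The function names and the toy byte payloads are assumptions, and this is not the detection approach of the paper or of any cited tool.

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Content hash used as an exact-duplicate fingerprint."""
    return hashlib.sha256(payload).hexdigest()

def exact_overlap(train_items, eval_items):
    """Return names of evaluation items whose bytes also occur in training."""
    train_hashes = {fingerprint(data) for _, data in train_items}
    return [name for name, data in eval_items
            if fingerprint(data) in train_hashes]

# Hypothetical (name, bytes) pairs standing in for image files.
train_items = [("a.png", b"\x00\x01"), ("b.png", b"\x02\x03")]
eval_items = [("c.png", b"\x02\x03"), ("d.png", b"\x04\x05")]

leaked = exact_overlap(train_items, eval_items)
print(leaked)
```

A check like this only catches byte-identical copies. Near-duplicates, such as consecutive frames of the same scene, would need a similarity-based comparison instead.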

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from ten semi-structured interviews with system design, development, and verification engineers working on automotive perception functions. Using reflexive thematic analysis, it identifies fragmented knowledge of data leakage across roles (ML engineers framing it as a data-splitting/validation issue; design/verification roles framing it as representativeness and scenario coverage), ad-hoc detection via performance anomalies rather than dedicated tools, and prevention practices that rely primarily on experience and knowledge sharing. The central claim is that data leakage control constitutes a socio-technical coordination problem distributed across roles and workflows, with implications for ML reliability engineering including needs for shared definitions, traceable data practices, and cross-role communication.

Significance. If the identified themes hold, the work supplies practitioner-grounded evidence that data leakage in safety-critical automotive ML is not solely a technical data-management issue but one requiring organizational coordination. This aligns with and extends existing calls for socio-technical approaches in ML reliability. The use of reflexive thematic analysis is a strength, as it explicitly incorporates researcher positionality in interpreting perceptions. However, the small non-random sample constrains the strength of any industry-wide suggestions.

major comments (2)
  1. [§3] §3 (Methodology): The description of participant recruitment, sampling strategy (e.g., convenience vs. purposive), inclusion/exclusion criteria, role distribution among the ten interviewees, and any saturation or validation procedures (member checking, audit trails) is insufficiently detailed. This is load-bearing for the central claim because the suggestion that leakage control is a 'socio-technical coordination problem distributed across roles and workflows' depends on the themes reflecting genuine industry fragmentation rather than selection or interpretive artifacts from a small convenience sample.
  2. [§4, §5] §4 (Findings) and §5 (Discussion): The claim that 'knowledge of data leakage is widespread and fragmented along role boundaries' is presented without quantitative indicators of theme prevalence (e.g., number of participants per role endorsing each sub-theme) or direct quotes tied to specific roles. This weakens the evidential link between the interview data and the distributed-coordination conclusion.
minor comments (2)
  1. [Abstract] Abstract: The abstract states the sample size and method but omits even high-level demographics (e.g., years of experience, company types) that would help readers gauge scope; adding one sentence would improve transparency without altering length.
  2. [§2] §2 (Related Work): The positioning against prior leakage literature is clear, but a brief reference to existing socio-technical studies in automotive ML (e.g., on safety case development) would strengthen the novelty claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help us improve the transparency and evidential grounding of the manuscript. We address each major comment below and will incorporate revisions to strengthen the methodology and findings sections.

Point-by-point responses
  1. Referee: [§3] §3 (Methodology): The description of participant recruitment, sampling strategy (e.g., convenience vs. purposive), inclusion/exclusion criteria, role distribution among the ten interviewees, and any saturation or validation procedures (member checking, audit trails) is insufficiently detailed. This is load-bearing for the central claim because the suggestion that leakage control is a 'socio-technical coordination problem distributed across roles and workflows' depends on the themes reflecting genuine industry fragmentation rather than selection or interpretive artifacts from a small convenience sample.

    Authors: We agree that the current description of the methodology is insufficiently detailed and that greater transparency is needed to support the central claim. In the revised manuscript we will expand §3 to specify the recruitment process, the sampling strategy (purposive sampling of practitioners with direct experience in automotive perception), the inclusion criteria (current or recent involvement in ML-based perception functions), exclusion criteria (no direct exposure to perception development), the role distribution across the ten participants, and the reflexive thematic analysis procedures including how thematic saturation was assessed through iterative coding and reflexive positionality checks. These additions will clarify that the observed role-based fragmentation derives from the interview data rather than sampling artifacts, while retaining our existing acknowledgment of the small, non-random sample as a limitation. revision: yes

  2. Referee: [§4, §5] §4 (Findings) and §5 (Discussion): The claim that 'knowledge of data leakage is widespread and fragmented along role boundaries' is presented without quantitative indicators of theme prevalence (e.g., number of participants per role endorsing each sub-theme) or direct quotes tied to specific roles. This weakens the evidential link between the interview data and the distributed-coordination conclusion.

    Authors: We accept that the evidential link can be made more explicit. In the revision we will add direct quotes from participants, explicitly attributing each quote to the interviewee's role (ML engineer, design engineer, or verification engineer) to illustrate the differing framings of data leakage. We will also indicate theme prevalence by role where the data support it (e.g., noting that data-splitting conceptualizations were endorsed by all ML engineers while scenario-coverage framings appeared primarily among design and verification roles). As a reflexive thematic analysis rather than a quantitative survey, we will avoid overstating numerical counts but will use these additions to strengthen the connection between the empirical material and the socio-technical coordination conclusion. revision: yes
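The theme-prevalence indicators discussed in this exchange could be tabulated along the following lines. This is a sketch under stated assumptions: the record layout, the participant ids, and every value in `coded` are hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical coding records: (participant_id, role, sub_theme).
# Placeholder values only; they do not come from the interviews.
coded = [
    ("P1", "ML engineer", "data splitting"),
    ("P2", "ML engineer", "data splitting"),
    ("P3", "verification engineer", "scenario coverage"),
    ("P3", "verification engineer", "scenario coverage"),  # repeated mention
    ("P4", "design engineer", "representativeness"),
]

def prevalence(records):
    """Count distinct participants per (role, sub_theme) pair."""
    unique = set(records)  # a participant counts once per sub-theme
    return Counter((role, theme) for _, role, theme in unique)

counts = prevalence(coded)
for (role, theme), n in sorted(counts.items()):
    print(f"{role:24s} {theme:20s} {n}")
```

Counting participants rather than mentions keeps one talkative interviewee from inflating a sub-theme, which fits the authors' stated caution about overstating numerical counts.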

Circularity Check

0 steps flagged

No significant circularity; findings are direct outputs of interview analysis

Full rationale

The paper contains no mathematical derivations, equations, fitted parameters, or predictive models. Its central claim—that data leakage control is a socio-technical coordination problem—arises solely from reflexive thematic analysis of the ten interview transcripts. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the result; the derivation chain is self-contained as empirical interpretation of primary data without reduction to prior inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that qualitative interview data analyzed via reflexive thematic analysis yields reliable insights into industry practices; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Reflexive thematic analysis produces valid representations of practitioners' knowledge and experiences.
    Invoked implicitly when interpreting interview responses as evidence for role fragmentation and socio-technical nature.

pith-pipeline@v0.9.0 · 5528 in / 1134 out tokens · 40663 ms · 2026-05-10T17:08:50.592292+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

  1. [1]

    Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 291–300

  2. [2]

    Md Abu Ahammed Babu, Sushant Kumar Pandey, Darko Durisic, Ashok Chaitanya Koppisetty, and Miroslaw Staron. 2025. D-LeDe: A Data Leakage Detection Method for Automotive Perception Systems. In 11th International Conference on Vehicle Technology and Intelligent Transport Systems, VEHITS 2025. Science and Technology Publications, Lda, 210–221

  3. [3]

    Markus Borg, Cristofer Englund, Krzysztof Wnuk, Boris Duran, Christoffer Levandowski, Shenjian Gao, Yanwen Tan, Henrik Kaijser, Henrik Lönn, and Jonas Törnqvist. 2020. Safely entering the deep: A review of verification and validation for machine learning and a challenge elicitation in the automotive industry. Journal of Automotive Software Engineering 1, 1 ...

  4. [4]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101

  5. [5]

    Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, and D Sculley. 2017. The ML test score: A rubric for ML production readiness and technical debt reduction. In 2017 IEEE international conference on big data (big data). IEEE, 1123–1132

  6. [6]

    Steve Campbell, Melanie Greenwood, Sarah Prior, Toniele Shearer, Kerrie Walkem, Sarah Young, Danielle Bywaters, and Kim Walker. 2020. Purposive sampling: complex or simple? Research case examples. Journal of research in Nursing 25, 8 (2020), 652–661

  7. [7]

    Erwin de Gelder, Maren Buermann, and Olaf Op Den Camp. 2024. Coverage Metrics for a Scenario Database for the Scenario-Based Assessment of Automated Driving Systems. In 2024 IEEE International Automated Vehicle Validation Conference (IAVVC). IEEE, 1–8

  8. [8]

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2025. The faiss library. IEEE Transactions on Big Data (2025)

  9. [9]

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for datasets. Commun. ACM 64, 12 (2021), 86–92

  10. [10]

    Greg Guest, Arwen Bunce, and Laura Johnson. 2006. How many interviews are enough? An experiment with data saturation and variability. Field methods 18, 1 (2006), 59–82

  11. [11]

    Markus Hafner, Maria Katsantoni, Tino Köster, James Marks, Joyita Mukherjee, Dorothee Staiger, Jernej Ule, and Mihaela Zavolan. 2021. CLIP and complementary methods. Nature Reviews Methods Primers 1, 1 (2021), 20

  12. [12]

    Monique M Hennink, Bonnie N Kaiser, and Vincent C Marconi. 2017. Code saturation versus meaning saturation: how many interviews are enough? Qualitative health research 27, 4 (2017), 591–608

  13. [13]

    Hans-Martin Heyn, Khan Mohammad Habibullah, Eric Knauss, Jennifer Horkoff, Markus Borg, Alessia Knauss, and Polly Jing Li. 2023. Automotive perception software development: An empirical investigation into data, annotation, and ecosystem challenges. arXiv preprint arXiv:2303.05947 (2023)

  14. [14]

    Michael Hoss, Maike Scholtes, and Lutz Eckstein. 2022. A review of testing object-based environment perception for safe automated driving. Automotive Innovation 5, 3 (2022), 223–250

  15. [15]

    International Organization for Standardization. 2018. ISO 26262:2018 (all parts), Road Vehicles — Functional Safety. Standard. International Organization for Standardization

  16. [16]

    Narendra Kandregula. 2020. Exploring Software-Defined Vehicles: A Comparative Analysis of AI and ML Models for Enhanced Autonomy and Performance. (2020)

  17. [17]

    Sayash Kapoor and Arvind Narayanan. 2023. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 4, 9 (2023)

  18. [18]

    Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency. 220–229

  19. [19]

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. 2023. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

  20. [20]

    Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. 2019. Do imagenet classifiers generalize to imagenet? In International conference on machine learning. PMLR, 5389–5400

  21. [21]

    Francisca Rosique, Pedro J Navarro, Carlos Fernández, and Antonio Padilla. 2019. A systematic review of perception system and simulators for autonomous vehicles research. Sensors 19, 3 (2019), 648

  22. [22]

    Per Runeson and Martin Höst. 2009. Guidelines for conducting and reporting case study research in software engineering. Empirical software engineering 14, 2 (2009), 131–164

  23. [23]

    L Sasse, E Nicolaisen-Sobesky, J Dukart, SB Eickhoff, M Götz, S Hamdan, V Komeyer, A Kulkarni, JM Lahnakoski, Bradley Carl Love, et al. 2025. Overview of leakage scenarios in supervised machine learning. Journal of Big Data 12, 1 (2025), 135

  24. [24]

    David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden technical debt in machine learning systems. Advances in neural information processing systems 28 (2015)

  25. [25]

    Alex Serban, Koen Van der Blom, Holger Hoos, and Joost Visser. 2020. Adoption and effects of software engineering best practices in machine learning. In Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 1–12

  26. [26]

    Alex Serban, Koen van der Blom, Holger Hoos, and Joost Visser. 2024. Software engineering practices for machine learning—Adoption, effects, and team assessment. Journal of Systems and Software 209 (2024), 111907

  27. [27]

    Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, and Yannis Avrithis. 2024. On train-test class overlap and detection for image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 17375–17384

  28. [28]

    KK Thyagharajan and G Kalaiarasi. 2021. A review on near-duplicate detection of images using computer vision techniques. Archives of Computational Methods in Engineering 28, 3 (2021), 897–916

  29. [29]

    Allison Tong, Peter Sainsbury, and Jonathan Craig. 2007. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. International journal for quality in health care 19, 6 (2007), 349–357