
arxiv: 2604.21599 · v1 · submitted 2026-04-23 · 💻 cs.SE · cs.LG

Verifying Machine Learning Interpretability Requirements through Provenance

Pith reviewed 2026-05-09 20:50 UTC · model grok-4.3

classification 💻 cs.SE cs.LG
keywords machine learning · interpretability · provenance · requirements engineering · non-functional requirements · functional requirements · ML engineering

The pith

Saving model and data provenance during machine learning development creates measurable functional requirements that verify interpretability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that interpretability, treated as a non-functional requirement in machine learning, can be verified by recording provenance information about models and data. This recording turns opaque model behavior into a set of specific, checkable functional requirements whose satisfaction confirms the broader interpretability goal. A sympathetic reader would care because it supplies a concrete verification path for a quality that existing literature leaves unmeasurable. The approach imports ideas from requirements engineering to increase rigor in machine learning engineering.

Core claim

The paper claims that saving various types of model and data provenance makes the model's behavior transparent and interpretable. This data forms the basis of quantifiable functional requirements whose verification in turn verifies the interpretability non-functional requirement.

What carries the argument

ML provenance, consisting of saved records of model and data details, serves as the central mechanism that renders behavior transparent and supports the creation of verifiable functional requirements.
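
As a rough illustration of what "saved records of model and data details" could look like, the following is a minimal sketch using only the Python standard library. The entity/activity/agent grouping follows PROV's three starting-point terms; the dictionary layout, the field names, and the `build_provenance` helper are assumptions made for this example and are not the paper's schema or code.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a hex digest that identifies the exact bytes used."""
    return hashlib.sha256(data).hexdigest()


def build_provenance(dataset: bytes, hyperparams: dict, weights: list, engineer: str) -> dict:
    """Assemble a PROV-style record (hypothetical layout, not the paper's schema)."""
    return {
        "entity": {
            "dataset": {"sha256": sha256_of(dataset)},
            "model": {"hyperparameters": hyperparams, "weights": weights},
        },
        "activity": {
            "training": {"endedAt": datetime.now(timezone.utc).isoformat()},
        },
        "agent": {"engineer": {"name": engineer}},
        # Relations are stored as [subject, object] pairs.
        "used": [["training", "dataset"]],
        "wasGeneratedBy": [["model", "training"]],
        "wasAssociatedWith": [["training", "engineer"]],
    }


# Illustrative values only; none of them come from the paper.
record = build_provenance(
    dataset=b"x,y\n1,2\n2,4\n",
    hyperparams={"learning_rate": 0.01, "epochs": 100},
    weights=[2.0, 0.0],
    engineer="example-engineer",
)
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Hashing the dataset bytes rather than storing a file name is one possible design choice: it lets a later check confirm that the exact data was used, independent of where the file lives.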

If this is right

  • Engineers obtain a practical method to verify interpretability non-functional requirements for machine learning models.
  • Quantifiable functional requirements derived from provenance become the operational checks that stand in for the abstract interpretability goal.
  • Machine learning development gains a verification technique drawn from requirements engineering.
  • Transparency of model behavior increases when provenance is systematically saved.
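
One way to picture a provenance-derived functional requirement as an operational check is sketched below. It assumes a hypothetical FR of the form "the training data source and version are recorded and match the deployed model"; the field names (`data_source`, `data_version`, `model_version`) and the `check_lineage_fr` function are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FRResult:
    """Outcome of one functional-requirement check."""
    requirement: str
    passed: bool
    detail: str


# Hypothetical field names; the paper does not prescribe this layout.
REQUIRED_FIELDS = ("data_source", "data_version", "model_version")


def check_lineage_fr(provenance: dict, deployed: dict) -> FRResult:
    """Check that lineage fields are recorded and agree with the deployed model."""
    requirement = (
        "Training data source and version are recorded "
        "and match the deployed model"
    )
    # Step 1: every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not provenance.get(f)]
    if missing:
        return FRResult(requirement, False, f"missing fields: {', '.join(missing)}")
    # Step 2: recorded versions must equal what is actually deployed.
    mismatched = [
        f for f in ("data_version", "model_version")
        if provenance[f] != deployed.get(f)
    ]
    if mismatched:
        return FRResult(requirement, False, f"mismatched fields: {', '.join(mismatched)}")
    return FRResult(requirement, True, "all recorded fields match the deployed model")


# Illustrative values only.
provenance = {
    "data_source": "s3://example-bucket/train.csv",
    "data_version": "v3",
    "model_version": "1.2.0",
}
deployed = {"data_version": "v3", "model_version": "1.2.0"}
result = check_lineage_fr(provenance, deployed)
print(result.passed, "-", result.detail)
```

A check like this yields a pass/fail answer, which is what makes the requirement quantifiable; whether passing it says anything about interpretability is the separate question the referee report raises.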

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same provenance approach might apply to verifying other machine learning non-functional requirements such as fairness or robustness.
  • Embedding provenance capture into standard machine learning pipelines could narrow the gap between machine learning practice and traditional software engineering.
  • Empirical tests in deployed systems could show whether verified functional requirements actually improve human understanding of model outputs.

Load-bearing premise

The argument rests on the premise that recording provenance data makes model behavior transparent and interpretable enough for functional-requirement checks to confirm the non-functional interpretability requirement.

What would settle it

The claim would be undermined by an ML model in which all relevant provenance is recorded and the derived functional requirements are verified, yet experts or users still cannot interpret the model's decisions.

Figures

Figures reproduced from arXiv: 2604.21599 by Daryela Cisneros, Juan Couder, Lynn Vonderhaar, Omar Ochoa.

Figure 1: The three “Starting Point” classes and some of their subclasses in PROV
Figure 2: Example Python code for saving provenance data.
Figure 3: Section of the linear regression provenance graph
Figure 4: Model parameters saved to the provenance graph
Original abstract

Machine Learning (ML) Engineering is a growing field that necessitates an increase in the rigor of ML development. It draws many ideas from software engineering and more specifically, from requirements engineering. Existing literature on ML Engineering defines quality models and Non-Functional Requirements (NFRs) specific to ML, in particular interpretability being one such NFR. However, a major challenge occurs in verifying ML NFRs, including interpretability. Although existing literature defines interpretability in terms of ML, it remains an immeasurable requirement, making it impossible to definitively confirm whether a model meets its interpretability requirement. This paper shows how ML provenance can be used to verify ML interpretability requirements. This work provides an approach for how ML engineers can save various types of model and data provenance to make the model's behavior transparent and interpretable. Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR. Ultimately, this paper contributes a method to verify interpretability NFRs for ML models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that ML provenance (including model and data lineage, hyperparameters, and training logs) can be saved to make model behavior transparent and interpretable. This provenance data underpins quantifiable Functional Requirements (FRs) whose verification directly confirms the Non-Functional Requirement (NFR) of interpretability, which existing literature treats as immeasurable. The work contributes a high-level approach for ML engineers to operationalize interpretability verification via requirements engineering techniques.

Significance. If the proposed mapping from provenance-derived FRs to interpretability NFR holds and is validated, the paper would provide a practical bridge between software requirements engineering and ML development, enabling auditable and verifiable interpretability in ML pipelines. This addresses a recognized gap in making ML NFRs rigorous without relying solely on post-hoc explanation techniques.

major comments (2)
  1. [Abstract] The assertion that 'Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR' lacks any concrete example, logical derivation, or reduction showing how FR satisfaction (e.g., confirming training data source or model version) entails satisfaction of interpretability properties such as feature contributions or local decision explanations. This entailment is load-bearing for the central claim.
  2. The manuscript presents only a conceptual framework with no case study, formal model, or validation steps demonstrating that provenance records address prediction-level interpretability questions rather than solely enabling reproducibility and traceability.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief inline definition or citation for 'ML provenance' and 'quantifiable FRs' to improve accessibility for readers unfamiliar with the intersection of requirements engineering and ML.
  2. Consider including a diagram or table that explicitly links specific provenance types (lineage, hyperparameters, logs) to example FRs and the interpretability aspects they purportedly verify.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments correctly identify that the central claim requires clearer support. We address each point below and outline planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR' lacks any concrete example, logical derivation, or reduction showing how FR satisfaction (e.g., confirming training data source or model version) entails satisfaction of interpretability properties such as feature contributions or local decision explanations. This entailment is load-bearing for the central claim.

    Authors: We acknowledge that the abstract states the entailment at a high level without an explicit example or derivation. The manuscript defines interpretability via transparency and traceability (Section 2), arguing that provenance-derived FRs (e.g., 'training data source and version are recorded and match the deployed model') provide the necessary context for any downstream interpretability analysis, including feature contributions. However, we agree a concrete illustration is missing. In revision we will expand the abstract and add a short example in the introduction showing how verification of a data-lineage FR enables assessment of whether a local explanation (such as LIME) is based on the intended training distribution. revision: yes

  2. Referee: The manuscript presents only a conceptual framework with no case study, formal model, or validation steps demonstrating that provenance records address prediction-level interpretability questions rather than solely enabling reproducibility and traceability.

    Authors: The paper's stated contribution is a conceptual mapping from provenance to verifiable FRs that operationalize the interpretability NFR; it does not include empirical validation or a formal model. We maintain that provenance supports prediction-level questions by supplying the exact data and model context required for local explanations, but we accept that the current text does not demonstrate this link beyond reproducibility. We will add an illustrative scenario (not a full case study) in a new subsection showing how logged prediction-specific provenance can be used to verify that a local explanation was generated from the correct input slice. revision: partial
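
The prediction-level scenario the authors describe could be sketched roughly as follows. This is a minimal illustration, assuming that each prediction is logged together with a hash of its input and that an explanation tool reports the features it explained; the `fingerprint`, `log_prediction`, and `explanation_uses_logged_input` names and the log layout are hypothetical and do not come from the paper.

```python
import hashlib
import json


def fingerprint(features: dict) -> str:
    """Hash a feature dict in a key-order-independent way."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def log_prediction(log: dict, prediction_id: str, features: dict, output: float) -> None:
    """Record prediction-specific provenance (hypothetical log layout)."""
    log[prediction_id] = {"input_sha256": fingerprint(features), "output": output}


def explanation_uses_logged_input(log: dict, prediction_id: str, explained_features: dict) -> bool:
    """Return True only if the explained input matches the logged input."""
    entry = log.get(prediction_id)
    return entry is not None and entry["input_sha256"] == fingerprint(explained_features)


# Illustrative values only.
prediction_log = {}
log_prediction(prediction_log, "pred-001", {"age": 42, "income": 55000}, output=0.73)

# Same input in a different key order: should match the logged fingerprint.
same_input = explanation_uses_logged_input(
    prediction_log, "pred-001", {"income": 55000, "age": 42}
)
# A different input: should not match.
other_input = explanation_uses_logged_input(
    prediction_log, "pred-001", {"age": 43, "income": 55000}
)
print(same_input, other_input)
```

Note that this only confirms that an explanation refers to the logged input. It does not assess whether the explanation itself is faithful or understandable, which is the gap the referee's second comment points at.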

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential reductions

The paper is a requirements-engineering proposal that links provenance records to FRs whose verification is asserted to confirm the interpretability NFR. No equations, parameters, derivations, or formal reductions appear in the abstract or described content. The central mapping is presented as a methodological contribution rather than a result derived from prior quantities or self-citations. No load-bearing self-citation chains, ansatzes, or renamings of known results are exhibited. The work is therefore self-contained as a high-level suggestion and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that provenance records can be turned into verifiable FRs that confirm interpretability, without independent evidence or justification supplied.

axioms (1)
  • domain assumption: Provenance data can be saved to make model behavior transparent and interpretable
    Invoked as the basis for turning NFR into FRs but not derived or evidenced in the abstract.


