Verifying Machine Learning Interpretability Requirements through Provenance
Pith reviewed 2026-05-09 20:50 UTC · model grok-4.3
The pith
Saving model and data provenance during machine learning development creates measurable functional requirements that verify interpretability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that saving various types of model and data provenance makes the model's behavior transparent and interpretable. This data forms the basis of quantifiable functional requirements whose verification in turn verifies the interpretability non-functional requirement.
What carries the argument
ML provenance, consisting of saved records of model and data details, serves as the central mechanism that renders behavior transparent and supports the creation of verifiable functional requirements.
If this is right
- Engineers obtain a practical method to verify interpretability non-functional requirements for machine learning models.
- Quantifiable functional requirements derived from provenance become the operational checks that stand in for the abstract interpretability goal.
- Machine learning development gains a verification technique drawn from requirements engineering.
- Transparency of model behavior increases when provenance is systematically saved.
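As a rough illustration of how provenance-derived functional requirements could act as operational checks, the sketch below treats each FR as a boolean predicate over a saved provenance record and reports the fraction that pass. The record fields, FR identifiers, and the `recorded` helper are hypothetical names chosen for this example; they are not taken from the paper. Passing checks of this kind shows only that the records exist, not that the model is interpretable.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical provenance record; field names are illustrative assumptions,
# not a schema defined by the paper.
provenance = {
    "training_data_source": "s3://example-bucket/train.csv",
    "training_data_version": "v3",
    "hyperparameters": {"learning_rate": 0.01, "epochs": 20},
    "training_log_path": None,  # deliberately missing to show a failing FR
}


@dataclass
class FunctionalRequirement:
    fr_id: str
    description: str
    check: Callable[[dict], bool]  # returns True when the FR is satisfied


def recorded(field: str) -> Callable[[dict], bool]:
    """Build a check that passes when `field` is present and non-empty."""
    return lambda record: record.get(field) not in (None, "", {})


# Hypothetical FRs, each phrased so that it has a yes/no answer.
requirements = [
    FunctionalRequirement("FR-1", "Training data source is recorded",
                          recorded("training_data_source")),
    FunctionalRequirement("FR-2", "Training data version is recorded",
                          recorded("training_data_version")),
    FunctionalRequirement("FR-3", "Hyperparameters are recorded",
                          recorded("hyperparameters")),
    FunctionalRequirement("FR-4", "Training log location is recorded",
                          recorded("training_log_path")),
]


def verify(record: dict, frs: list) -> dict:
    """Evaluate every FR against the record and return pass/fail per FR id."""
    return {fr.fr_id: fr.check(record) for fr in frs}


results = verify(provenance, requirements)
coverage = sum(results.values()) / len(results)  # fraction of FRs satisfied

for fr in requirements:
    status = "PASS" if results[fr.fr_id] else "FAIL"
    print(f"{fr.fr_id} {status}: {fr.description}")
print(f"FR coverage: {coverage:.2f}")
```

Keeping each FR as a separate predicate means a failing check points at a specific missing record, which is the kind of measurable outcome the abstract NFR lacks on its own.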
Where Pith is reading between the lines
- The same provenance approach might apply to verifying other machine learning non-functional requirements such as fairness or robustness.
- Embedding provenance capture into standard machine learning pipelines could narrow the gap between machine learning practice and traditional software engineering.
- Empirical tests in deployed systems could show whether verified functional requirements actually improve human understanding of model outputs.
Load-bearing premise
Recording provenance data makes model behavior transparent and interpretable enough that verifying the derived functional requirements confirms the non-functional interpretability requirement.
What would settle it
An ML model in which all relevant provenance is recorded and the derived functional requirements are verified, yet experts or users still cannot interpret the model's decisions.
Original abstract
Machine Learning (ML) Engineering is a growing field that necessitates an increase in the rigor of ML development. It draws many ideas from software engineering and more specifically, from requirements engineering. Existing literature on ML Engineering defines quality models and Non-Functional Requirements (NFRs) specific to ML, in particular interpretability being one such NFR. However, a major challenge occurs in verifying ML NFRs, including interpretability. Although existing literature defines interpretability in terms of ML, it remains an immeasurable requirement, making it impossible to definitively confirm whether a model meets its interpretability requirement. This paper shows how ML provenance can be used to verify ML interpretability requirements. This work provides an approach for how ML engineers can save various types of model and data provenance to make the model's behavior transparent and interpretable. Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR. Ultimately, this paper contributes a method to verify interpretability NFRs for ML models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that ML provenance (including model and data lineage, hyperparameters, and training logs) can be saved to make model behavior transparent and interpretable. This provenance data underpins quantifiable Functional Requirements (FRs) whose verification directly confirms the Non-Functional Requirement (NFR) of interpretability, which existing literature treats as immeasurable. The work contributes a high-level approach for ML engineers to operationalize interpretability verification via requirements engineering techniques.
Significance. If the proposed mapping from provenance-derived FRs to interpretability NFR holds and is validated, the paper would provide a practical bridge between software requirements engineering and ML development, enabling auditable and verifiable interpretability in ML pipelines. This addresses a recognized gap in making ML NFRs rigorous without relying solely on post-hoc explanation techniques.
Major comments (2)
- [Abstract] The assertion that 'Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR' lacks any concrete example, logical derivation, or reduction showing how FR satisfaction (e.g., confirming training data source or model version) entails satisfaction of interpretability properties such as feature contributions or local decision explanations. This entailment is load-bearing for the central claim.
- The manuscript presents only a conceptual framework with no case study, formal model, or validation steps demonstrating that provenance records address prediction-level interpretability questions rather than solely enabling reproducibility and traceability.
Minor comments (2)
- [Abstract] The abstract would benefit from a brief inline definition or citation for 'ML provenance' and 'quantifiable FRs' to improve accessibility for readers unfamiliar with the intersection of requirements engineering and ML.
- Consider including a diagram or table that explicitly links specific provenance types (lineage, hyperparameters, logs) to example FRs and the interpretability aspects they purportedly verify.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments correctly identify that the central claim requires clearer support. We address each point below and outline planned revisions.
- Referee: [Abstract] The assertion that 'Saving this data forms the basis of quantifiable Functional Requirements (FRs) whose verification in turn verifies the interpretability NFR' lacks any concrete example, logical derivation, or reduction showing how FR satisfaction (e.g., confirming training data source or model version) entails satisfaction of interpretability properties such as feature contributions or local decision explanations. This entailment is load-bearing for the central claim.
  Authors: We acknowledge that the abstract states the entailment at a high level without an explicit example or derivation. The manuscript defines interpretability via transparency and traceability (Section 2), arguing that provenance-derived FRs (e.g., 'training data source and version are recorded and match the deployed model') provide the necessary context for any downstream interpretability analysis, including feature contributions. However, we agree a concrete illustration is missing. In revision we will expand the abstract and add a short example in the introduction showing how verification of a data-lineage FR enables assessment of whether a local explanation (such as LIME) is based on the intended training distribution.
  Revision: yes
- Referee: The manuscript presents only a conceptual framework with no case study, formal model, or validation steps demonstrating that provenance records address prediction-level interpretability questions rather than solely enabling reproducibility and traceability.
  Authors: The paper's stated contribution is a conceptual mapping from provenance to verifiable FRs that operationalize the interpretability NFR; it does not include empirical validation or a formal model. We maintain that provenance supports prediction-level questions by supplying the exact data and model context required for local explanations, but we accept that the current text does not demonstrate this link beyond reproducibility. We will add an illustrative scenario (not a full case study) in a new subsection showing how logged prediction-specific provenance can be used to verify that a local explanation was generated from the correct input slice.
  Revision: partial
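The two FRs the authors describe in their responses can be sketched as hash comparisons over logged records. This is a minimal illustration under stated assumptions: the record layouts, the `fingerprint` helper, and the two check functions are hypothetical, and "LIME" appears only as a label in a log entry (no explanation library is invoked). The checks confirm that an explanation refers to the logged input and model version; they say nothing about whether the explanation itself is understandable.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical training-time provenance (illustrative values only).
training_rows = [{"age": 41, "income": 52000}, {"age": 29, "income": 38000}]
training_run = {"model_version": "1.2.0",
                "data_hash": fingerprint(training_rows)}

# Hypothetical metadata shipped with the deployed model.
deployed_model = {"model_version": "1.2.0",
                  "data_hash": training_run["data_hash"]}

# Hypothetical prediction-level provenance and the explanation tied to it.
prediction_input = {"age": 35, "income": 47000}
prediction_log = {"prediction_id": "p-001",
                  "model_version": "1.2.0",
                  "input_hash": fingerprint(prediction_input)}
explanation_log = {"prediction_id": "p-001",
                   "method": "LIME",  # label only; no library call is made
                   "model_version": "1.2.0",
                   "input_hash": fingerprint(prediction_input)}


def lineage_fr_holds(run: dict, deployed: dict) -> bool:
    """Data-lineage FR: recorded data hash and version match the deployed model."""
    return (run["model_version"] == deployed["model_version"]
            and run["data_hash"] == deployed["data_hash"])


def explanation_fr_holds(pred: dict, expl: dict) -> bool:
    """Prediction-level FR: the explanation refers to the same prediction,
    model version, and input as the logged prediction."""
    keys = ("prediction_id", "model_version", "input_hash")
    return all(pred[k] == expl[k] for k in keys)


# An explanation generated from a different input should fail the FR.
stale_explanation = dict(explanation_log,
                         input_hash=fingerprint({"age": 36, "income": 47000}))

print("lineage FR:", lineage_fr_holds(training_run, deployed_model))
print("explanation FR:", explanation_fr_holds(prediction_log, explanation_log))
print("stale explanation FR:",
      explanation_fr_holds(prediction_log, stale_explanation))
```

Serializing with `sort_keys=True` keeps the digest independent of dictionary key order, so the same input logged by two components yields the same hash.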
Circularity Check
No circularity: conceptual proposal without derivations or self-referential reductions
The paper is a requirements-engineering proposal that links provenance records to FRs whose verification is asserted to confirm the interpretability NFR. No equations, parameters, derivations, or formal reductions appear in the abstract or described content. The central mapping is presented as a methodological contribution rather than a result derived from prior quantities or self-citations. No load-bearing self-citation chains, ansatzes, or renamings of known results are exhibited. The work is therefore self-contained as a high-level suggestion and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Provenance data can be saved to make model behavior transparent and interpretable.