Making AI Compliance Evidence Machine-Readable
Pith reviewed 2026-05-10 12:27 UTC · model grok-4.3
The pith
OSCAL with 16 property extensions and a three-layer architecture generates machine-readable AI compliance evidence as a byproduct of model training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OSCAL is proposed as a candidate interchange format for AI governance that complements the JTC21 standards stack. Sixteen property extensions are defined to cover lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification. A three-layer architecture for Compliance-as-Code, consisting of policy, evidence, and enforcement layers, generates assurance evidence as a byproduct of model training. The resulting SDK produces native OSCAL Assessment Results that are validated against the NIST JSON schema. The approach is tested on two Annex III high-risk systems: a credit scoring model and a medical imaging segmentation system.
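The claim that the SDK emits schema-valid OSCAL Assessment Results can be pictured with a minimal fragment. The document skeleton below follows OSCAL's assessment-results model (metadata, import-ap, results, and the generic `props` name/ns/value mechanism), but the property names (`lifecycle-phase`) and namespace URI are hypothetical stand-ins, since the paper's 16 extensions are not enumerated in this review.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical namespace for the paper's AI property extensions (assumption).
AI_NS = "https://example.org/ns/oscal-ai-extensions"

def make_assessment_result(metric_name: str, metric_value: float) -> dict:
    """Build a minimal OSCAL-style assessment-results document whose
    result carries AI-governance props via OSCAL's extension mechanism."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "assessment-results": {
            "uuid": str(uuid.uuid4()),
            "metadata": {
                "title": "AI assurance evidence (training byproduct)",
                "last-modified": now,
                "version": "0.1.0",
                "oscal-version": "1.1.2",
            },
            "import-ap": {"href": "#assessment-plan"},
            "results": [
                {
                    "uuid": str(uuid.uuid4()),
                    "title": "Training-run evidence",
                    "description": f"Captured {metric_name} during training.",
                    "start": now,
                    "props": [
                        # Extension props are name/ns/value triples; names
                        # here are illustrative, not the paper's actual 16.
                        {"name": "lifecycle-phase", "ns": AI_NS, "value": "training"},
                        {"name": metric_name, "ns": AI_NS, "value": str(metric_value)},
                    ],
                }
            ],
        }
    }

doc = make_assessment_result("auroc", 0.91)
print(json.dumps(doc, indent=2)[:200])
```

In the paper's pipeline such a document would additionally be validated against the NIST OSCAL JSON schema; this sketch only shows the shape of the evidence payload.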
What carries the argument
The three-layer Compliance-as-Code architecture with policy, evidence, and enforcement layers, supported by 16 OSCAL property extensions for generating and validating assurance evidence during model training.
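The three-layer idea can be sketched as a plain training hook: a policy layer declares machine-checkable thresholds, an evidence layer records observations each epoch, and an enforcement layer gates on the policy. All names, thresholds, and metrics here are hypothetical illustrations of the architectural pattern, not the paper's SDK.

```python
# Minimal three-layer Compliance-as-Code sketch (illustrative, not the SDK).

# Policy layer: declarative, machine-checkable obligations.
POLICY = {"min_accuracy": 0.80, "max_subgroup_gap": 0.10}

class EvidenceLog:
    """Evidence layer: accumulates observations as training runs."""
    def __init__(self):
        self.observations = []

    def record(self, epoch: int, metrics: dict) -> None:
        self.observations.append({"epoch": epoch, **metrics})

def enforce(policy: dict, evidence: EvidenceLog) -> dict:
    """Enforcement layer: evaluate the latest evidence against policy."""
    latest = evidence.observations[-1]
    violations = []
    if latest["accuracy"] < policy["min_accuracy"]:
        violations.append("min_accuracy")
    if latest["subgroup_gap"] > policy["max_subgroup_gap"]:
        violations.append("max_subgroup_gap")
    return {"satisfied": not violations, "violations": violations}

log = EvidenceLog()
# Stand-in for a training loop emitting metrics as a byproduct.
for epoch, (acc, gap) in enumerate([(0.74, 0.15), (0.83, 0.07)]):
    log.record(epoch, {"accuracy": acc, "subgroup_gap": gap})

verdict = enforce(POLICY, log)
print(verdict)  # prints {'satisfied': True, 'violations': []}
```

The point of the pattern is that the evidence log, not a hand-written report, becomes the audit artifact; serializing it into OSCAL props is then a mechanical step.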
Where Pith is reading between the lines
- This could enable automated verification tools to check compliance evidence across supply chains.
- The format might support consistent auditing practices for multiple overlapping AI regulations.
- Similar extensions could be developed for other types of machine learning assurance beyond the tested cases.
- Organizations might integrate this into existing MLOps pipelines to reduce compliance overhead.
Load-bearing premise
The 16 property extensions and three-layer architecture will integrate smoothly with emerging AI standards and cover all required governance evidence without needing significant further changes.
What would settle it
Observing whether the OSCAL output from the system is accepted as sufficient evidence in an actual regulatory review or third-party audit for an Annex III AI system, or if key elements are missing.
Figures
Original abstract
AI Assurance -- producing the machine-readable evidence required to demonstrate compliance with AI governance frameworks -- has mature policy scaffolding but lacks the infrastructure to operationalize it. Organizations building high-risk AI systems under the EU AI Act face a gap: frameworks such as the EU AI Act, ISO/IEC 42001, and NIST AI RMF specify what to assure but provide no executable format for how. This paper proposes OSCAL -- the NIST standard adopted for FedRAMP cybersecurity compliance -- as a candidate interchange format for AI governance, complementing rather than replacing the emerging JTC21 standards stack. We define 16 property extensions covering lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification, and present a three-layer Compliance-as-Code architecture (policy, evidence, enforcement) that generates assurance evidence as a byproduct of model training. The SDK produces native OSCAL Assessment Results validated against the NIST JSON schema. We test the approach on two Annex III high-risk systems: a credit scoring model and a medical imaging segmentation system. The architecture and reference implementation are open-source under Apache 2.0.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OSCAL (the NIST standard for cybersecurity compliance) as an interchange format for AI governance evidence under frameworks such as the EU AI Act. It defines 16 property extensions addressing lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification; introduces a three-layer Compliance-as-Code architecture (policy, evidence, enforcement) that generates native OSCAL Assessment Results as a byproduct of model training; and reports successful application to two Annex III high-risk systems (credit scoring and medical imaging segmentation). The reference implementation is released open-source under Apache 2.0.
Significance. If the extensions and architecture prove compatible with the JTC21 stack and deliver sufficient coverage, the work would supply a concrete, executable mechanism for producing and exchanging machine-readable AI assurance evidence, directly addressing the operational gap between high-level governance frameworks and implementable compliance processes. The open-source release and concrete system tests constitute reproducible starting points for further validation.
major comments (3)
- [Architecture and Extensions sections] The central claim that the 16 OSCAL extensions plus three-layer architecture complement (rather than conflict with) the emerging JTC21 standards stack and provide sufficient coverage for EU AI Act obligations (Articles 9-15 and Annex III) is unsupported by any explicit mapping, combined schema validation, or gap analysis against JTC21 artifacts.
- [Evaluation section] The evaluation on two Annex III systems states that OSCAL output was successfully generated but supplies no quantitative metrics, error rates, coverage statistics, or comparison against alternative evidence formats or manual compliance processes.
- [Compliance-as-Code architecture description] The assumption that the generated assurance evidence satisfies all required governance obligations without further major modifications remains untested; no verification against the full set of Annex III requirements or risk-management obligations is presented.
minor comments (3)
- Include a dedicated table or appendix listing the exact JSON schema definitions, allowed values, and rationale for each of the 16 property extensions.
- Clarify the precise integration points between the SDK and common ML training frameworks (e.g., PyTorch or TensorFlow) to aid reproducibility.
- Add a short discussion of potential maintenance overhead for keeping the OSCAL extensions synchronized with future revisions of the NIST AI RMF or JTC21 standards.
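The first minor comment, on documenting allowed values for each extension, could be met with a machine-checkable registry rather than prose. The entry names and value sets below are hypothetical stand-ins (the paper's actual 16 properties are not reproduced in this review); the point is that allowed values become validatable data.

```python
# Hypothetical registry for a few property extensions (illustrative names
# and allowed values; the paper defines 16 such properties).
EXTENSION_REGISTRY = {
    "lifecycle-phase": {
        "allowed": {"design", "training", "validation", "deployment"},
        "rationale": "ties evidence to an AI lifecycle stage",
    },
    "enforcement-mode": {
        "allowed": {"advisory", "blocking"},
        "rationale": "whether a failed check halts the pipeline",
    },
    "risk-acceptance": {
        "allowed": {"accepted", "mitigated", "rejected"},
        "rationale": "records the disposition of residual risk",
    },
}

def validate_prop(name: str, value: str) -> bool:
    """Return True iff the property is registered and its value is allowed."""
    spec = EXTENSION_REGISTRY.get(name)
    return spec is not None and value in spec["allowed"]

print(validate_prop("lifecycle-phase", "training"))  # True
print(validate_prop("enforcement-mode", "silent"))   # False: value not allowed
```

A registry of this shape could also be emitted as JSON Schema `enum` constraints, which would address the referee's reproducibility concern directly.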
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important areas for strengthening the manuscript's claims regarding compatibility, evaluation rigor, and scope. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Architecture and Extensions sections] The central claim that the 16 OSCAL extensions plus three-layer architecture complement (rather than conflict with) the emerging JTC21 standards stack and provide sufficient coverage for EU AI Act obligations (Articles 9-15 and Annex III) is unsupported by any explicit mapping, combined schema validation, or gap analysis against JTC21 artifacts.
Authors: We agree that an explicit mapping and gap analysis would strengthen the complementarity claim. The manuscript positions the OSCAL extensions as additive to the JTC21 stack (focusing on machine-readable evidence interchange) rather than a replacement, but does not include a full mapping because JTC21 artifacts remain under active development. We will revise the Architecture and Extensions sections to add a high-level compatibility discussion, including how the 16 properties align with key EU AI Act requirements without claiming full coverage. A complete schema validation against finalized JTC21 is noted as future work. revision: partial
- Referee: [Evaluation section] The evaluation on two Annex III systems states that OSCAL output was successfully generated but supplies no quantitative metrics, error rates, coverage statistics, or comparison against alternative evidence formats or manual compliance processes.
Authors: The Evaluation section demonstrates feasibility through successful generation of schema-validated OSCAL Assessment Results on two Annex III systems as a training byproduct. Schema validation ensures structural correctness with no generation errors by design. Traditional quantitative metrics such as error rates are less applicable here, as the contribution is automation of evidence production rather than predictive accuracy. We will revise the Evaluation section to include additional details on property coverage in the generated artifacts and qualitative observations on process efficiency from the open-source implementation. No head-to-head comparison with alternative formats was conducted, as the paper focuses on the OSCAL approach. revision: partial
- Referee: [Compliance-as-Code architecture description] The assumption that the generated assurance evidence satisfies all required governance obligations without further major modifications remains untested; no verification against the full set of Annex III requirements or risk-management obligations is presented.
Authors: The manuscript does not claim or assume that the generated evidence satisfies all governance obligations without modifications. The three-layer architecture is presented as a mechanism to produce native OSCAL evidence automatically, which organizations can then incorporate into their broader compliance processes. The tests on Annex III systems validate the generation pipeline but do not constitute a full audit of Articles 9-15. We will revise the Compliance-as-Code architecture description to explicitly qualify the scope, clarifying that the approach facilitates evidence creation but requires integration with human-led risk management and does not replace full obligation verification. revision: yes
Circularity Check
No circularity: proposal extends external NIST standard without self-referential reductions
Full rationale
The manuscript proposes 16 OSCAL property extensions and a three-layer Compliance-as-Code architecture that generates evidence from model training. It builds directly on the independently maintained NIST OSCAL standard (cited as external) and reports concrete outputs on two Annex III systems. No equations, fitted parameters, or derivations appear; the central claims consist of definitional extensions and architectural description rather than any step that reduces by construction to inputs defined within the paper. Self-citations are absent from load-bearing positions, and the JTC21 complementarity claim is an untested assumption rather than a circular derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Existing AI governance frameworks specify what must be assured but provide no executable interchange format.
- Domain assumption: OSCAL extensions can complement rather than replace the JTC21 standards stack.
invented entities (2)
- 16 OSCAL property extensions for AI (no independent evidence)
- Three-layer Compliance-as-Code architecture (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification
DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.
Reference graph
Works this paper leans on
- [1] N. Kshetri, “Economics of artificial intelligence governance,” Computer, vol. 57, no. 4, pp. 113–118, 2024.
- [2] B. Marino and N. D. Lane, “Computational compliance for AI regulation: Blueprint for a new research domain,” 2026.
- [3] F. Sovrano, E. Hine, S. Anzolut, and A. Bacchelli, “Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act,” Empirical Software Engineering, vol. 30, no. 91, 2025.
- [4] M. Iorga and M. Nguyen, “Charting the course for NIST OSCAL,” National Institute of Standards and Technology, Tech. Rep. NIST CSWP 53 (Initial Public Draft), Dec. 2025.
- [5] N. Kshetri, “Regulatory technology and supervisory technology: Current status, facilitators, and barriers,” Computer, vol. 56, no. 1, pp. 64–75, 2023.
- [6] H. Graux, K. Garstka, N. Murali, J. Cave, and M. Botterman, “Interplay between the AI Act and the EU digital legislative framework,” European Parliament, ITRE Committee, Tech. Rep. PE 778.575, 2025. [Online]. Available: https://www.europarl.europa.eu/RegData/etudes/STUD/2025/778575/ECTI_STU(2025)778575_EN.pdf
- [7] S. Bartsch, O. Behn, A. Benlian, R. Brownsword, S. Bücker, M. Düwell, N. Formánek, M. Jungtäubl, M. Leyer, A. Richter, J.-H. Schmidt, and M. Will-Zocholl, “Governance of high-risk AI systems in healthcare and credit scoring,” Business & Information Systems Engineering, vol. 67, no. 4, pp. 563–581, 2025.
- [8] M. Wagner, M. Borg, and P. Runeson, “Navigating the upcoming European Union AI Act,” IEEE Software, vol. 41, no. 1, pp. 19–24, 2024.
- [9] J. B. Peckham, “An AI harms and governance framework for trustworthy AI,” Computer, vol. 57, no. 3, pp. 59–68, 2024.
- [10] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, “Model cards for model reporting,” in Proc. Conf. Fairness, Accountability, and Transparency (FAT* ’19), 2019, pp. 220–229.
- [11] M. Arnold, R. K. E. Bellamy, M. Hind, S. Houde, S. Mehta, A. Mojsilovic, R. Nair, K. Natesan Ramamurthy, D. Reimer, A. Olteanu, D. Piorkowski, J. Tsay, and K. R. Varshney, “FactSheets: Increasing trust in AI services through supplier’s declarations of conformity,” IBM Journal of Research and Development, vol. 63, no. 4/5, pp. 6:1–6:13, 2019.
- [12] A. Bell, L. Bynum, N. Drushchak, T. Herasymova, L. Rosenblatt, and J. Stoyanovich, “The possibility of fairness: Revisiting the impossibility theorem in practice,” in Proc. Conf. Fairness, Accountability, and Transparency (FAccT ’23), 2023. [Online]. Available: https://arxiv.org/abs/2302.06347
- [13] S. Nocera, M. Di Penta, F. Ahmed, S. Romano, and G. Scanniello, “What we know about AIBOMs: Results from a multivocal literature review on artificial intelligence bill of materials,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 6, 2025.
- [14] H. Hofmann, “Statlog (German Credit Data),” UCI Machine Learning Repository, 1994. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
- [15] J. Laux and H. Ruschemeier, “Automation bias in the AI Act: On the legal implications of attempting to de-bias human oversight of AI,” European Journal of Risk Regulation, vol. 16, pp. 1519–1534, 2025.
- [16] N. Kshetri, “Governing agentic AI: Security, identity, and oversight in the age of autonomous intelligent systems,” Computer, vol. 58, no. 8, pp. 123–129, 2025.
- [17] L. Nannini, A. L. Smith, M. J. Maggini, E. Panai, S. Feliciano, A. Tiulkanov, E. Maran, J. Gealy, and P. Bisconti, “AI agents under EU law,” arXiv:2604.04604, 2026. [Online]. Available: https://arxiv.org/abs/2604.04604
- [18] A. Rath, “Agent drift: Quantifying behavioral degradation in multi-agent LLM systems over extended interactions,” arXiv preprint, 2026. [Online]. Available: https://arxiv.org/abs/2601.04170
- [19] C. L. Wang, T. Singhal, A. Kelkar, and J. Tuo, “MI9: An integrated runtime governance framework for agentic AI,” arXiv preprint.
- [20] “MI9 – agent intelligence protocol: Runtime governance for agentic AI systems,” [Online]. Available: https://arxiv.org/abs/2508.03858
- [21] D. Furman and M. Goldszmidt, “Runtime governance for AI agents: Policies on paths,” arXiv preprint, 2026. [Online]. Available: https://arxiv.org/abs/2603.16586