Making AI Compliance Evidence Machine-Readable
Pith reviewed 2026-05-10 12:27 UTC · model grok-4.3
The pith
OSCAL with 16 property extensions and a three-layer architecture generates machine-readable AI compliance evidence as a byproduct of model training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OSCAL is proposed as a candidate interchange format for AI governance that complements the JTC21 standards stack. Sixteen property extensions are defined to cover lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification. A three-layer architecture for Compliance-as-Code, consisting of policy, evidence, and enforcement layers, generates assurance evidence as a byproduct of model training. The resulting SDK produces native OSCAL Assessment Results that are validated against the NIST JSON schema. The approach is tested on two Annex III high-risk systems: a credit scoring model and a medical imaging segmentation system.
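The claim that the SDK emits schema-valid OSCAL Assessment Results can be pictured with a minimal fragment. The document skeleton below follows OSCAL's assessment-results model (metadata, import-ap, results, and the generic `props` name/ns/value mechanism), but the property names (`lifecycle-phase`) and namespace URI are hypothetical stand-ins, since the paper's 16 extensions are not enumerated in this review.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical namespace for the paper's AI property extensions (assumption).
AI_NS = "https://example.org/ns/oscal-ai-extensions"

def make_assessment_result(metric_name: str, metric_value: float) -> dict:
    """Build a minimal OSCAL-style assessment-results document whose
    result carries AI-governance props via OSCAL's extension mechanism."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "assessment-results": {
            "uuid": str(uuid.uuid4()),
            "metadata": {
                "title": "AI assurance evidence (training byproduct)",
                "last-modified": now,
                "version": "0.1.0",
                "oscal-version": "1.1.2",
            },
            "import-ap": {"href": "#assessment-plan"},
            "results": [
                {
                    "uuid": str(uuid.uuid4()),
                    "title": "Training-run evidence",
                    "description": f"Captured {metric_name} during training.",
                    "start": now,
                    "props": [
                        # Extension props are name/ns/value triples; names
                        # here are illustrative, not the paper's actual 16.
                        {"name": "lifecycle-phase", "ns": AI_NS, "value": "training"},
                        {"name": metric_name, "ns": AI_NS, "value": str(metric_value)},
                    ],
                }
            ],
        }
    }

doc = make_assessment_result("auroc", 0.91)
print(json.dumps(doc, indent=2)[:200])
```

In the paper's pipeline such a document would additionally be validated against the NIST OSCAL JSON schema; this sketch only shows the shape of the evidence payload.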
What carries the argument
The three-layer Compliance-as-Code architecture with policy, evidence, and enforcement layers, supported by 16 OSCAL property extensions for generating and validating assurance evidence during model training.
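The three-layer idea can be sketched as a plain training hook: a policy layer declares machine-checkable thresholds, an evidence layer records observations each epoch, and an enforcement layer gates on the policy. All names, thresholds, and metrics here are hypothetical illustrations of the architectural pattern, not the paper's SDK.

```python
# Minimal three-layer Compliance-as-Code sketch (illustrative, not the SDK).

# Policy layer: declarative, machine-checkable obligations.
POLICY = {"min_accuracy": 0.80, "max_subgroup_gap": 0.10}

class EvidenceLog:
    """Evidence layer: accumulates observations as training runs."""
    def __init__(self):
        self.observations = []

    def record(self, epoch: int, metrics: dict) -> None:
        self.observations.append({"epoch": epoch, **metrics})

def enforce(policy: dict, evidence: EvidenceLog) -> dict:
    """Enforcement layer: evaluate the latest evidence against policy."""
    latest = evidence.observations[-1]
    violations = []
    if latest["accuracy"] < policy["min_accuracy"]:
        violations.append("min_accuracy")
    if latest["subgroup_gap"] > policy["max_subgroup_gap"]:
        violations.append("max_subgroup_gap")
    return {"satisfied": not violations, "violations": violations}

log = EvidenceLog()
# Stand-in for a training loop emitting metrics as a byproduct.
for epoch, (acc, gap) in enumerate([(0.74, 0.15), (0.83, 0.07)]):
    log.record(epoch, {"accuracy": acc, "subgroup_gap": gap})

verdict = enforce(POLICY, log)
print(verdict)  # prints {'satisfied': True, 'violations': []}
```

The point of the pattern is that the evidence log, not a hand-written report, becomes the audit artifact; serializing it into OSCAL props is then a mechanical step.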
Where Pith is reading between the lines
- This could enable automated verification tools to check compliance evidence across supply chains.
- The format might support consistent auditing practices for multiple overlapping AI regulations.
- Similar extensions could be developed for other types of machine learning assurance beyond the tested cases.
- Organizations might integrate this into existing MLOps pipelines to reduce compliance overhead.
Load-bearing premise
The 16 property extensions and three-layer architecture will integrate smoothly with emerging AI standards and cover all required governance evidence without needing significant further changes.
What would settle it
Observing whether the OSCAL output from the system is accepted as sufficient evidence in an actual regulatory review or third-party audit for an Annex III AI system, or if key elements are missing.
Figures
Original abstract
AI Assurance -- producing the machine-readable evidence required to demonstrate compliance with AI governance frameworks -- has mature policy scaffolding but lacks the infrastructure to operationalize it. Organizations building high-risk AI systems under the EU AI Act face a gap: frameworks such as the EU AI Act, ISO/IEC 42001, and NIST AI RMF specify what to assure but provide no executable format for how. This paper proposes OSCAL -- the NIST standard adopted for FedRAMP cybersecurity compliance -- as a candidate interchange format for AI governance, complementing rather than replacing the emerging JTC21 standards stack. We define 16 property extensions covering lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification, and present a three-layer Compliance-as-Code architecture (policy, evidence, enforcement) that generates assurance evidence as a byproduct of model training. The SDK produces native OSCAL Assessment Results validated against the NIST JSON schema. We test the approach on two Annex III high-risk systems: a credit scoring model and a medical imaging segmentation system. The architecture and reference implementation are open-source under Apache 2.0.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes OSCAL (the NIST standard for cybersecurity compliance) as an interchange format for AI governance evidence under frameworks such as the EU AI Act. It defines 16 property extensions addressing lifecycle phases, enforcement semantics, risk traceability, and risk-acceptance justification; introduces a three-layer Compliance-as-Code architecture (policy, evidence, enforcement) that generates native OSCAL Assessment Results as a byproduct of model training; and reports successful application to two Annex III high-risk systems (credit scoring and medical imaging segmentation). The reference implementation is released open-source under Apache 2.0.
Significance. If the extensions and architecture prove compatible with the JTC21 stack and deliver sufficient coverage, the work would supply a concrete, executable mechanism for producing and exchanging machine-readable AI assurance evidence, directly addressing the operational gap between high-level governance frameworks and implementable compliance processes. The open-source release and concrete system tests constitute reproducible starting points for further validation.
major comments (3)
- [Architecture and Extensions sections] The central claim that the 16 OSCAL extensions plus three-layer architecture complement (rather than conflict with) the emerging JTC21 standards stack and provide sufficient coverage for EU AI Act obligations (Articles 9-15 and Annex III) is unsupported by any explicit mapping, combined schema validation, or gap analysis against JTC21 artifacts.
- [Evaluation section] The evaluation on two Annex III systems states that OSCAL output was successfully generated but supplies no quantitative metrics, error rates, coverage statistics, or comparison against alternative evidence formats or manual compliance processes.
- [Compliance-as-Code architecture description] The assumption that the generated assurance evidence satisfies all required governance obligations without further major modifications remains untested; no verification against the full set of Annex III requirements or risk-management obligations is presented.
minor comments (3)
- Include a dedicated table or appendix listing the exact JSON schema definitions, allowed values, and rationale for each of the 16 property extensions.
- Clarify the precise integration points between the SDK and common ML training frameworks (e.g., PyTorch or TensorFlow) to aid reproducibility.
- Add a short discussion of potential maintenance overhead for keeping the OSCAL extensions synchronized with future revisions of the NIST AI RMF or JTC21 standards.
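The first minor comment, on documenting allowed values for each extension, could be met with a machine-checkable registry rather than prose. The entry names and value sets below are hypothetical stand-ins (the paper's actual 16 properties are not reproduced in this review); the point is that allowed values become validatable data.

```python
# Hypothetical registry for a few property extensions (illustrative names
# and allowed values; the paper defines 16 such properties).
EXTENSION_REGISTRY = {
    "lifecycle-phase": {
        "allowed": {"design", "training", "validation", "deployment"},
        "rationale": "ties evidence to an AI lifecycle stage",
    },
    "enforcement-mode": {
        "allowed": {"advisory", "blocking"},
        "rationale": "whether a failed check halts the pipeline",
    },
    "risk-acceptance": {
        "allowed": {"accepted", "mitigated", "rejected"},
        "rationale": "records the disposition of residual risk",
    },
}

def validate_prop(name: str, value: str) -> bool:
    """Return True iff the property is registered and its value is allowed."""
    spec = EXTENSION_REGISTRY.get(name)
    return spec is not None and value in spec["allowed"]

print(validate_prop("lifecycle-phase", "training"))  # True
print(validate_prop("enforcement-mode", "silent"))   # False: value not allowed
```

A registry of this shape could also be emitted as JSON Schema `enum` constraints, which would address the referee's reproducibility concern directly.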
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important areas for strengthening the manuscript's claims regarding compatibility, evaluation rigor, and scope. We address each major comment below and indicate the revisions we will make.
Point-by-point responses
- Referee: [Architecture and Extensions sections] The central claim that the 16 OSCAL extensions plus three-layer architecture complement (rather than conflict with) the emerging JTC21 standards stack and provide sufficient coverage for EU AI Act obligations (Articles 9-15 and Annex III) is unsupported by any explicit mapping, combined schema validation, or gap analysis against JTC21 artifacts.
Authors: We agree that an explicit mapping and gap analysis would strengthen the complementarity claim. The manuscript positions the OSCAL extensions as additive to the JTC21 stack (focusing on machine-readable evidence interchange) rather than a replacement, but does not include a full mapping because JTC21 artifacts remain under active development. We will revise the Architecture and Extensions sections to add a high-level compatibility discussion, including how the 16 properties align with key EU AI Act requirements without claiming full coverage. A complete schema validation against finalized JTC21 is noted as future work. revision: partial
- Referee: [Evaluation section] The evaluation on two Annex III systems states that OSCAL output was successfully generated but supplies no quantitative metrics, error rates, coverage statistics, or comparison against alternative evidence formats or manual compliance processes.
Authors: The Evaluation section demonstrates feasibility through successful generation of schema-validated OSCAL Assessment Results on two Annex III systems as a training byproduct. Schema validation ensures structural correctness with no generation errors by design. Traditional quantitative metrics such as error rates are less applicable here, as the contribution is automation of evidence production rather than predictive accuracy. We will revise the Evaluation section to include additional details on property coverage in the generated artifacts and qualitative observations on process efficiency from the open-source implementation. No head-to-head comparison with alternative formats was conducted, as the paper focuses on the OSCAL approach. revision: partial
- Referee: [Compliance-as-Code architecture description] The assumption that the generated assurance evidence satisfies all required governance obligations without further major modifications remains untested; no verification against the full set of Annex III requirements or risk-management obligations is presented.
Authors: The manuscript does not claim or assume that the generated evidence satisfies all governance obligations without modifications. The three-layer architecture is presented as a mechanism to produce native OSCAL evidence automatically, which organizations can then incorporate into their broader compliance processes. The tests on Annex III systems validate the generation pipeline but do not constitute a full audit of Articles 9-15. We will revise the Compliance-as-Code architecture description to explicitly qualify the scope, clarifying that the approach facilitates evidence creation but requires integration with human-led risk management and does not replace full obligation verification. revision: yes
Circularity Check
No circularity: proposal extends external NIST standard without self-referential reductions
Full rationale
The manuscript proposes 16 OSCAL property extensions and a three-layer Compliance-as-Code architecture that generates evidence from model training. It builds directly on the independently maintained NIST OSCAL standard (cited as external) and reports concrete outputs on two Annex III systems. No equations, fitted parameters, or derivations appear; the central claims consist of definitional extensions and architectural description rather than any step that reduces by construction to inputs defined within the paper. Self-citations are absent from load-bearing positions, and the JTC21 complementarity claim is an untested assumption rather than a circular derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Existing AI governance frameworks specify what must be assured but provide no executable interchange format.
- Domain assumption: OSCAL extensions can complement rather than replace the JTC21 standards stack.
invented entities (2)
- 16 OSCAL property extensions for AI (no independent evidence)
- Three-layer Compliance-as-Code architecture (no independent evidence)
Forward citations
Cited by 1 Pith paper
- Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification
DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.
Reference graph
Works this paper leans on
- [1] N. Kshetri, “Economics of artificial intelligence governance,” Computer, vol. 57, no. 4, pp. 113–118, 2024.
- [2] B. Marino and N. D. Lane, “Computational compliance for AI regulation: Blueprint for a new research domain,” 2026.
- [3] F. Sovrano, E. Hine, S. Anzolut, and A. Bacchelli, “Simplifying software compliance: AI technologies in drafting technical documentation for the AI Act,” Empirical Software Engineering, vol. 30, no. 91, 2025.
- [4] M. Iorga and M. Nguyen, “Charting the course for NIST OSCAL,” National Institute of Standards and Technology, Tech. Rep. NIST CSWP 53 (Initial Public Draft), Dec. 2025.
- [5] N. Kshetri, “Regulatory technology and supervisory technology: Current status, facilitators, and barriers,” Computer, vol. 56, no. 1, pp. 64–75, 2023.
- [6] H. Graux, K. Garstka, N. Murali, J. Cave, and M. Botterman, “Interplay between the AI Act and the EU digital legislative framework,” European Parliament, ITRE Committee, Tech. Rep. PE 778.575, 2025. [Online]. Available: https://www.europarl.europa.eu/RegData/etudes/STUD/2025/778575/ECTI_STU(2025)778575_EN.pdf
- [7] S. Bartsch, O. Behn, A. Benlian, R. Brownsword, S. Bücker, M. Düwell, N. Formánek, M. Jungtäubl, M. Leyer, A. Richter, J.-H. Schmidt, and M. Will-Zocholl, “Governance of high-risk AI systems in healthcare and credit scoring,” Business & Information Systems Engineering, vol. 67, no. 4, pp. 563–581, 2025.
- [8] M. Wagner, M. Borg, and P. Runeson, “Navigating the upcoming European Union AI Act,” IEEE Software, vol. 41, no. 1, pp. 19–24, 2024.
- [9] J. B. Peckham, “An AI harms and governance framework for trustworthy AI,” Computer, vol. 57, no. 3, pp. 59–68, 2024.
- [10] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, “Model cards for model reporting,” in Proc. Conf. Fairness, Accountability, and Transparency (FAT* ’19), 2019, pp. 220–229.
- [11] M. Arnold, R. K. E. Bellamy, M. Hind, S. Houde, S. Mehta, A. Mojsilovic, R. Nair, K. Natesan Ramamurthy, D. Reimer, A. Olteanu, D. Piorkowski, J. Tsay, and K. R. Varshney, “FactSheets: Increasing trust in AI services through supplier’s declarations of conformity,” IBM Journal of Research and Development, vol. 63, no. 4/5, pp. 6:1–6:13, 2019.
- [12] A. Bell, L. Bynum, N. Drushchak, T. Herasymova, L. Rosenblatt, and J. Stoyanovich, “The possibility of fairness: Revisiting the impossibility theorem in practice,” in Proc. Conf. Fairness, Accountability, and Transparency (FAccT ’23), 2023. [Online]. Available: https://arxiv.org/abs/2302.06347
- [13] S. Nocera, M. Di Penta, F. Ahmed, S. Romano, and G. Scanniello, “What we know about AIBOMs: Results from a multivocal literature review on artificial intelligence bill of materials,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 6, 2025.
- [14] H. Hofmann, “Statlog (German Credit Data),” UCI Machine Learning Repository, 1994. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
- [15] J. Laux and H. Ruschemeier, “Automation bias in the AI Act: On the legal implications of attempting to de-bias human oversight of AI,” European Journal of Risk Regulation, vol. 16, pp. 1519–1534, 2025.
- [16] N. Kshetri, “Governing agentic AI: Security, identity, and oversight in the age of autonomous intelligent systems,” Computer, vol. 58, no. 8, pp. 123–129, 2025.
- [17] L. Nannini, A. L. Smith, M. J. Maggini, E. Panai, S. Feliciano, A. Tiulkanov, E. Maran, J. Gealy, and P. Bisconti, “AI agents under EU law,” arXiv:2604.04604, 2026. [Online]. Available: https://arxiv.org/abs/2604.04604
- [18] A. Rath, “Agent drift: Quantifying behavioral degradation in multi-agent LLM systems over extended interactions,” arXiv preprint, 2026. [Online]. Available: https://arxiv.org/abs/2601.04170
- [19] C. L. Wang, T. Singhal, A. Kelkar, and J. Tuo, “MI9: An integrated runtime governance framework for agentic AI,” arXiv preprint.
- [20] “MI9 – agent intelligence protocol: Runtime governance for agentic AI systems,” [Online]. Available: https://arxiv.org/abs/2508.03858
- [21] D. Furman and M. Goldszmidt, “Runtime governance for AI agents: Policies on paths,” arXiv preprint, 2026. [Online]. Available: https://arxiv.org/abs/2603.16586