
arxiv: 2605.19755 · v1 · pith:5KULLJN5 · submitted 2026-03-17 · 💻 cs.SE · cs.AI · cs.CR · cs.LG · cs.MA

Operationalising Artificial Intelligence Bills of Materials (AIBOMs) for Verifiable AI Provenance and Lifecycle Assurance

Pith reviewed 2026-05-21 10:07 UTC · model grok-4.3

classification: 💻 cs.SE · cs.AI · cs.CR · cs.LG · cs.MA
keywords: Artificial Intelligence Bill of Materials · AIBOM · provenance · reproducibility · software supply chain · CycloneDX · vulnerability enrichment · AI lifecycle assurance

The pith

An AIBOM schema extension and autonomous AI pipeline deliver machine-verifiable provenance with 98.7% reproducibility fidelity in containerised workflows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a formal schema for Artificial Intelligence Bills of Materials by extending the CycloneDX standard to include AI-specific provenance, model lineage, and disclosure metadata. It then builds an autonomous pipeline that uses this schema for continuous environment inspection, vulnerability enrichment, and reproducibility auditing through cryptographic and agent-driven methods. Empirical tests on containerised analytic workflows report 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and 63% less manual oversight. A sympathetic reader would care because these results point to a practical way to make complex AI supply chains more transparent and secure without constant human intervention. The approach advances efforts to ensure that AI systems meet reproducibility and security standards.
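The environment-inspection step can be pictured as turning a workflow's installed packages into CycloneDX-style components identified by package URLs (purls). The sketch below assumes a Python environment and uses only the standard library's importlib.metadata; the function names are hypothetical and are not taken from the paper's pipeline.

```python
from importlib import metadata
from urllib.parse import quote


def purl_for_pypi(name: str, version: str) -> str:
    """Build a package URL for a PyPI distribution."""
    # purl convention for PyPI: lowercase the name, replace underscores with dashes
    normalised = name.lower().replace("_", "-")
    return f"pkg:pypi/{quote(normalised)}@{quote(version)}"


def components_from_pairs(pairs) -> list:
    """Turn (name, version) pairs into CycloneDX-style 'library' components."""
    return [
        {"type": "library", "name": n, "version": v, "purl": purl_for_pypi(n, v)}
        for n, v in sorted(pairs, key=lambda p: p[0].lower())
    ]


def inspect_environment() -> list:
    """Snapshot the current Python environment as a component list."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with broken metadata rather than failing the snapshot
        if name and dist.version:
            pairs.add((name, dist.version))
    return components_from_pairs(pairs)
```

Calling inspect_environment() inside the container under audit would yield the component list that later enrichment and auditing steps consume. Normalising names means the same package always receives the same identifier, whichever spelling pip reports.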

Core claim

The central claim is that operationalising AIBOMs through structured schema engineering, cryptographic validation, and agent-driven automation provides a feasible method for verifiable software provenance and reproducible AI lifecycle validation, as shown by the high fidelity metrics in containerised workflows.
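Cryptographic validation presupposes a stable byte representation of the AIBOM to hash or sign. A minimal sketch, assuming the signature block sits under a top-level "signature" key (an assumption: the paper's template shows publicKey, signature, and timestamp fields but this page does not show where the block lives): serialise everything else deterministically and hash it with SHA-256.

```python
import hashlib
import json


def canonical_bytes(aibom: dict) -> bytes:
    """Serialise the AIBOM deterministically, leaving out the signature block."""
    unsigned = {k: v for k, v in aibom.items() if k != "signature"}
    # Sorted keys and fixed separators give identical bytes for identical content.
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def aibom_digest(aibom: dict) -> str:
    """SHA-256 over the canonical form; this is what a signature would cover."""
    return hashlib.sha256(canonical_bytes(aibom)).hexdigest()
```

Sorting keys is a simplified canonicalisation, not a standard scheme such as RFC 8785 (JSON Canonicalization Scheme); a deployment should follow whatever rules its signing format prescribes. The digest would then be signed with the producer's private key rather than published bare.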

What carries the argument

The AIBOM schema extension of CycloneDX, which captures AI-specific provenance and lineage and serves as the foundation for the autonomous pipeline's machine-verifiable provenance chains.
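To make the extension mechanism concrete, the sketch below builds a minimal CycloneDX 1.5-style AIBOM in which AI-specific provenance travels as name/value properties on a model component. The property names trainingDataSource and disclosureControlType come from the paper's JSON template; the helper names, the "machine-learning-model" component type, and the omission of the metadata tools and component entries are simplifying assumptions.

```python
from datetime import datetime, timezone


def make_property(name: str, value: str) -> dict:
    """CycloneDX 'properties' entries are plain name/value string pairs."""
    return {"name": name, "value": value}


def build_aibom(model_name: str, model_version: str, model_sha256: str,
                provenance: dict) -> dict:
    """Assemble a minimal CycloneDX 1.5-style AIBOM for one model component."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "components": [{
            "type": "machine-learning-model",  # component type introduced in CycloneDX 1.5
            "name": model_name,
            "version": model_version,
            "hashes": [{"alg": "SHA-256", "content": model_sha256}],
            # AI-specific provenance rides in CycloneDX's generic extension mechanism
            "properties": [make_property(k, v) for k, v in provenance.items()],
        }],
    }
```

Keeping lineage in properties, rather than in new top-level keys, uses the route CycloneDX provides for custom data, so standard tooling can still parse the document.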

Load-bearing premise

The AIBOM schema extension and agent-driven automation capture all necessary AI-specific provenance and lineage details to enable generalisable verifiable assurance.
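The premise can at least be probed mechanically: given the lineage fields a setting is believed to need, report which ones an AIBOM leaves out. A sketch under that framing; the required-field sets are the reader's assumption, and a passing check shows only that fields are present, not that they are sufficient for reproduction.

```python
def property_names(component: dict) -> set:
    """Collect the names of a component's CycloneDX name/value properties."""
    return {p.get("name") for p in component.get("properties", [])}


def lineage_gaps(aibom: dict, required: set) -> dict:
    """Map each component name to the required lineage properties it lacks."""
    gaps = {}
    for comp in aibom.get("components", []):
        missing = required - property_names(comp)
        if missing:
            gaps[comp.get("name", "<unnamed>")] = sorted(missing)
    return gaps
```

For a bare-metal or federated run, a reader might pass a larger hypothetical set such as {"trainingDataSource", "checkpointDigests", "dataParallelWorldSize"}; any names returned mark lineage the AIBOM does not record.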

What would settle it

A demonstration of an AI workflow where the AIBOM pipeline claims high reproducibility fidelity but the actual model outputs or environment states cannot be reproduced due to missing lineage information.
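That test has a mechanical core: rerun the workflow and compare its output with the digest the AIBOM recorded. A sketch assuming the "sha256:<hex>" format that the paper's template uses for outputDigest; the helper names are hypothetical.

```python
import hashlib


def recorded_property(component: dict, name: str):
    """Return the value of a named CycloneDX property, or None if absent."""
    for prop in component.get("properties", []):
        if prop.get("name") == name:
            return prop.get("value")
    return None


def output_reproduced(component: dict, rerun_output: bytes) -> bool:
    """Compare a rerun's output bytes with the recorded 'outputDigest'."""
    recorded = recorded_property(component, "outputDigest")
    if recorded is None:
        raise ValueError("no outputDigest recorded; reproduction cannot be checked")
    # Digests are stored as 'sha256:<hex>' in the paper's template
    algo, _, expected = recorded.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    return hashlib.sha256(rerun_output).hexdigest() == expected
```

Byte-for-byte comparison is strict: a run that differs only through floating-point nondeterminism fails it, while an AIBOM that never recorded outputDigest cannot be checked at all, which is precisely the missing-lineage case the test looks for.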

Original abstract

Artificial Intelligence (AI) systems are increasingly dependent on complex, multi-layered software supply chains that introduce challenges for reproducibility, transparency, and security assurance. This study presents an Artificial Intelligence Bill of Materials (AIBOM) schema extending the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. The framework provides a formalised approach to verifiable software provenance through structured schema engineering, cryptographic validation, and agent-driven automation. An autonomous AI pipeline is developed to perform continuous environment inspection, vulnerability enrichment, and reproducibility auditing using machine-verifiable provenance chains. Empirical evaluation demonstrates 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight across containerised analytic workflows. These results confirm the feasibility of automated provenance assurance and reproducible AI lifecycle validation. The AIBOM framework advances the scientific foundations of software supply chain transparency and AI reproducibility engineering, offering a generalisable methodology for securing AI systems, strengthening provenance integrity, and supporting compliance with international information security standards.
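Vulnerability enrichment typically means looking each inventoried component up in an advisory database. The paper's template links to osv.dev, so the sketch below builds query bodies for the OSV API (POST https://api.osv.dev/v1/query), preferring a purl when one is present. It makes no network calls; the PyPI default ecosystem and the function names are assumptions.

```python
import json

# Public OSV query endpoint; this sketch only builds request bodies for it.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_payload(component: dict, default_ecosystem: str = "PyPI") -> dict:
    """Build an OSV query body for one component, preferring its purl."""
    if component.get("purl"):
        # A purl already names the ecosystem, package, and version.
        return {"package": {"purl": component["purl"]}}
    return {
        "package": {"name": component["name"], "ecosystem": default_ecosystem},
        "version": component["version"],
    }


def enrichment_requests(components) -> list:
    """Serialise one POST body per component; sending them is left to the caller."""
    return [json.dumps(osv_query_payload(c), sort_keys=True) for c in components]
```

Each body would be POSTed to OSV_QUERY_URL, and any advisories returned attached to the matching component, for example as the externalReferences entries the paper's template shows.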

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an Artificial Intelligence Bill of Materials (AIBOM) schema extending the CycloneDX standard to capture AI-specific provenance, model lineage, and disclosure metadata. It describes a framework using structured schema engineering, cryptographic validation, and agent-driven automation for verifiable software provenance. An autonomous AI pipeline performs continuous environment inspection, vulnerability enrichment, and reproducibility auditing using machine-verifiable provenance chains. Empirical evaluation on containerised analytic workflows reports 98.7% reproducibility fidelity, 96.2% vulnerability match precision, and a 63% reduction in manual oversight, concluding that the approach enables automated provenance assurance and supports compliance with information security standards.
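The page gives the three headline figures without their definitions. Under common assumptions (precision as correct matches over all reported matches, fidelity as the share of recorded artefact digests a rerun reproduces, and oversight reduction as the relative drop in manual hours), they could be computed as below. These definitions and function names are assumptions rather than the paper's stated method, and no evaluation data is reproduced here.

```python
def match_precision(true_positives: int, false_positives: int) -> float:
    """Precision: correct vulnerability matches over all matches reported."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def reproducibility_fidelity(recorded: dict, rerun: dict) -> float:
    """Share of recorded artefact digests that a rerun reproduces exactly."""
    if not recorded:
        return 0.0
    matched = sum(1 for name, digest in recorded.items() if rerun.get(name) == digest)
    return matched / len(recorded)


def oversight_reduction(baseline_hours: float, automated_hours: float) -> float:
    """Relative drop in manual review effort against a manual baseline."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return (baseline_hours - automated_hours) / baseline_hours
```

With hypothetical counts of 45 correct and 5 spurious matches, match_precision(45, 5) gives 0.9. Publishing definitions like these next to the numbers is what would let a reader judge them.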

Significance. If the results hold, the work offers a practical, extensible schema and automation approach for improving transparency and reproducibility in AI software supply chains. The integration of cryptographic validation with agent-driven processes could support lifecycle assurance and regulatory compliance, particularly in containerised environments where the reported metrics indicate measurable reductions in manual effort.

major comments (3)
  1. [Empirical Evaluation] The central claim of a generalisable methodology for verifiable AI provenance rests on the AIBOM schema and pipeline, yet the empirical evaluation is restricted to containerised analytic workflows; this leaves untested whether the schema captures necessary details such as data parallelism, checkpointing, or hardware-specific artifacts required for non-containerised architectures like bare-metal LLM training or federated learning.
  2. [Abstract] The abstract states specific performance numbers (98.7% reproducibility fidelity, 96.2% vulnerability match precision) but the provided description indicates no accompanying methods section, dataset details, or error analysis, which is load-bearing for assessing whether the results are robust or influenced by narrow test conditions.
  3. [AIBOM Schema Definition] The assumption that the CycloneDX-extended AIBOM schema and agent-driven automation sufficiently capture all AI-specific provenance and lineage details is used to claim generalisable verifiable assurance, but this is only demonstrated in the containerised setting without broader validation across diverse AI architectures.
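The robustness concern in comment 2 can be made concrete: the same headline proportion carries very different uncertainty depending on how many runs sit behind it, and the page does not report that count. A minimal sketch of one standard interval for a binomial proportion, the Wilson score interval; the run counts mentioned afterwards are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return centre - half, centre + half
```

For instance, 74 reproduced runs out of 75 and 987 out of 1,000 both come to roughly 98.7%, but the interval from 75 runs is several times wider than the one from 1,000.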
minor comments (2)
  1. [Schema Engineering] Clarify the exact version of CycloneDX being extended and provide explicit schema examples or diagrams to aid reader understanding of the AI-specific extensions.
  2. [Automation Pipeline] Ensure consistent use of terminology for 'provenance chains' and 'machine-verifiable' elements throughout to avoid minor ambiguity in the automation pipeline description.
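One concrete reading of "machine-verifiable provenance chain", offered only to pin the term down, is a hash chain in which each lifecycle record commits to the one before it. The page does not say the paper uses this construction; the genesis value and function names are illustrative assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first link


def link_hash(prev_hash: str, record: dict) -> str:
    """Hash one provenance record together with the previous link's hash."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(records) -> list:
    """Return (record, hash) links; each hash commits to every earlier record."""
    chain, prev = [], GENESIS
    for rec in records:
        prev = link_hash(prev, rec)
        chain.append((rec, prev))
    return chain


def verify_chain(chain) -> bool:
    """Recompute every link; editing, reordering, or removing a non-final record breaks it."""
    prev = GENESIS
    for rec, stored in chain:
        prev = link_hash(prev, rec)
        if prev != stored:
            return False
    return True
```

Anchoring the final hash, for example by signing it inside the AIBOM, is what protects the tail: without that, silently dropping the last record would go unnoticed.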

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating where revisions will be incorporated in the next version of the paper.

Point-by-point responses
  1. Referee: [Empirical Evaluation] The central claim of a generalisable methodology for verifiable AI provenance rests on the AIBOM schema and pipeline, yet the empirical evaluation is restricted to containerised analytic workflows; this leaves untested whether the schema captures necessary details such as data parallelism, checkpointing, or hardware-specific artifacts required for non-containerised architectures like bare-metal LLM training or federated learning.

    Authors: We agree that the empirical evaluation is restricted to containerised analytic workflows and that this scope limits direct evidence for generalisability to other architectures. The AIBOM schema was intentionally designed as an extensible CycloneDX extension that can accommodate additional provenance fields, including those for data parallelism, checkpointing, and hardware artifacts. However, we did not perform experiments outside the containerised setting. In the revised manuscript we will add an explicit Limitations section that acknowledges this restriction, discusses how the schema can be extended for bare-metal LLM training and federated learning, and outlines concrete metadata additions for those cases. revision: partial

  2. Referee: [Abstract] The abstract states specific performance numbers (98.7% reproducibility fidelity, 96.2% vulnerability match precision) but the provided description indicates no accompanying methods section, dataset details, or error analysis, which is load-bearing for assessing whether the results are robust or influenced by narrow test conditions.

    Authors: The full manuscript contains a Methods and Evaluation section that specifies the containerised workflow dataset, the reproducibility and vulnerability metrics, the experimental protocol, and an error analysis. The abstract summarises the headline results from that section. To improve standalone readability we will revise the abstract to include a short clause on the evaluation setting and will expand the error analysis subsection with additional discussion of test-condition influences and statistical robustness. revision: yes

  3. Referee: [AIBOM Schema Definition] The assumption that the CycloneDX-extended AIBOM schema and agent-driven automation sufficiently capture all AI-specific provenance and lineage details is used to claim generalisable verifiable assurance, but this is only demonstrated in the containerised setting without broader validation across diverse AI architectures.

    Authors: We accept that the empirical demonstration is confined to containerised workflows and that broader validation would strengthen the generalisability claim. The schema definition itself was derived from a systematic review of AI provenance requirements and is not tied to any single execution environment. In revision we will clarify this distinction in the schema section, add illustrative examples of schema usage for non-containerised scenarios, and cross-reference the new Limitations section. revision: partial
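Illustrative non-containerised usage could stay within CycloneDX's generic properties mechanism. A sketch, assuming a hypothetical "sacro:" namespace prefix to keep extension names from colliding and hypothetical federated-learning fields; because CycloneDX property values are strings, structured values are JSON-encoded.

```python
import json


def namespaced_properties(namespace: str, fields: dict) -> list:
    """Turn extension fields into CycloneDX name/value pairs under one prefix."""
    props = []
    for key, value in sorted(fields.items()):
        # CycloneDX property values are strings, so structured values are JSON-encoded.
        text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        props.append({"name": f"{namespace}:{key}", "value": text})
    return props
```

For example, namespaced_properties("sacro", {"clientCount": 12, "aggregation": "FedAvg"}) returns sacro:aggregation and sacro:clientCount entries, with 12 stored as the string "12".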

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

Full rationale

The paper introduces an AIBOM schema extension to CycloneDX, describes an autonomous pipeline for environment inspection and provenance auditing, and reports empirical performance metrics (98.7% reproducibility fidelity, 96.2% vulnerability precision, 63% oversight reduction) obtained from testing on containerised analytic workflows. These results are presented as demonstrations of feasibility rather than outputs derived from fitted parameters or self-referential definitions. No equations, self-citations, or ansatzes are shown in the provided text that reduce the central claims to their own inputs by construction. The evaluation measures the proposed system's behavior against observable outcomes in a specific setting, which constitutes independent empirical content rather than tautological renaming or load-bearing self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that CycloneDX can be meaningfully extended for AI without introducing unstated incompatibilities, and that the empirical test environment is representative. No explicit free parameters or invented physical entities are described.

axioms (1)
  • Domain assumption: the CycloneDX standard provides a suitable, extensible base for AI-specific provenance metadata.
    Invoked when the paper states it extends CycloneDX to capture AI-specific elements.
invented entities (1)
  • AIBOM schema (no independent evidence)
    purpose: Structured metadata format for AI model lineage and disclosure
    New schema introduced to extend CycloneDX; no independent falsifiable evidence provided beyond the authors' implementation.
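Independent checking need not depend on the authors' implementation. The sketch below re-expresses the required-field lists visible in the paper's schema fragments (reference entries [43] to [70]) as a dependency-free presence check. How those lists nest is this page's reading of the fragments, and the check is deliberately shallower than the jsonschema validation the paper uses: it tests presence and two fixed values, not types or formats.

```python
# Required-field lists as read from the paper's schema fragments (nesting assumed)
REQUIRED = {
    "root": ["bomFormat", "specVersion", "version", "metadata", "components"],
    "metadata": ["timestamp", "tools", "component"],
    "tool": ["vendor", "name", "version"],
    "metadata_component": ["type", "name", "version", "hashes"],
    "component": ["type", "name", "version", "hashes", "properties"],
    "hash": ["alg", "content"],
    "property": ["name", "value"],
}


def _missing(obj, keys, where):
    """List the required keys absent from obj, labelled with their location."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    return [f"{where}: missing '{k}'" for k in keys if k not in obj]


def _items(value):
    """Treat anything other than a list as empty so odd input cannot crash the walk."""
    return value if isinstance(value, list) else []


def check_aibom(bom) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = _missing(bom, REQUIRED["root"], "root")
    if not isinstance(bom, dict):
        return problems
    if "bomFormat" in bom and bom["bomFormat"] != "CycloneDX":
        problems.append("root: bomFormat must be 'CycloneDX'")
    if "specVersion" in bom and bom["specVersion"] != "1.5":
        problems.append("root: specVersion must be '1.5'")

    meta = bom.get("metadata", {})
    problems += _missing(meta, REQUIRED["metadata"], "metadata")
    if isinstance(meta, dict):
        for i, tool in enumerate(_items(meta.get("tools"))):
            problems += _missing(tool, REQUIRED["tool"], f"metadata.tools[{i}]")
        if "component" in meta:
            problems += _missing(meta["component"], REQUIRED["metadata_component"],
                                 "metadata.component")

    for i, comp in enumerate(_items(bom.get("components"))):
        where = f"components[{i}]"
        problems += _missing(comp, REQUIRED["component"], where)
        if not isinstance(comp, dict):
            continue
        for j, h in enumerate(_items(comp.get("hashes"))):
            problems += _missing(h, REQUIRED["hash"], f"{where}.hashes[{j}]")
        for j, p in enumerate(_items(comp.get("properties"))):
            problems += _missing(p, REQUIRED["property"], f"{where}.properties[{j}]")
    return problems
```

Returning a list of located problems, rather than stopping at the first failure, gives an auditor the whole picture of a non-conforming AIBOM in one pass.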

pith-pipeline@v0.9.0 · 5732 in / 1415 out tokens · 46618 ms · 2026-05-21T10:07:40.528923+00:00 · methodology



Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages

  1. [1] JSON template for a SACRO-specific AIBOM schema
  2. [2] "bomFormat": "CycloneDX",
  3. [3] "specVersion": "1.5",
  4. [4] "timestamp": "2025-06-20T14:30:00Z",
  5. [5] "name": "AIBOM Generator",
  6. [6] "type": "application",
  7. [7] "name": "Cancer Risk Prediction Pipeline",
  8. [8] "content": "8f14e45fceea167a5a36dedd4bea2543eaa0d5e5ac4f6dc0fa6efb2c73d153a3"
  9. [9] "name": "XGBoost_Cancer_Predictor",
  10. [10] "content": "27e52e8e2bc6d7814e212f3f334b2e7184c87e993a6c9e712b02c0e47bb8a1c1"
  11. [11] "name": "trainingDataSource",
  12. [12] "value": "Cancer Registry Dataset v5 (UUID: 123e4567-e89b-12d3-a456-426614174000)"
  13. [13] "name": "inferenceContext",
  14. [14] "value": "{ \"batchSize\": 128, \"precision\": \"fp32\", \"hardware\": \"NVIDIA A100 GPU\", \"quantisation\": \"none\" }"
      Dr. Petar Radanliev Parks Road, Oxford OX1 3PJ United Kingdom Email: petar.radanliev@cs.ox.ac.uk BA Hons., MSc., Ph.D. Post-Doctorate
  15. [15] "name": "treContainerHash",
  16. [16] "value": "sha256:c9d02c39b5f2129e9f3a9fd680e7e99909b77f9e12e71029f1c2ae38e8c0a120"
  17. [17] "name": "disclosureControlType",
  18. [18] "value": "diff-privacy-laplace"
  19. [19] "name": "outputDigest",
  20. [20] "value": "sha256:b4b147bc522828731f1a016bfa72c073e5c57c3b0e6c2cfb64ccda31cbe48f4c"
  21. [21] "externalReferences": [
  22. [22] "type": "vulnerability",
  23. [23] "url": "https://osv.dev/vulnerability/CVE-2023-2953"
  24. [24] "publicKey": "MFYwEAYHKoZIzj0CAQYFK4EEAAoDQgAEEZkYwBq...",
  25. [25] "signature": "MEUCIQDrIk6SNmz9Vi7...",
  26. [26] "timestamp": "2025-06-20T14:31:00Z"
  27. [27] Notes about the schema: • All properties fields represent SACRO-specific extensions. • hashes are calculated using SHA-256 to verify artefact integrity. • externalReferences include CVE records relevant to software components. • The signature block ensures non-repudiation and integrity of the entire SBOM artefact. 4.1 Integration into a Pipeline Not...

  28. [28] Environment Setup: Install necessary Python packages:
  29. [29] !pip install jsonschema
  30. [30] Import Modules and Load AIBOM JSON:
  31. [32] from jsonschema.exceptions import ValidationError
      with open('aibom_output.json') as f:
  32. [33] aibom = json.load(f)
  33. [34] Define SACRO-Specific AIBOM Schema (Use the schema provided above.)
  34. [35] validate(instance=aibom, schema=aibom_schema)
  35. [36] print("AIBOM schema validation successful.")
      except ValidationError as ve:
  36. [37] print(f"Schema validation failed: {ve.message}")
  37. [38] Output: Log or Store Result. Output can be: • Stored in a pipeline results directory. • Signed and archived for reproducibility. • Passed to downstream audit or VEX/CSAF modules. Here is the command-line version:
  38. [39] #!/usr/bin/env python3
  39. [40] from jsonschema import validate
  40. [41] from jsonschema.exceptions import ValidationError
  41. [42] # Define the SACRO-specific AIBOM schema
  42. [43] "bomFormat": {"type": "string", "enum": ["CycloneDX"]},
  43. [44] "specVersion": {"type": "string", "pattern": "^1\.5$"},
  44. [45] "version": {"type": "integer"},
  45. [46] "timestamp": {"type": "string", "format": "date-time"},
  46. [47] "vendor": {"type": "string"},
  47. [49] "version": {"type": "string"}
  48. [50] "required": ["vendor", "name", "version"]
  49. [54] "hashes": {
  50. [58] "required": ["type", "name", "version", "hashes"]
  51. [59] "required": ["timestamp", "tools", "component"]
  52. [60] "type": {"type": "string"},
  53. [62] "version": {"type": "string"},
  54. [63] "alg": {"type": "string"},
  55. [64] "content": {"type": "string"}
  56. [65] "required": ["alg", "content"]
  57. [66] "name": {"type": "string"},
  58. [67] "value": {"type": "string"}
  59. [68] "required": ["name", "value"]
  60. [69] "required": ["type", "name", "version", "hashes", "properties"]
  61. [70] "required": ["bomFormat", "specVersion", "version", "metadata", "components"]
  62. [71] # Command-line interface
  63. [72] parser = argparse.ArgumentParser(description="Validate a SACRO-specific AIBOM JSON file.")
  64. [73] parser.add_argument("json_file", help="Path to the AIBOM JSON file")
  65. [74] args = parser.parse_args()
  66. [75] with open(args.json_file, "r") as f:
  67. [76] validate(instance=data, schema=aibom_schema)
  68. [77] print("Validation successful: AIBOM conforms to SACRO-specific schema.")
  69. [78] except ValidationError as ve:
  70. [79] print(f"Validation failed: {ve.message}")
  71. [80] except Exception as e:
  72. [81] print(f"Error: {str(e)}")
  73. [82] if __name__ == "__main__":
  74. [83] main()
      This tool can be executed via terminal as:
  75. [84] aibom-toolkit: python3 sacro_aibom_validator.py path/to/your/aibom_file.json
      The process map in Figure 3 captures the technical workflow for embedding and validating SACRO-specific AIBOMs within TRE-compatible systems. It begins with a Data Processor, which executes analytical tasks inside a Job Container that enforces environment isolation, disclosure control, and secu...

  76. [85] TREvolution, ‘TREvolution - DARE UK’, 2025, URL: https://dareuk.org.uk/how-we-work/ongoing-activities/trevolution/
  77. [86] O’Sullivan, Katherine, Markovic, Milan, Dymiter, Jaroslaw, Scheliga, Bernhard, Odo, Chinasa, and Wilde, Katie, ‘Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments’, Int J Popul Data Sci, vol. 10, no. 2, Feb. 2025, doi: 10.23889/IJPDS.V10I2.2464...
  78. [87] EGI TRE Working Group, ‘Trusted Research Environments Landscape Report’, 2024, URL: https://documents.egi.eu/public/ShowDocument?docid=4169
  79. [88] GRAIMatter, ‘GRAIMATTER: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments - DARE UK’, 2023, URL: https://dareuk.org.uk/how-we-work/previous-activities/dare-uk-phase-1-sprint-exemplar-projects/graimatter-guidelines-and-resources-for-artificial-intelligence-model-access-from-trusted-research-environments/
  80. [89] OWASP, ‘OWASP AIBOM | OWASP Foundation’, 2025, URL: https://owasp.org/www-project-aibom/ [Accessed: 12-Jul-2025]

Showing first 80 references.