From Runtime Records to Legal Findings: An Evidentiary-Adequacy Criterion for Agentic AI Oversight

Jeroen Janssen

arxiv: 2607.00941 · v1 · pith:KAPSVXUDnew · submitted 2026-07-01 · 💻 cs.CY

Pith reviewed 2026-07-02 06:07 UTC · model grok-4.3

classification 💻 cs.CY

keywords agentic AI, runtime records, evidentiary adequacy, EU AI Act, oversight, legal findings, binary facts, provenance

The pith

A runtime record supports a binary legal finding about events only if it supplies both a typing that maps the recorded events to the legal category and the specific relation on which the finding depends.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out an evidentiary-adequacy criterion that limits when logs and traces from agentic AI can be treated as sufficient for certain legal questions. It claims that a record answers a binary determination, such as whether protected data crossed a boundary or delegated authority remained valid, only when it contains both a typing that links recorded events to the operative legal category and the relation, such as provenance or temporal validity, on which the answer rests. The criterion is framed as a necessity condition for a bounded class of factual findings rather than a claim that the two elements are always enough. It is then applied to selected obligations under the EU AI Act to show why tamper-proof logs, generic frameworks, and standalone provenance structures fall short. The argument is tied to ideas from systems theory and runtime verification to explain the boundary between traces and the properties they must support.

Core claim

A runtime record can answer a binary finding of fact about specific events and their relations only if it carries both a typing that maps recorded events to the legally operative category and the relation, such as provenance, authority, derivation, or temporal validity, on which the determination's truth depends.

What carries the argument

The evidentiary-adequacy criterion, which requires both typing of events to legal categories and the specific relation on which a binary factual finding depends.
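The shape of the criterion can be sketched as a presence check. The following is a minimal illustration, not the paper's formalization: the `Record` and `Finding` classes, their field names, and the scheme and relation labels are all hypothetical. Because the claim is one of necessity only, a `True` result means the record is not ruled out; it does not mean the finding can be established.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical runtime record: events plus optional typing and relations."""
    event_ids: list[str]
    # typing scheme name -> {event id -> legal category label}
    typings: dict[str, dict[str, str]] = field(default_factory=dict)
    # relation name -> set of (source event id, target event id) pairs
    relations: dict[str, set[tuple[str, str]]] = field(default_factory=dict)


@dataclass
class Finding:
    """Hypothetical binary finding of fact and what its truth depends on."""
    question: str
    typing_scheme: str  # the legally operative categorisation it needs
    relation: str       # e.g. "provenance" or "temporal_validity"


def meets_necessary_condition(record: Record, finding: Finding) -> bool:
    """True if the record carries both elements; says nothing about sufficiency."""
    typing = record.typings.get(finding.typing_scheme)
    has_typing = typing is not None and all(e in typing for e in record.event_ids)
    has_relation = finding.relation in record.relations
    return has_typing and has_relation


finding = Finding("Did protected data cross the boundary?", "protected_data", "provenance")
labels = {"e1": "protected", "e2": "boundary_transfer"}

bare = Record(["e1", "e2"])
typed_only = Record(["e1", "e2"], typings={"protected_data": labels})
full = Record(
    ["e1", "e2"],
    typings={"protected_data": labels},
    relations={"provenance": {("e1", "e2")}},
)

for name, rec in [("bare", bare), ("typed_only", typed_only), ("full", full)]:
    print(name, meets_necessary_condition(rec, finding))
```

Only `full` passes the check; the other two are excluded because one of the two required elements is absent.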

If this is right

Tamper-proof logs by themselves cannot establish the relevant findings.
Generic process frameworks cannot establish the relevant findings.
Provenance structures alone cannot establish the relevant findings.
The criterion must be met for selected EU AI Act oversight obligations to be satisfied from runtime records.
The requirement aligns with the trace-versus-hyperproperty boundary in runtime verification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Record-generation mechanisms may need explicit fields for both typing and the relevant relations to meet oversight needs.
Tools for checking runtime records could be extended to verify the presence of the two required elements rather than integrity alone.
The same dual requirement may surface in oversight settings outside the EU AI Act when binary factual questions about events arise.
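The difference between checking integrity and checking for the two required elements can be illustrated with a toy hash-chained log. This is a sketch under stated assumptions: the chain construction, the `"genesis"` seed, and the field names `legal_category` and `derived_from` are illustrative choices, not taken from the paper or from any particular logging tool.

```python
import hashlib
import json


def chain_digest(prev_digest: str, entry: dict) -> str:
    """Digest of an entry bound to the digest of the entry before it."""
    body = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_digest.encode("utf-8") + body).hexdigest()


def build_chain(entries: list[dict]) -> list[dict]:
    digest = "genesis"  # arbitrary seed, an assumption of this sketch
    chained = []
    for entry in entries:
        digest = chain_digest(digest, entry)
        chained.append({"entry": entry, "digest": digest})
    return chained


def verify_integrity(chained: list[dict]) -> bool:
    """Recompute the chain; any altered entry breaks every later digest."""
    digest = "genesis"
    for link in chained:
        digest = chain_digest(digest, link["entry"])
        if digest != link["digest"]:
            return False
    return True


def carries_required_elements(chained: list[dict], typing_key: str, relation_key: str) -> bool:
    """Presence check for a typing field and a relation field on every entry."""
    return bool(chained) and all(
        typing_key in link["entry"] and relation_key in link["entry"] for link in chained
    )


plain_log = build_chain([
    {"event": "tool_call", "ts": 1},
    {"event": "http_post", "ts": 2},
])
enriched_log = build_chain([
    {"event": "tool_call", "ts": 1, "legal_category": "protected_data_read", "derived_from": None},
    {"event": "http_post", "ts": 2, "legal_category": "boundary_transfer", "derived_from": 1},
])

print(verify_integrity(plain_log), carries_required_elements(plain_log, "legal_category", "derived_from"))
print(verify_integrity(enriched_log), carries_required_elements(enriched_log, "legal_category", "derived_from"))
```

The plain log verifies as untampered yet fails the presence check, which is the gap the paper attributes to tamper-proof logs taken by themselves.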

Load-bearing premise

The legal determinations in question are limited to a bounded class of binary findings of fact about specific events and their relations.

What would settle it

A concrete runtime record that answers one of the listed binary findings of fact without containing either the required typing or the relation the finding depends upon.

Original abstract

Agentic AI systems generate runtime records, logs, traces, and audit artefacts, but the existence or integrity of such records does not by itself establish that legally operative oversight findings can be recovered from them. This technical report defines an evidentiary-adequacy criterion for a bounded class of determinations: binary findings of fact about specific events and their relations, such as whether protected data crossed a boundary, whether a human could intervene, whether an information barrier held, or whether delegated authority was valid at the moment of use. The criterion states that a runtime record can answer such a determination only if it carries both a typing that maps recorded events to the legally operative category and the relation, such as provenance, authority, derivation, or temporal validity, on which the determination's truth depends. The claim is one of necessity, not sufficiency. The report instantiates the criterion against selected EU AI Act oversight obligations and explains why tamper-proof logs, generic process frameworks, and provenance structures alone cannot establish the relevant findings. It further relates the argument to requisite variety, the Good Regulator Theorem, and the trace-versus-hyperproperty boundary of runtime verification. Companion materials and the experiment protocol are archived on Zenodo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines a necessity condition for runtime records to recover binary legal findings but frames it as definitional without derivations or counterexamples.

The main point is that the paper introduces an evidentiary-adequacy criterion: for a bounded set of binary findings of fact about events and relations under rules like the EU AI Act, a runtime record can answer the finding only if it supplies both a typing to the legal category and the specific relation (provenance, authority, derivation, or temporal validity) the finding depends on. It is explicit that this is a claim of necessity, not sufficiency, and that tamper-proof logs, generic process frameworks, and provenance structures fall short on their own.

The paper connects this scoping to existing ideas in runtime verification, specifically the trace-versus-hyperproperty distinction, and to requisite variety and the Good Regulator Theorem. The instantiation against selected EU AI Act obligations shows where current logging practices leave gaps for oversight of agentic systems. Requiring typing and relation jointly is the clearest new formulation relative to the provenance literature it cites.

The soft spot is that the necessity claim is presented directly from the semantics of what it means to answer such a determination, without visible derivation steps, formalization, or worked counterexamples in the abstract and described sections. The bounded class of binary findings keeps the argument tight but also means the reach is limited to that class. No circularity appears, and the stress-test note that the claim follows from the scoping holds up on the given material.

This is for people designing or regulating audit systems for agentic AI who need to separate what logs can technically store from what regulators can actually recover as findings. It deserves a serious referee because the conceptual gap it names is real for EU AI Act-style obligations, even if the report would benefit from more explicit examples or the archived experiment protocol to strengthen the case.

Referee Report

2 major / 2 minor

Summary. The paper defines an evidentiary-adequacy criterion for a bounded class of binary legal findings of fact about specific events and relations in agentic AI systems. The criterion states that a runtime record answers such a finding only if it supplies both a typing that maps recorded events to the legally operative category and the relation (e.g., provenance, authority, derivation, or temporal validity) on which the finding's truth depends. The report instantiates the criterion against selected EU AI Act oversight obligations, argues that tamper-proof logs, generic process frameworks, and provenance structures are insufficient by themselves, and connects the argument to requisite variety, the Good Regulator Theorem, and the trace-versus-hyperproperty distinction in runtime verification.

Significance. If the criterion holds, it supplies a precise conceptual filter for evaluating whether AI runtime artifacts can support legal determinations, exposing gaps in current logging and audit practices. The explicit links to Ashby's requisite variety and the Good Regulator Theorem, together with the trace/hyperproperty distinction, provide independent conceptual grounding rather than purely ad-hoc stipulation. The bounded scoping to binary findings of fact about events and relations keeps the claim falsifiable in principle.

major comments (2)

[Definition of the evidentiary-adequacy criterion (early sections)] The necessity claim (both typing and the specific relation are required) is introduced directly from the semantics of 'answering' a determination without an explicit derivation, reduction to prior principles, or counter-example analysis showing that omitting either component blocks recovery of the finding. This makes the central claim difficult to assess beyond the stated intuition.
[Instantiation against EU AI Act obligations] In the EU AI Act instantiations, the argument that generic provenance structures alone cannot establish the relevant findings is asserted at the level of the criterion but does not include a concrete mapping of a specific obligation (e.g., data-boundary crossing or authority validity) to the exact typing-plus-relation pair that would be required, leaving the insufficiency claim at a high level of generality.

minor comments (2)

The manuscript references companion materials and an experiment protocol on Zenodo but does not include a DOI or direct citation that would allow immediate retrieval.
Notation for the criterion itself (the conjunction of typing and relation) is described in prose but not given a compact symbolic form that could be referenced in later sections or instantiations.
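
One compact form the criterion could take is sketched below, purely as an illustration. None of the symbols come from the manuscript: $R$ stands for a runtime record, $D$ for a binary determination in the bounded class, $C_D$ for its legally operative category, $\rho_D$ for the relation its truth depends on, and $\mathrm{Ans}$, $\mathrm{Typ}$, $\mathrm{Rel}$ are assumed predicate names.

```latex
% Necessity only: the converse implication is not claimed.
\[
  \mathrm{Ans}(R, D) \;\Longrightarrow\; \mathrm{Typ}(R, C_D) \,\land\, \mathrm{Rel}(R, \rho_D)
\]
% Equivalent contrapositive, the form used when ruling a record out:
\[
  \neg\,\mathrm{Typ}(R, C_D) \,\lor\, \neg\,\mathrm{Rel}(R, \rho_D) \;\Longrightarrow\; \neg\,\mathrm{Ans}(R, D)
\]
```

Here $\mathrm{Typ}(R, C_D)$ reads "R carries a typing of its events to $C_D$" and $\mathrm{Rel}(R, \rho_D)$ reads "R carries the relation $\rho_D$".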

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The two major comments identify areas where the presentation of the evidentiary-adequacy criterion and its instantiations can be strengthened. We address each below and will revise the manuscript accordingly.

Referee: [Definition of the evidentiary-adequacy criterion (early sections)] The necessity claim (both typing and the specific relation are required) is introduced directly from the semantics of 'answering' a determination without an explicit derivation, reduction to prior principles, or counter-example analysis showing that omitting either component blocks recovery of the finding. This makes the central claim difficult to assess beyond the stated intuition.

Authors: The criterion is derived from the logical structure of binary findings of fact, which require both categorization of events and the relational facts on which the finding turns. However, we accept that an explicit derivation and counter-example analysis would improve assessability. In the revised manuscript, we will insert a new subsection immediately following the criterion definition that provides a step-by-step reduction from the semantics of legal determinations and includes counterexamples demonstrating that the absence of either typing or the required relation prevents recovery of the finding.
revision: yes

Referee: [Instantiation against EU AI Act obligations] In the EU AI Act instantiations, the argument that generic provenance structures alone cannot establish the relevant findings is asserted at the level of the criterion but does not include a concrete mapping of a specific obligation (e.g., data-boundary crossing or authority validity) to the exact typing-plus-relation pair that would be required, leaving the insufficiency claim at a high level of generality.

Authors: We agree that greater concreteness would strengthen the instantiations. The revised version will expand the relevant section to include at least two detailed mappings: one for a data-boundary obligation specifying the required event typing and provenance relation, and one for an authority-validity obligation specifying the typing and temporal-validity relation. Each mapping will explicitly show why generic provenance structures fail to supply the necessary components.
revision: yes
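
The kind of counterexample the rebuttal promises can be sketched for the authority-validity case. This is an illustrative construction, not the authors' mapping: the `Scenario` fields, the category label, and the integer timestamps are hypothetical. It builds two scenarios that differ in whether the delegation was still valid at the moment of use, then shows that a record omitting the temporal-validity relation is identical for both, so no procedure reading that record alone could separate them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    """Hypothetical ground truth about one delegated action."""
    action_time: int
    delegation_granted_at: int
    delegation_revoked_at: Optional[int]  # None means never revoked


def authority_valid_at_use(s: Scenario) -> bool:
    """The fact the binary finding is about."""
    started = s.delegation_granted_at <= s.action_time
    not_revoked = s.delegation_revoked_at is None or s.action_time < s.delegation_revoked_at
    return started and not_revoked


def record_without_relation(s: Scenario) -> dict:
    """Typed event, but no temporal-validity relation."""
    return {
        "event": "delegated_action",
        "legal_category": "exercise_of_delegated_authority",
        "ts": s.action_time,
    }


def record_with_relation(s: Scenario) -> dict:
    """Same typed event plus the validity interval the finding depends on."""
    rec = record_without_relation(s)
    rec["valid_from"] = s.delegation_granted_at
    rec["valid_until"] = s.delegation_revoked_at
    return rec


still_valid = Scenario(action_time=10, delegation_granted_at=5, delegation_revoked_at=None)
revoked_before_use = Scenario(action_time=10, delegation_granted_at=5, delegation_revoked_at=8)

# The underlying facts differ...
print(authority_valid_at_use(still_valid), authority_valid_at_use(revoked_before_use))
# ...but the relation-free records are indistinguishable.
print(record_without_relation(still_valid) == record_without_relation(revoked_before_use))
# Adding the relation separates the two scenarios.
print(record_with_relation(still_valid) == record_with_relation(revoked_before_use))
```

Dropping the `legal_category` field instead of the interval would give the mirror-image counterexample for typing.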

Circularity Check

0 steps flagged

No significant circularity; criterion is a standalone semantic necessity claim

The paper introduces an evidentiary-adequacy criterion as a necessity statement grounded in the semantics of what it means for a runtime record to 'answer' a binary finding of fact about events and relations. This is not derived from prior equations, fitted parameters, or self-referential definitions within the paper. The instantiations against EU AI Act obligations and relations to external concepts (requisite variety, Good Regulator Theorem, trace/hyperproperty distinction) supply independent conceptual support rather than reducing the central claim to its own inputs. No load-bearing self-citations, ansatzes smuggled via citation, or renamings of known results are present. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that the relevant legal questions are binary findings of fact about events and relations, and introduces the evidentiary-adequacy criterion itself as the central new construct without independent empirical grounding in the abstract.

axioms (1)

domain assumption: Legal oversight determinations of interest are binary findings of fact about specific events and relations such as data boundary crossing or authority validity.
The criterion is explicitly scoped to this bounded class in the abstract.

invented entities (1)

evidentiary-adequacy criterion (no independent evidence)
purpose: To specify the necessary conditions on runtime records for recovering legal findings of fact.
Newly defined construct in the report; no independent falsifiable handle provided in abstract.

pith-pipeline@v0.9.1-grok · 5741 in / 1367 out tokens · 20386 ms · 2026-07-02T06:07:06.958375+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 8 canonical work pages · 2 internal anchors

[1] Aguirre, A. (2025). Control Inversion: Why the Superintelligent AI Agents We Are Racing to Create Would Absorb Power, Not Grant It. Future of Life Institute. https://control-inversion.ai/
[2] Anderson, J. P. (1972). Computer Security Technology Planning Study. ESD-TR-73-51. United States Air Force Electronic Systems Division.
[3] Argyris, C. and Schön, D. A. (1978). Organizational Learning: A Theory of Action Perspective. Addison-Wesley.
[4] Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall.
[5] Beer, S. (1979). The Heart of Enterprise. John Wiley & Sons.
[6] Chiodo, M., Müller, D., Siewert, P., Wetherall, J.-L., Yasmine, Z. and Burden, J. (2026). Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility. Preprint.
[7] Clarkson, M. R. and Schneider, F. B. (2010). Hyperproperties. Journal of Computer Security, 18(6), 1157–1210.
[8] Conant, R. C. and Ashby, W. R. (1970). Every good regulator of a system must be a model of that system. International Journal of Systems Science, 1(2), 89–97.
[9] Espejo, R. (2001). Auditing as a trust creation process. Systemic Practice and Action Research, 14(2), 215–236.
[10] European Commission (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). OJ L, 2024/1689.
[11] European Commission (2026). An agile Digital Rulebook for the EU and Digital Omnibus on AI. Shaping Europe’s Digital Future. https://digital-strategy.ec.europa.eu/en/policies/digital-rulebook
[12] Janssen, J. (2026a). From Battlefield to Boardroom: Strategic Red Teaming as an Epistemic Governance Instrument in the Age of AI. Working paper. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6860020
[13] Janssen, J. (2026b). A Supervisory-Evidence Ontology for Agentic AI under EU Law: Candidate Minimum Conceptual Set and Temporal Extension. Working paper. Zenodo. DOI: 10.5281/zenodo.19758441
[14] Janssen, J. (2026c). From Record to Finding: Why Tamper-Proof Logs Cannot Establish Legal Oversight of Agentic AI. Working paper. Zenodo. DOI: 10.5281/zenodo.21025237
[15] Nannini, L., Leon Smith, A., Maggini, M. J., Panai, E., Feliciano, S., Tiulkanov, A., Maran, E., Gealy, J. and Bisconti, P. (2026). AI Agents Under EU Law: A Compliance Architecture for AI Providers. Preprint, arXiv:2604.04604
[16] Power, M. (2007). Organized Uncertainty: Designing a World of Risk Management. Oxford University Press.
[17] Saltzer, J. H. and Schroeder, M. D. (1975). The protection of information in computer systems. Proceedings of the IEEE, 63(9), 1278–1308.
[18] Schneider, F. B. (2000). Enforceable security policies. ACM Transactions on Information and System Security, 3(1), 30–50.
[19] Stucki, S., Sánchez, C., Schneider, G. and Bonakdarpour, B. (2019). Gray-box monitoring of hyperproperties. In Formal Methods - The Next 30 Years, LNCS 11800, 406–424. DOI: 10.1007/978-3-030-30942-8_25
[20] Thobani, I. (2024). A triviality worry for the internal model principle. Synthese, 204(1), article 36. DOI: 10.1007/s11229-024-04693-x
[21] Touchette, H. and Lloyd, S. (2000). Information-theoretic limits of control. Physical Review Letters, 84(6), 1156–1159.
[22] Virgo, N., Biehl, M., Baltieri, M. and Capucci, M. (2025). A “Good Regulator Theorem” for embodied agents. Preprint, arXiv:2508.06326
[23] Wang, C. L., Singhal, T., Kelkar, A. and Tuo, J. (2025). MI9: An integrated runtime governance framework for agentic AI. Preprint, arXiv:2508.03858
[24] Wang, H., Poskitt, C. M. and Sun, J. (2025). AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. Preprint, arXiv:2503.18666
