From Runtime Records to Legal Findings: An Evidentiary-Adequacy Criterion for Agentic AI Oversight
Pith reviewed 2026-07-02 06:07 UTC · model grok-4.3
The pith
A runtime record supports a binary legal finding about events only if it supplies both event typing to the legal category and the specific relation the finding depends on.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A runtime record can answer a binary finding of fact about specific events and their relations only if it carries both a typing that maps recorded events to the legally operative category and the relation, such as provenance, authority, derivation, or temporal validity, on which the determination's truth depends.
What carries the argument
The evidentiary-adequacy criterion, which requires both typing of events to legal categories and the specific relation on which a binary factual finding depends.
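One hedged way to write the criterion compactly is sketched below. The symbols are illustrative assumptions and not the paper's own notation: $R$ is a runtime record, $d$ a binary determination, $C_d$ the legally operative category for $d$, and $\rho_d$ the relation (provenance, authority, derivation, or temporal validity) on which the truth of $d$ depends.

```latex
\[
\mathrm{Answers}(R, d) \;\Longrightarrow\;
\mathrm{HasTyping}(R, C_d) \,\wedge\, \mathrm{HasRelation}(R, \rho_d)
\]
```

The implication runs in one direction only, matching the abstract's statement that the claim is one of necessity, not sufficiency: a record carrying both elements may still fail to answer $d$.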
If this is right
- Tamper-proof logs by themselves cannot establish the relevant findings.
- Generic process frameworks cannot establish the relevant findings.
- Provenance structures alone cannot establish the relevant findings.
- The criterion must be met for selected EU AI Act oversight obligations to be satisfied from runtime records.
- The requirement aligns with the trace-versus-hyperproperty boundary in runtime verification.
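The first consequence can be illustrated with a minimal sketch, assuming a simple SHA-256 hash chain as the tamper-evidence mechanism. The entry fields (`ts`, `op`, `bytes`, `name`) and the `legal_category` field name are hypothetical and do not come from the paper. The point is only that an integrity check can pass while the entries carry no typing at all.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(entry, prev):
    # Digest covers the serialized entry plus the previous digest.
    payload = json.dumps(entry, sort_keys=True) + prev
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Link entries so each digest depends on all earlier entries."""
    log, prev = [], GENESIS
    for entry in entries:
        digest = _digest(entry, prev)
        log.append({"entry": entry, "prev": prev, "digest": digest})
        prev = digest
    return log


def verify_chain(log):
    """Return True if no entry was altered, reordered, or removed mid-chain."""
    prev = GENESIS
    for item in log:
        if item["prev"] != prev or _digest(item["entry"], prev) != item["digest"]:
            return False
        prev = item["digest"]
    return True


# Purely operational entries: no legal typing, no relation.
log = build_chain([
    {"ts": "2026-01-01T10:00:00Z", "op": "http_post", "bytes": 512},
    {"ts": "2026-01-01T10:00:01Z", "op": "tool_call", "name": "export"},
])

intact = verify_chain(log)
# "legal_category" is a hypothetical field name for the typing element.
typed = all("legal_category" in item["entry"] for item in log)
```

Here `intact` holds while `typed` does not: the chain shows that the entries were not modified, but nothing in them says whether the `http_post` was a transfer of protected data across a boundary.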
Where Pith is reading between the lines
- Record-generation mechanisms may need explicit fields for both typing and the relevant relations to meet oversight needs.
- Tools for checking runtime records could be extended to verify the presence of the two required elements rather than integrity alone.
- The same dual requirement may surface in oversight settings outside the EU AI Act when binary factual questions about events arise.
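The first two points above can be sketched as a record type with explicit fields for both elements, plus a checker that reports which element is absent. This is a hypothetical schema for illustration; the class, field, and category names are assumptions, not taken from the paper or from any existing tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RuntimeEvent:
    event_id: str
    operation: str                          # raw operational label, e.g. "http_post"
    legal_category: Optional[str] = None    # typing to the legally operative category
    relations: Dict[str, str] = field(default_factory=dict)  # relation kind -> reference


def missing_elements(event: RuntimeEvent, required_category: str,
                     required_relation: str) -> List[str]:
    """List which of the two required elements the event fails to carry."""
    missing = []
    if event.legal_category != required_category:
        missing.append("typing")
    if required_relation not in event.relations:
        missing.append("relation")
    return missing


bare = RuntimeEvent("e1", "http_post")
full = RuntimeEvent(
    "e2",
    "http_post",
    legal_category="personal_data_transfer",
    relations={"provenance": "dataset:crm-eu"},
)
```

An empty result from `missing_elements` means only that the necessary condition is met; it does not show that the finding can actually be established from the record.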
Load-bearing premise
The legal determinations in question are limited to a bounded class of binary findings of fact about specific events and their relations.
What would settle it
A concrete runtime record that answers one of the listed binary findings of fact without containing either the required typing or the relation the finding depends upon.
Original abstract
Agentic AI systems generate runtime records, logs, traces, and audit artefacts, but the existence or integrity of such records does not by itself establish that legally operative oversight findings can be recovered from them. This technical report defines an evidentiary-adequacy criterion for a bounded class of determinations: binary findings of fact about specific events and their relations, such as whether protected data crossed a boundary, whether a human could intervene, whether an information barrier held, or whether delegated authority was valid at the moment of use. The criterion states that a runtime record can answer such a determination only if it carries both a typing that maps recorded events to the legally operative category and the relation, such as provenance, authority, derivation, or temporal validity, on which the determination's truth depends. The claim is one of necessity, not sufficiency. The report instantiates the criterion against selected EU AI Act oversight obligations and explains why tamper-proof logs, generic process frameworks, and provenance structures alone cannot establish the relevant findings. It further relates the argument to requisite variety, the Good Regulator Theorem, and the trace-versus-hyperproperty boundary of runtime verification. Companion materials and the experiment protocol are archived on Zenodo.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines an evidentiary-adequacy criterion for a bounded class of binary legal findings of fact about specific events and relations in agentic AI systems. The criterion states that a runtime record answers such a finding only if it supplies both a typing that maps recorded events to the legally operative category and the relation (e.g., provenance, authority, derivation, or temporal validity) on which the finding's truth depends. The report instantiates the criterion against selected EU AI Act oversight obligations, argues that tamper-proof logs, generic provenance, and process frameworks are insufficient by themselves, and connects the argument to requisite variety, the Good Regulator Theorem, and the trace-versus-hyperproperty distinction in runtime verification.
Significance. If the criterion holds, it supplies a precise conceptual filter for evaluating whether AI runtime artifacts can support legal determinations, exposing gaps in current logging and audit practices. The explicit links to Ashby's requisite variety and the Good Regulator Theorem, together with the trace/hyperproperty distinction, provide independent conceptual grounding rather than purely ad-hoc stipulation. The bounded scoping to binary findings of fact about events and relations keeps the claim falsifiable in principle.
major comments (2)
- [Definition of the evidentiary-adequacy criterion (early sections)] The necessity claim (both typing and the specific relation are required) is introduced directly from the semantics of 'answering' a determination without an explicit derivation, reduction to prior principles, or counter-example analysis showing that omitting either component blocks recovery of the finding. This makes the central claim difficult to assess beyond the stated intuition.
- [Instantiation against EU AI Act obligations] In the EU AI Act instantiations, the argument that generic provenance structures alone cannot establish the relevant findings is asserted at the level of the criterion but does not include a concrete mapping of a specific obligation (e.g., data-boundary crossing or authority validity) to the exact typing-plus-relation pair that would be required, leaving the insufficiency claim at a high level of generality.
minor comments (2)
- The manuscript references companion materials and an experiment protocol on Zenodo but does not include a DOI or direct citation that would allow immediate retrieval.
- Notation for the criterion itself (the conjunction of typing and relation) is described in prose but not given a compact symbolic form that could be referenced in later sections or instantiations.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The two major comments identify areas where the presentation of the evidentiary-adequacy criterion and its instantiations can be strengthened. We address each below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Definition of the evidentiary-adequacy criterion (early sections)] The necessity claim (both typing and the specific relation are required) is introduced directly from the semantics of 'answering' a determination without an explicit derivation, reduction to prior principles, or counter-example analysis showing that omitting either component blocks recovery of the finding. This makes the central claim difficult to assess beyond the stated intuition.
  Authors: The criterion is derived from the logical structure of binary findings of fact, which require both categorization of events and the relational facts on which the finding turns. However, we accept that an explicit derivation and counter-example analysis would improve assessability. In the revised manuscript, we will insert a new subsection immediately following the criterion definition that provides a step-by-step reduction from the semantics of legal determinations and includes counterexamples demonstrating that the absence of either typing or the required relation prevents recovery of the finding.
  Revision: yes
- Referee: [Instantiation against EU AI Act obligations] In the EU AI Act instantiations, the argument that generic provenance structures alone cannot establish the relevant findings is asserted at the level of the criterion but does not include a concrete mapping of a specific obligation (e.g., data-boundary crossing or authority validity) to the exact typing-plus-relation pair that would be required, leaving the insufficiency claim at a high level of generality.
  Authors: We agree that greater concreteness would strengthen the instantiations. The revised version will expand the relevant section to include at least two detailed mappings: one for a data-boundary obligation specifying the required event typing and provenance relation, and one for an authority-validity obligation specifying the typing and temporal-validity relation. Each mapping will explicitly show why generic provenance structures fail to supply the necessary components.
  Revision: yes
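A minimal sketch of what an authority-validity mapping of this kind might look like is given below, assuming that the typing is an `action_type` label on the use event and that the temporal-validity relation is a delegation window. All class and field names are hypothetical; the authors' actual mapping is not shown in the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class Delegation:
    grantee: str
    scope: str
    valid_from: datetime
    valid_until: datetime   # exclusive upper bound (an assumption of this sketch)


@dataclass(frozen=True)
class UseEvent:
    actor: str
    action_type: str        # typing, e.g. "exercise_of_delegated_authority"
    scope: str
    at: datetime            # moment of use


def authority_valid_at_use(use: UseEvent, delegations: Iterable[Delegation]) -> bool:
    """True if some delegation covers the actor and scope at the moment of use."""
    return any(
        d.grantee == use.actor
        and d.scope == use.scope
        and d.valid_from <= use.at < d.valid_until
        for d in delegations
    )


grant = Delegation(
    grantee="agent-7",
    scope="approve_refund",
    valid_from=datetime(2026, 1, 1, tzinfo=timezone.utc),
    valid_until=datetime(2026, 2, 1, tzinfo=timezone.utc),
)
in_window = UseEvent("agent-7", "exercise_of_delegated_authority", "approve_refund",
                     datetime(2026, 1, 15, tzinfo=timezone.utc))
after_expiry = UseEvent("agent-7", "exercise_of_delegated_authority", "approve_refund",
                        datetime(2026, 2, 15, tzinfo=timezone.utc))
```

A generic provenance graph could record that `agent-7` performed the action and which delegation it cited, yet still omit the validity window; without `valid_from` and `valid_until`, the question of validity at the moment of use cannot be evaluated from the record.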
Circularity Check
No significant circularity; criterion is a standalone semantic necessity claim
Full rationale
The paper introduces an evidentiary-adequacy criterion as a necessity statement grounded in the semantics of what it means for a runtime record to 'answer' a binary finding of fact about events and relations. This is not derived from prior equations, fitted parameters, or self-referential definitions within the paper. The instantiations against EU AI Act obligations and relations to external concepts (requisite variety, Good Regulator Theorem, trace/hyperproperty distinction) supply independent conceptual support rather than reducing the central claim to its own inputs. No load-bearing self-citations, ansatzes smuggled via citation, or renamings of known results are present. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: legal oversight determinations of interest are binary findings of fact about specific events and relations, such as data-boundary crossing or authority validity.
invented entities (1)
- evidentiary-adequacy criterion: no independent evidence
Reference graph
Works this paper leans on
- [1] Aguirre, A. (2025). Control Inversion: Why the Superintelligent AI Agents We Are Racing to Create Would Absorb Power, Not Grant It. Future of Life Institute. https://control-inversion.ai/
- [2] Anderson, J. P. (1972). Computer Security Technology Planning Study. ESD-TR-73-51. United States Air Force Electronic Systems Division.
- [3] Argyris, C. and Schön, D. A. (1978). Organizational Learning: A Theory of Action Perspective. Addison-Wesley.
- [4] Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall.
- [5] Beer, S. (1979). The Heart of Enterprise. John Wiley & Sons.
- [6] Chiodo, M., Müller, D., Siewert, P., Wetherall, J.-L., Yasmine, Z. and Burden, J. (2026). Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility. Preprint.
- [7] Clarkson, M. R. and Schneider, F. B. (2010). Hyperproperties. Journal of Computer Security, 18(6), 1157–1210.
- [8] Conant, R. C. and Ashby, W. R. (1970). Every good regulator of a system must be a model of that system. International Journal of Systems Science, 1(2), 89–97.
- [9] Espejo, R. (2001). Auditing as a trust creation process. Systemic Practice and Action Research, 14(2), 215–236.
- [10] European Commission (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). OJ L, 2024/1689.
- [11] European Commission (2026). An agile Digital Rulebook for the EU and Digital Omnibus on AI. Shaping Europe’s Digital Future. https://digital-strategy.ec.europa.eu/en/policies/digital-rulebook
- [12] Janssen, J. (2026a). From Battlefield to Boardroom: Strategic Red Teaming as an Epistemic Governance Instrument in the Age of AI. Working paper. SSRN. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6860020
- [13] Janssen, J. (2026b). A Supervisory-Evidence Ontology for Agentic AI under EU Law: Candidate Minimum Conceptual Set and Temporal Extension. Working paper. Zenodo. DOI: 10.5281/zenodo.19758441
- [14] Janssen, J. (2026c). From Record to Finding: Why Tamper-Proof Logs Cannot Establish Legal Oversight of Agentic AI. Working paper. Zenodo. DOI: 10.5281/zenodo.21025237
- [15] Nannini, L., Leon Smith, A., Maggini, M. J., Panai, E., Feliciano, S., Tiulkanov, A., Maran, E., Gealy, J. and Bisconti, P. (2026). AI Agents Under EU Law: A Compliance Architecture for AI Providers. Preprint, arXiv:2604.04604
- [16] Power, M. (2007). Organized Uncertainty: Designing a World of Risk Management. Oxford University Press.
- [17] Saltzer, J. H. and Schroeder, M. D. (1975). The protection of information in computer systems. Proceedings of the IEEE, 63(9), 1278–1308.
- [18] Schneider, F. B. (2000). Enforceable security policies. ACM Transactions on Information and System Security, 3(1), 30–50.
- [19] Stucki, S., Sánchez, C., Schneider, G. and Bonakdarpour, B. (2019). Gray-box monitoring of hyperproperties. In Formal Methods - The Next 30 Years, LNCS 11800, 406–424. DOI: 10.1007/978-3-030-30942-8_25
- [20] Thobani, I. (2024). A triviality worry for the internal model principle. Synthese, 204(1), article 36. DOI: 10.1007/s11229-024-04693-x
- [21] Touchette, H. and Lloyd, S. (2000). Information-theoretic limits of control. Physical Review Letters, 84(6), 1156–1159.
- [22] Virgo, N., Biehl, M., Baltieri, M. and Capucci, M. (2025). A “Good Regulator Theorem” for embodied agents. Preprint, arXiv:2508.06326
- [23] Wang, C. L., Singhal, T., Kelkar, A. and Tuo, J. (2025). MI9: An integrated runtime governance framework for agentic AI. Preprint, arXiv:2508.03858
- [24] Wang, H., Poskitt, C. M. and Sun, J. (2025). AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. Preprint, arXiv:2503.18666