Auditable Agents
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 19:13 UTC · model grok-4.3
The pith
No agent system can be accountable without auditability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
No agent system can be accountable without auditability. Accountability is the ability to determine compliance and assign responsibility after deployment; auditability is the system property that makes this possible by enabling reconstruction of behavior from trustworthy evidence. The authors operationalize auditability via five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and three mechanism classes (detect, enforce, recover) whose differing temporal information-and-intervention constraints explain why any single class is insufficient in practice.
What carries the argument
The five dimensions of auditability, together with the three mechanism classes (detect, enforce, recover), whose differing temporal constraints make all three necessary.
If this is right
- Agent systems must support post-deployment recovery of actions and their effects even when conventional logs are incomplete or missing.
- Pre-execution mediation combined with tamper-evident records can be added to agents with only modest runtime cost (see the sketch after this list).
- Responsibility attribution becomes feasible only when evidence integrity is preserved across the agent's full lifecycle.
- Security prerequisites for auditability remain unmet in most current open-source agent frameworks.
- An Auditability Card provides a structured way to evaluate and compare different agent systems.
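To make the mediation claim concrete, here is a minimal sketch of pre-execution mediation writing to a tamper-evident, hash-chained log. It assumes a hypothetical policy_allows predicate and entry schema; the 8.3 ms median overhead reported in the paper refers to the authors' implementation, not to this illustration.

```python
# Minimal sketch of pre-execution mediation with a tamper-evident record.
# `policy_allows` and the entry schema are hypothetical; only the pattern
# (mediate before execution, hash-chain every decision) follows the paper.
import hashlib
import json
import time

def policy_allows(tool: str, args: dict) -> bool:
    """Stand-in policy check; a real mediator would evaluate declared rules."""
    return tool != "shell.exec"

class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""
    def __init__(self):
        self.entries = []
        self._prev = "0" * 64  # genesis hash

    def append(self, record: dict) -> None:
        body = dict(record, prev=self._prev, ts=time.time())
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append((body, digest))
        self._prev = digest  # any later edit breaks the chain

def mediated_call(log: AuditLog, tool: str, args: dict, execute):
    allowed = policy_allows(tool, args)
    log.append({"tool": tool, "args": args, "decision": "allow" if allowed else "deny"})
    if not allowed:
        raise PermissionError(f"blocked by policy: {tool}")
    return execute(tool, args)

log = AuditLog()
mediated_call(log, "db.query", {"q": "SELECT 1"}, lambda t, a: "ok")
```

Because each entry commits to the hash of its predecessor, deleting or editing any earlier decision invalidates every later digest, which is what makes the record tamper-evident rather than merely append-only.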
Where Pith is reading between the lines
- Regulatory or deployment standards for agents would need to require evidence of all five dimensions rather than just logging.
- Recovery experiments suggest that even partial audit trails can still support useful responsibility attribution in controlled settings.
- The temporal mismatch between detect, enforce, and recover mechanisms implies that hybrid architectures will be required for production agents.
- Open problems listed by mechanism class point to concrete next steps for tool-use, database-query, and side-effect agents.
Load-bearing premise
The five dimensions and three mechanism classes are both necessary and sufficient to make auditability operational, with temporal constraints explaining why no single mechanism class works alone.
What would settle it
Demonstrate an agent system that achieves accountability—i.e., can reliably determine compliance and assign responsibility after actions—while lacking at least one of the five dimensions or relying on only one mechanism class.
Original abstract
LLM agents call tools, query databases, delegate tasks, and trigger external side effects. Once an agent system can act in the world, the question is no longer only whether harmful actions can be prevented; it is whether those actions remain answerable after deployment. We distinguish accountability (the ability to determine compliance and assign responsibility), auditability (the system property that makes accountability possible), and auditing (the process of reconstructing behavior from trustworthy evidence). Our claim is direct: no agent system can be accountable without auditability. To make this operational, we define five dimensions of agent auditability, i.e., action recoverability, lifecycle coverage, policy checkability, responsibility attribution, and evidence integrity, and identify three mechanism classes (detect, enforce, recover) whose temporal information-and-intervention constraints explain why, in practice, no single approach suffices. We support the position with layered evidence rather than a single benchmark: lower-bound ecosystem measurements suggest that even basic security prerequisites for auditability are widely unmet (617 security findings across six prominent open-source projects); runtime feasibility results show that pre-execution mediation with tamper-evident records adds only 8.3 ms median overhead; and controlled recovery experiments show that responsibility-relevant information can be partially recovered even when conventional logs are missing. We propose an Auditability Card for agent systems and identify six open research problems organized by mechanism class.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that LLM-based agent systems, which interact with external tools and produce side effects, require auditability to enable accountability (determining compliance and assigning responsibility). It distinguishes auditability as the enabling system property and auditing as the reconstruction process, claiming directly that no agent system can be accountable without auditability. To operationalize this, the authors define five dimensions of agent auditability (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and three mechanism classes (detect, enforce, recover) whose temporal constraints explain why no single class suffices in practice. Supporting evidence includes ecosystem measurements (617 security findings across six open-source projects), runtime tests showing 8.3 ms median overhead for pre-execution mediation with tamper-evident records, and controlled recovery experiments demonstrating partial recovery of responsibility-relevant information. The paper proposes an Auditability Card for agent systems and identifies six open research problems organized by mechanism class.
Significance. If the definitional framework and operationalization hold, this work provides a structured lens for designing accountable autonomous agents, an area of growing importance as LLM agents move beyond simulation to real-world actions. The layered evidence—highlighting widespread security gaps while showing low-overhead feasibility—strengthens the position beyond pure theory and could inform standards or evaluation practices. The Auditability Card and open problems offer concrete next steps for the community.
major comments (3)
- [Definitions and dimensions section] The central claim that the five dimensions are both necessary and sufficient to operationalize auditability (and thus accountability) is asserted in the definitions section but lacks a formal justification, counterexample analysis, or demonstration that omitting any dimension renders accountability impossible. This assumption underpins the entire operationalization and the subsequent mechanism discussion.
- [Mechanism classes section] The argument that temporal information-and-intervention constraints make no single mechanism class (detect, enforce, recover) sufficient is presented conceptually but would benefit from a more explicit timeline or concrete scenario per class showing the insufficiency, as this is used to motivate the multi-class approach.
- [Ecosystem measurements subsection] The ecosystem measurements report 617 security findings as a lower bound on unmet auditability prerequisites, but the main text provides only summary statistics without detailing the auditing methodology, categorization against the five dimensions, or inter-rater reliability, limiting verification of how these findings directly support the necessity claim.
minor comments (4)
- [Runtime feasibility and recovery experiments] The runtime overhead result (8.3 ms median) and recovery experiment outcomes are presented without error bars, sample sizes, or statistical tests; adding these would improve reproducibility and clarity of the feasibility claims.
- [Auditability Card proposal] The Auditability Card is proposed but no template, example instantiation, or scoring rubric is provided in the text or appendix, making it difficult for readers to apply immediately (a hypothetical instantiation is sketched after this list).
- [Related work] Related work on auditing and accountability in AI systems (e.g., prior work on explainability, logging in multi-agent systems) is referenced only lightly; expanding this section would better situate the contribution.
- [Mechanism classes] Notation for the three mechanism classes could be made more consistent (e.g., using a table or diagram) to aid comparison of their temporal constraints.
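Absent a template in the text, the sketch below shows one hypothetical way an Auditability Card could be instantiated: one field per dimension plus the mechanism classes deployed. The five dimension names come from the paper; the field structure, the 0-3 level scale, and all example values are assumptions made here.

```python
# Hypothetical Auditability Card instantiation. The five dimension names come
# from the paper; the field structure, level scale, and example values do not.
from dataclasses import dataclass, field

@dataclass
class AuditabilityCard:
    system: str
    # One entry per dimension, scored on an assumed 0-3 scale.
    action_recoverability: int
    lifecycle_coverage: int
    policy_checkability: int
    responsibility_attribution: int
    evidence_integrity: int
    # Mechanism classes the system actually deploys.
    mechanisms: list[str] = field(default_factory=list)

card = AuditabilityCard(
    system="example-tool-use-agent",
    action_recoverability=2,       # tool calls logged with arguments
    lifecycle_coverage=1,          # deployment only, no development-phase records
    policy_checkability=2,         # policies machine-checkable pre-execution
    responsibility_attribution=1,  # single-agent attribution only
    evidence_integrity=3,          # signed, hash-chained entries ("Level 3")
    mechanisms=["detect", "enforce"],  # no recover mechanism present
)
```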
Simulated Author's Rebuttal
We thank the referee for the constructive review and recommendation of minor revision. We address each major comment below with targeted clarifications and will update the manuscript accordingly to strengthen verifiability while preserving the core conceptual framework.
Point-by-point responses
Referee: [Definitions and dimensions section] The central claim that the five dimensions are both necessary and sufficient to operationalize auditability (and thus accountability) is asserted in the definitions section but lacks a formal justification, counterexample analysis, or demonstration that omitting any dimension renders accountability impossible. This assumption underpins the entire operationalization and the subsequent mechanism discussion.
Authors: The dimensions were derived directly from the information requirements of accountability (determining compliance and assigning responsibility) in systems with external side effects. We will add a dedicated paragraph to the definitions section providing a necessity argument via illustrative counterexamples: e.g., absent action recoverability, post-execution side effects cannot be traced to the agent; absent evidence integrity, logs can be tampered with to evade responsibility attribution. While we do not claim a formal axiomatic proof (the domain lacks such formalization), these mappings demonstrate why each dimension is required for accountability to be possible. On sufficiency, we frame the dimensions as jointly necessary but not sufficient in isolation, consistent with the multi-class mechanism discussion. revision: yes
Referee: [Mechanism classes section] The argument that temporal information-and-intervention constraints make no single mechanism class (detect, enforce, recover) sufficient is presented conceptually but would benefit from a more explicit timeline or concrete scenario per class showing the insufficiency, as this is used to motivate the multi-class approach.
Authors: We agree an explicit illustration would improve clarity. In revision we will add a timeline figure in the mechanism classes section depicting the full lifecycle of an agent action (intent formulation, tool invocation, side-effect execution, post-hoc review). Annotations will show phase-specific failures: detect mechanisms cannot intervene pre-action or recover deleted evidence; enforce mechanisms lack post-facto reconstruction; recover mechanisms cannot prevent harm. A concrete running example (LLM agent performing a database update with external API side effects) will demonstrate why temporal gaps necessitate combining classes. revision: yes
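Pending that figure, the sketch below renders the same argument as a minimal coverage model: each mechanism class acts only in some lifecycle phases, so no single class spans the whole timeline while the union of all three does. The phase names and the coverage map are illustrative assumptions, not the paper's definitions.

```python
# Illustrative sketch (not the paper's formalism): lifecycle phases of one
# agent action, and which mechanism classes can act at each phase.
from enum import Enum

class Phase(Enum):
    INTENT = "intent formulation"
    INVOCATION = "tool invocation"
    SIDE_EFFECT = "side-effect execution"
    REVIEW = "post-hoc review"

# Hypothetical coverage map: detect observes but cannot block; enforce can
# block before execution but cannot reconstruct afterwards; recover only
# reconstructs after the fact and cannot prevent harm.
COVERAGE = {
    "detect":  {Phase.INVOCATION, Phase.SIDE_EFFECT},
    "enforce": {Phase.INTENT, Phase.INVOCATION},
    "recover": {Phase.REVIEW},
}

def uncovered(classes: set[str]) -> set[Phase]:
    """Phases left uncovered by a given combination of mechanism classes."""
    covered = set().union(*(COVERAGE[c] for c in classes))
    return set(Phase) - covered

# No single class covers the whole lifecycle; the union of all three does.
for cls in COVERAGE:
    assert uncovered({cls}), f"{cls} alone unexpectedly covers all phases"
assert not uncovered(set(COVERAGE))
```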
Referee: [Ecosystem measurements subsection] The ecosystem measurements report 617 security findings as a lower bound on unmet auditability prerequisites, but the main text provides only summary statistics without detailing the auditing methodology, categorization against the five dimensions, or inter-rater reliability, limiting verification of how these findings directly support the necessity claim.
Authors: The measurements used automated scanners on the six projects followed by author-led manual categorization against the dimensions. To enable verification we will expand the subsection with a concise methods description (scanners employed, categorization criteria per dimension, and note on author consensus process) while moving per-finding tables to the appendix. This will explicitly link findings (e.g., missing logging for evidence integrity) to the necessity argument without altering the reported 617 count or lower-bound framing. revision: yes
Circularity Check
No significant circularity
full rationale
The paper's core claim—that no agent system can be accountable without auditability—is presented as a direct consequence of the provided definitions of accountability (ability to determine compliance and assign responsibility) and auditability (the enabling system property). The five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and three mechanism classes (detect, enforce, recover) are explicitly introduced as an operationalization to make the distinction actionable, with temporal constraints as a pragmatic explanation. The supporting evidence includes ecosystem measurements (617 security findings), runtime overhead (8.3 ms), and recovery experiments, which are independent empirical observations rather than quantities fitted to or defined by the claim itself. No equations, self-referential derivations, or load-bearing self-citations are evident in the derivation chain. The position is conceptual and definitional, supported by illustrative data, without reducing to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Accountability requires the ability to reconstruct behavior from trustworthy evidence after the fact.
invented entities (1)
- Five dimensions of agent auditability (no independent evidence).
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We define five dimensions of agent auditability... and identify three mechanism classes (detect, enforce, recover) whose temporal information-and-intervention constraints explain why, in practice, no single approach suffices."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Evidence Integrity is foundational... Level 3 (signed): entries or batches are digitally signed."
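As a concrete reading of the quoted "Level 3 (signed)" requirement, here is a minimal sketch of batch signing for audit-log entries. It assumes the third-party cryptography package and a newline-joined JSON batch format; the paper does not prescribe a signature scheme or key-management model.

```python
# Minimal sketch of Level 3 evidence integrity: signing a batch of audit-log
# entries. Assumes the third-party `cryptography` package; the batch format
# (newline-joined JSON entries) is a hypothetical choice, not the paper's.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # in practice, loaded from a KMS/HSM

def sign_batch(entries: list[dict]) -> bytes:
    """Serialize a batch deterministically and return an Ed25519 signature."""
    payload = "\n".join(json.dumps(e, sort_keys=True) for e in entries).encode()
    return signing_key.sign(payload)

def verify_batch(entries: list[dict], signature: bytes) -> bool:
    """Verify a batch against its signature; any edit to any entry fails."""
    payload = "\n".join(json.dumps(e, sort_keys=True) for e in entries).encode()
    try:
        signing_key.public_key().verify(signature, payload)
        return True
    except Exception:
        return False

batch = [{"tool": "db.update", "args": {"id": 7}}, {"tool": "api.post"}]
sig = sign_batch(batch)
assert verify_batch(batch, sig)
batch[0]["args"]["id"] = 8           # tampering...
assert not verify_batch(batch, sig)  # ...is detected
```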
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Cat-DPO: Category-Adaptive Safety Alignment. Cat-DPO applies per-category adaptive safety margins during direct preference optimization to reduce variance in safety across harm categories.
Reference graph
Works this paper leans on
- [1] Shraddha Barke, Arnav Goyal, Alind Khare, Avaljot Singh, Suman Nath, and Chetan Bansal. AgentRx: Diagnosing AI agent failures from execution trajectories. arXiv preprint arXiv:2602.02475.
- [2] Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. Visibility into AI agents. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 958–973. ACM, 2024.
- [3] Scott A. Crosby and Dan S. Wallach. Efficient data structures for tamper-evident logging. In Proceedings of the 18th USENIX Security Symposium, pages 317–334. USENIX Association.
- [4] Liming Dong, Qinghua Lu, and Liming Zhu. AgentOps: Enabling observability of LLM agents. arXiv preprint arXiv:2411.05285.
- [5] Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. MLAgentBench: Evaluating language agents on machine learning experimentation. arXiv preprint arXiv:2310.03302.
- [6] Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. Llama Guard: LLM-based input-output safeguard for human-AI conversations. arXiv preprint arXiv:2312.06674.
- [7] Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, et al. HarmBench: A standardized evaluation framework for automated red teaming and robust refusal. In Proceedings of the 41st International Conference on Machine Learning, pages 35181–35224.
- [8] Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, and Luciano Floridi. Auditing large language models: A three-layered approach. AI and Ethics, 4:1085–1115.
- [9] Zachary Newman, John Speed Meyers, and Santiago Torres-Arias. Sigstore: Software signing for everybody. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2353–2367. ACM, 2022.
- [10] Yi Nian, Haosen Cao, Shenzhe Zhu, Henry Peng Zou, Qingqing Luan, and Yue Zhao. When only the final text survives: Implicit execution tracing for multi-agent attribution. arXiv preprint arXiv:2603.17445.
- [11] Marc Ohm, Henrik Plate, Arnold Sykosch, and Michael Meier. Backstabber's knife collection: A review of open source software supply chain attacks. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pages 23–43. Springer.
- [12] Victor Ojewale, Harini Suresh, and Suresh Venkatasubramanian. Audit trails for accountability in large language models. arXiv preprint arXiv:2601.20727.
- [13] Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442.
- [14] Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334.
- [15] Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. Fine-tuning aligned language models compromises safety, even when users do not intend to! arXiv preprint arXiv:2310.03693.
- [16] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789.
- [17] Inioluwa Deborah Raji, Andrew Smart, Rebecca N. White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 33–44. ACM, 2020.
- [18] Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. Identifying the risks of LM agents with an LM-emulated sandbox. In Proceedings of the Twelfth International Conference on Learning Representations.
- [19] Authenticated delegation and authorized AI agents. arXiv preprint arXiv:2501.09674.
- Peter Steinberger and OpenClaw Contributors. OpenClaw: Your own personal AI assistant. URL https://github.com/openclaw/openclaw.
- [20] Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. arXiv preprint arXiv:2503.18666.
- [21] Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. Executable code actions elicit better LLM agents. In International Conference on Machine Learning (ICML), 2024.
- Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. OpenHands: An open platform for AI software developers as generalist agents.
- [22] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. arXiv preprint arXiv:2405.15793.
- [24] Aegis: No tool call left unchecked – a pre-execution firewall and audit layer for AI agents. arXiv preprint arXiv:2603.12621.
- Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, and Gongshen Liu. R-Judge: Benchmarking safety risk awareness for LLM agents. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1467–1490. Association for Computational Linguistics.
- [25] Haiyue Zhang, Yi Nian, and Yue Zhao. Agent Audit: A security analysis system for LLM agent applications. arXiv preprint arXiv:2603.22853.
- [26] Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, and Minlie Huang. Agent-SafetyBench: Evaluating the safety of LLM agents. arXiv preprint arXiv:2412.14470.
- [27] Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.