Observability for Delegated Execution in Agentic AI Systems
Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3
The pith
Standard audit logs and execution traces cannot distinguish delegation scope in agentic AI systems because the same traces can arise from incompatible delegation assignments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Delegation-scoped execution is not identifiable from standard observables because audit logs and execution traces can be identical under multiple incompatible delegation assignments; an agent-aware observability substrate consisting of a lightweight gateway and common information model binds delegation context at execution time and thereby enables reliable cross-tool reconstruction without heuristic time-window correlation.
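The non-identifiability claim can be made concrete with a toy model. The sketch below is an illustration, not the paper's construction: the `Action` type, the field names, and the three example calls are all hypothetical. It builds two ground-truth delegation assignments for the same calls and shows that a conventional log projection cannot tell them apart, even though the footprint of delegation `D1` differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One executed tool call; all field names here are illustrative assumptions."""
    timestamp: int
    principal: str   # service identity that appears in the audit log
    tool: str
    resource: str
    delegation: str  # ground truth, NOT part of standard telemetry


def standard_observables(actions):
    """Project actions onto what a conventional audit log would record."""
    return [(a.timestamp, a.principal, a.tool, a.resource) for a in actions]


def footprint(actions, delegation):
    """Resources touched under one delegation (the quantity to be recovered)."""
    return {a.resource for a in actions if a.delegation == delegation}


calls = [
    (1, "agent-svc", "drive.read", "budget.xlsx"),
    (2, "agent-svc", "mail.send", "ext@example.com"),
    (3, "agent-svc", "drive.read", "roadmap.doc"),
]

# Two incompatible delegation assignments for the same three calls.
world_a = [Action(*c, delegation=d) for c, d in zip(calls, ["D1", "D1", "D2"])]
world_b = [Action(*c, delegation=d) for c, d in zip(calls, ["D1", "D2", "D2"])]

same_logs = standard_observables(world_a) == standard_observables(world_b)
same_footprint = footprint(world_a, "D1") == footprint(world_b, "D1")
print(same_logs, same_footprint)
```

Because the projection discards the `delegation` field, no function of the logs alone can return the correct `D1` footprint in both worlds.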
What carries the argument
Agent-aware observability substrate (lightweight gateway plus common information model that binds delegation context at execution time).
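A minimal sketch of what such a substrate might look like follows. The page does not specify the paper's information model or gateway interface, so every name here (`DelegationContext`, `ExecutionRecord`, `Gateway`, and their fields) is a hypothetical stand-in chosen for illustration. The only property the sketch aims to show is that the delegation context is attached to each record when the call executes, rather than inferred afterwards.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DelegationContext:
    """Hypothetical core context fields; not the paper's actual schema."""
    delegation_id: str
    delegator: str                               # who granted the delegation
    agent_id: str                                # agent acting under it
    parent_delegation_id: Optional[str] = None   # set for spawned sub-agents


@dataclass(frozen=True)
class ExecutionRecord:
    record_id: str
    timestamp: float
    tool: str
    resource: str
    context: DelegationContext


class Gateway:
    """Illustrative gateway: every tool call passes through call()."""

    def __init__(self):
        self.records = []

    def call(self, context, tool_name, tool_fn, resource):
        """Run the tool, then store a record stamped with the caller's context."""
        result = tool_fn(resource)
        self.records.append(ExecutionRecord(
            record_id=str(uuid.uuid4()),
            timestamp=time.time(),
            tool=tool_name,
            resource=resource,
            context=context,
        ))
        return result

    def footprint(self, delegation_id):
        """Direct query: all records bound to one delegation."""
        return [r for r in self.records
                if r.context.delegation_id == delegation_id]


gw = Gateway()
d1 = DelegationContext("D1", delegator="alice", agent_id="planner")
d2 = DelegationContext("D2", delegator="bob", agent_id="planner")


def read(resource):
    return f"contents of {resource}"


gw.call(d1, "drive.read", read, "budget.xlsx")
gw.call(d2, "drive.read", read, "roadmap.doc")
print([r.resource for r in gw.footprint("D1")])
```

The same agent identity acts under both delegations, so only the bound `delegation_id` separates the two records.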
If this is right
- Direct forensic queries become possible on delegation-scoped footprints instead of relying on post-hoc correlation.
- Reconstruction works across heterogeneous tools and systems once the common information model is adopted.
- Individual actions remain authorized and logged while the delegation assignment itself becomes attributable.
- The approach targets attribution and footprint reconstruction rather than intent or reasoning inference.
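The contrast between post-hoc correlation and a direct query can be sketched as follows. The event tuples, the window size, and both function names are assumptions made for this example. The heuristic attributes every event inside a time window to one delegation, while the direct query filters on a delegation identifier that is assumed to have been bound at execution time.

```python
events = [
    # (timestamp, tool, resource, delegation_id bound at execution time)
    (10, "drive.read", "budget.xlsx", "D1"),
    (11, "drive.read", "roadmap.doc", "D2"),
    (12, "mail.send", "ext@example.com", "D1"),
    (40, "drive.share", "budget.xlsx", "D1"),
]


def time_window_footprint(events, start, window):
    """Heuristic: attribute every event in [start, start + window] to one delegation."""
    return {(tool, res) for ts, tool, res, _ in events
            if start <= ts <= start + window}


def direct_footprint(events, delegation_id):
    """Direct query on the bound delegation identifier."""
    return {(tool, res) for _, tool, res, d in events if d == delegation_id}


heuristic = time_window_footprint(events, start=10, window=5)
direct = direct_footprint(events, "D1")

false_positives = heuristic - direct   # interleaved work from another delegation
false_negatives = direct - heuristic   # late actions outside the window
print(false_positives, false_negatives)
```

In this toy data the window picks up an interleaved `D2` read and misses a late `D1` share. Widening the window to catch the late share would not remove the interleaved `D2` read.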
Where Pith is reading between the lines
- Security monitoring pipelines would need to adopt the common information model at the gateway layer to gain the reconstruction capability.
- The same substrate could support compliance reporting that distinguishes actions taken under different user or agent delegations.
- Performance overhead of the gateway becomes a practical limit on adoption in high-throughput agent deployments.
Load-bearing premise
Binding delegation context at execution time will enable reliable reconstruction without introducing new fragmentation or performance problems that defeat the purpose.
What would settle it
A set of identical execution traces generated under two different delegation assignments that the proposed gateway and model still cannot separate into unique delegation scopes.
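One way to phrase this test operationally is sketched below, under the assumption that an experimenter controls the ground-truth delegation of each action and can read the label the substrate emits. The `Record` type and the `counterexamples` function are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    observable: tuple                 # what a standard log shows
    true_delegation: str              # ground truth known to the experimenter
    bound_delegation: Optional[str]   # label emitted by the substrate, None if missing


def counterexamples(run_a, run_b):
    """Indices where logs match and ground truth differs, yet labels do not separate them."""
    hits = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a.observable != b.observable or a.true_delegation == b.true_delegation:
            continue  # not an ambiguous pair
        unlabeled = a.bound_delegation is None or b.bound_delegation is None
        if unlabeled or a.bound_delegation == b.bound_delegation:
            hits.append(i)
    return hits


obs = (2, "agent-svc", "mail.send", "ext@example.com")
run_a = [Record(obs, "D1", "D1")]
separated_b = [Record(obs, "D2", "D2")]
collapsed_b = [Record(obs, "D2", "D1")]

print(counterexamples(run_a, separated_b), counterexamples(run_a, collapsed_b))
```

A non-empty result on real gateway output would be the kind of evidence described above. An empty result on a handful of runs would not by itself establish the claim.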
Original abstract
Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the same instruction, and spawn cooperating sub-agents. These dynamics fragment and interleave traces, making delegation-scoped reconstruction from causal structure alone structurally underdetermined. Although individual actions are authorized and logged, existing audit, tracing, and security schemas lack the semantics to reconstruct what actions occurred under a given delegation across heterogeneous systems. We focus on delegation-scoped attribution and access/share footprint reconstruction, not intent inference or reasoning reconstruction. We present an agent-aware observability substrate consisting of a lightweight gateway and a common information model that binds delegation context at execution time. This enables reliable cross-tool delegation-scoped reconstruction and direct forensic queries without heuristic time-window correlation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that delegation-scoped execution in LLM-based agentic systems cannot be reconstructed from standard audit logs and execution traces because these observables can be identical under multiple incompatible delegation assignments; dynamic tool selection, variable execution sequences, and sub-agent spawning fragment and interleave traces, rendering reconstruction from causal structure alone structurally underdetermined. Existing schemas lack the necessary semantics for delegation-scoped attribution and access/share footprint reconstruction. The authors propose an agent-aware observability substrate consisting of a lightweight gateway and common information model that binds delegation context at execution time to enable reliable cross-tool reconstruction and direct forensic queries.
Significance. If the proposed substrate can be shown to bind context without introducing fragmentation or performance overhead, the work would address a genuine gap in attribution for delegated execution in heterogeneous agentic systems, moving beyond heuristic correlation to direct, semantics-aware reconstruction. The paper receives credit for clearly scoping the problem to attribution rather than intent inference and for framing the issue in terms of structural underdetermination rather than implementation details.
major comments (2)
- [Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate; this absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.
- [Abstract] The proposed lightweight gateway and common information model are described only at the architectural level with no specification of the information model, binding mechanism, or query interface, leaving open whether the approach avoids the very fragmentation and performance issues it aims to solve.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address each major comment below and indicate planned revisions to the manuscript.
- Referee: [Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate; this absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.
  Authors: We agree that the abstract would benefit from a concrete example to illustrate the central claim. In revision we will add a brief illustrative scenario (e.g., two delegation assignments producing identical tool-call and log sequences due to dynamic sub-agent spawning). The full manuscript already contains a formal argument for structural underdetermination (including why process trees, resource handles, and timing fail to resolve ambiguity under variable execution sequences), but we will ensure the abstract explicitly summarizes this argument rather than merely asserting the gap.
  Revision: yes
- Referee: [Abstract] The proposed lightweight gateway and common information model are described only at the architectural level with no specification of the information model, binding mechanism, or query interface, leaving open whether the approach avoids the very fragmentation and performance issues it aims to solve.
  Authors: The manuscript presents the substrate at an architectural level to focus on the novel delegation-scoped semantics. We acknowledge that additional specification would address the referee's concern. We will revise to include a concise specification of the information model (core context fields), the binding mechanism (gateway-mediated context injection at execution time), and example query patterns. A short discussion of overhead will be added to argue that the design avoids fragmentation by construction, as context is bound directly rather than reconstructed post hoc.
  Revision: yes
Circularity Check
No circularity: conceptual proposal without derivations or self-referential reductions
The manuscript is a high-level conceptual proposal for an observability substrate. It states the central claim directly (delegation-scoped execution is not identifiable from standard observables) but supplies no equations, fitted parameters, predictions, uniqueness theorems, or ansatzes. No load-bearing step reduces by construction to its own inputs, and no self-citations are invoked to justify the premise. The text therefore contains no instances of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
invented entities (1)
- agent-aware observability substrate: no independent evidence