Observability for Delegated Execution in Agentic AI Systems

Abhinav Mishra; Kumar Sharad

arxiv: 2606.09692 · v1 · pith:4JL42VNZnew · submitted 2026-06-08 · 💻 cs.CR · cs.AI


Pith reviewed 2026-06-27 16:18 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords: observability, delegation, agentic AI, audit logs, LLM agents, forensic reconstruction, execution traces


The pith

Standard audit logs and execution traces cannot distinguish delegation scope in agentic AI systems because the same traces can arise from incompatible delegation assignments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that delegation-scoped execution in LLM-based agentic systems is structurally underdetermined from existing observables. Agents dynamically choose tools, reorder actions, and spawn sub-agents, which fragments and interleaves traces so that multiple delegation assignments produce identical logs. Existing audit, tracing, and security schemas therefore lack the semantics needed for reliable reconstruction of what actions occurred under a given delegation across heterogeneous tools. The authors introduce an observability substrate that binds delegation context at execution time through a lightweight gateway and common information model. This binding makes cross-tool delegation-scoped reconstruction possible via direct forensic queries rather than heuristic correlation.

Core claim

Delegation-scoped execution is not identifiable from standard observables because audit logs and execution traces can be identical under multiple incompatible delegation assignments; an agent-aware observability substrate consisting of a lightweight gateway and common information model binds delegation context at execution time and thereby enables reliable cross-tool reconstruction without heuristic time-window correlation.
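The non-identifiability claim can be illustrated with a toy case. The sketch below is not taken from the paper: the `Event` schema, the field names, and the two-action scenario are assumptions made for illustration. It builds two runs that differ only in which delegation each action belongs to, then projects both onto the fields a conventional audit log is assumed to retain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Fields a conventional audit log is assumed to record (hypothetical schema).
    timestamp: int
    principal: str  # shared service identity the agent runs as
    tool: str
    resource: str
    # Ground truth that the standard log is assumed NOT to record.
    delegation: str


def observable(event):
    """Project an event onto the fields a standard log would retain."""
    return (event.timestamp, event.principal, event.tool, event.resource)


def trace(events):
    """Observable trace: time-ordered projections with delegation dropped."""
    return sorted(observable(e) for e in events)


# Assignment A: the read happens under Alice's delegation, the send under Bob's.
run_a = [
    Event(1, "agent-svc", "drive.read", "q3-report", delegation="alice"),
    Event(2, "agent-svc", "mail.send", "q3-report", delegation="bob"),
]
# Assignment B: the same two actions with the delegations swapped.
run_b = [
    Event(1, "agent-svc", "drive.read", "q3-report", delegation="bob"),
    Event(2, "agent-svc", "mail.send", "q3-report", delegation="alice"),
]

same_trace = trace(run_a) == trace(run_b)
same_assignment = [e.delegation for e in run_a] == [e.delegation for e in run_b]
print(same_trace, same_assignment)  # prints: True False
```

Under these assumptions the two logs are equal while the assignments differ, so no function of the log alone can recover which delegation an action ran under.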

What carries the argument

Agent-aware observability substrate (lightweight gateway plus common information model that binds delegation context at execution time).

If this is right

Direct forensic queries become possible on delegation-scoped footprints instead of relying on post-hoc correlation.
Reconstruction works across heterogeneous tools and systems once the common information model is adopted.
Individual actions remain authorized and logged while the delegation assignment itself becomes attributable.
The approach targets attribution and footprint reconstruction rather than intent or reasoning inference.
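The difference between post-hoc correlation and a direct query can be sketched on a small set of records. The record layout and the `delegation_id` field are assumptions for illustration, not the paper's schema. Two delegations interleave their actions, which is the situation where a time window is assumed to over-collect.

```python
# Records as a gateway might emit them; delegation_id is a hypothetical field.
records = [
    {"ts": 100, "tool": "drive.read", "resource": "q3-report", "delegation_id": "d-alice"},
    {"ts": 101, "tool": "drive.read", "resource": "payroll", "delegation_id": "d-bob"},
    {"ts": 102, "tool": "mail.send", "resource": "q3-report", "delegation_id": "d-alice"},
    {"ts": 103, "tool": "mail.send", "resource": "payroll", "delegation_id": "d-bob"},
]


def footprint_by_window(records, start, end):
    """Heuristic baseline: attribute every action inside a time window."""
    return [(r["tool"], r["resource"]) for r in records if start <= r["ts"] <= end]


def footprint_by_delegation(records, delegation_id):
    """Direct query: select on the delegation identifier bound at execution time."""
    return [
        (r["tool"], r["resource"])
        for r in records
        if r["delegation_id"] == delegation_id
    ]


# Window covering Alice's first and last action also captures Bob's payroll read.
window = footprint_by_window(records, 100, 102)
direct = footprint_by_delegation(records, "d-alice")
print(len(window), len(direct))  # prints: 3 2
```

The window query returns Bob's `payroll` read alongside Alice's two actions, while the direct query returns only Alice's footprint.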

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Security monitoring pipelines would need to adopt the common information model at the gateway layer to gain the reconstruction capability.
The same substrate could support compliance reporting that distinguishes actions taken under different user or agent delegations.
Performance overhead of the gateway becomes a practical limit on adoption in high-throughput agent deployments.

Load-bearing premise

Binding delegation context at execution time will enable reliable reconstruction without introducing new fragmentation or performance problems that defeat the purpose.

What would settle it

A set of identical execution traces generated under two different delegation assignments that the proposed gateway and model still cannot separate into unique delegation scopes.
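Such a counterexample could be searched for mechanically. The sketch below assumes each run is available as a labelled list of event dictionaries; the helper name `unseparated_traces` and the field names are hypothetical. It groups runs by their trace projected onto a chosen set of fields and reports any group that mixes different delegation assignments.

```python
from collections import defaultdict


def unseparated_traces(runs, fields):
    """Group runs by their trace projected onto `fields` and return the
    groups that contain more than one delegation assignment."""
    groups = defaultdict(set)
    for assignment, events in runs:
        key = tuple(tuple(e.get(f) for f in fields) for e in events)
        groups[key].add(assignment)
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]


# Two runs with the same actions but swapped delegations (illustrative data).
runs = [
    ("A", [
        {"tool": "drive.read", "resource": "r", "delegation_id": "alice"},
        {"tool": "mail.send", "resource": "r", "delegation_id": "bob"},
    ]),
    ("B", [
        {"tool": "drive.read", "resource": "r", "delegation_id": "bob"},
        {"tool": "mail.send", "resource": "r", "delegation_id": "alice"},
    ]),
]

standard = unseparated_traces(runs, ["tool", "resource"])
bound = unseparated_traces(runs, ["tool", "resource", "delegation_id"])
print(standard, bound)  # prints: [['A', 'B']] []
```

A non-empty result for the field set that the gateway actually emits would be the kind of evidence described above. An empty result on one dataset would not by itself establish the claim.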

Original abstract

Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the same instruction, and spawn cooperating sub-agents. These dynamics fragment and interleave traces, making delegation-scoped reconstruction from causal structure alone structurally underdetermined. Although individual actions are authorized and logged, existing audit, tracing, and security schemas lack the semantics to reconstruct what actions occurred under a given delegation across heterogeneous systems. We focus on delegation-scoped attribution and access/share footprint reconstruction, not intent inference or reasoning reconstruction. We present an agent-aware observability substrate consisting of a lightweight gateway and a common information model that binds delegation context at execution time. This enables reliable cross-tool delegation-scoped reconstruction and direct forensic queries without heuristic time-window correlation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper correctly flags trace ambiguity under delegation in agentic systems but supplies only a high-level sketch with no examples, model, or validation.


The main thing to know is that this paper identifies how dynamic delegation in LLM agents can produce identical audit logs under different assignments, making reconstruction hard, and suggests a gateway plus common information model to bind context at runtime. That observation is reasonable and the framing around delegation-scoped attribution is a useful way to look at the problem.

What the paper does is lay out the issue clearly: agents pick tools variably, spawn sub-agents, and interleave actions across systems, so standard schemas lack the needed semantics. The proposed substrate aims to enable direct forensic queries without time-window heuristics. If the gateway works, it could reduce reliance on post-hoc correlation.

The soft spots are substantial and central. The claim that reconstruction is structurally underdetermined rests on assertion alone; there is no concrete example of two incompatible delegations yielding the same observables, no definition of the observable signature, and no argument showing why process trees or timing data could not disambiguate. The proposal stays at the architectural level with no implementation details, performance discussion, or evidence that the gateway avoids new fragmentation. The stress-test note holds up.

This is for readers already working on AI security tooling who want design ideas for better attribution. It might spark conversation in that group but does not contain enough substance or grounding for a full paper.

I would not send it to peer review.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that delegation-scoped execution in LLM-based agentic systems cannot be reconstructed from standard audit logs and execution traces because these observables can be identical under multiple incompatible delegation assignments; dynamic tool selection, variable execution sequences, and sub-agent spawning fragment and interleave traces, rendering reconstruction from causal structure alone structurally underdetermined. Existing schemas lack the necessary semantics for delegation-scoped attribution and access/share footprint reconstruction. The authors propose an agent-aware observability substrate consisting of a lightweight gateway and common information model that binds delegation context at execution time to enable reliable cross-tool reconstruction and direct forensic queries.

Significance. If the proposed substrate can be shown to bind context without introducing fragmentation or performance overhead, the work would address a genuine gap in attribution for delegated execution in heterogeneous agentic systems, moving beyond heuristic correlation to direct, semantics-aware reconstruction. The paper receives credit for clearly scoping the problem to attribution rather than intent inference and for framing the issue in terms of structural underdetermination rather than implementation details.

major comments (2)

[Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate. This absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.
[Abstract] The proposed lightweight gateway and common information model are described only at the architectural level, with no specification of the information model, binding mechanism, or query interface. This leaves open whether the approach avoids the very fragmentation and performance issues it aims to solve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and indicate planned revisions to the manuscript.


Referee: [Abstract] The central claim that 'audit logs and execution traces can be identical under multiple incompatible delegation assignments' and that reconstruction is 'structurally underdetermined' is asserted without any concrete example, formal definition of the observable signature, or argument showing why additional context (process trees, resource handles, timing) cannot disambiguate. This absence makes the premise that existing schemas 'lack the semantics' an untested assertion rather than a demonstrated gap.

Authors: We agree that the abstract would benefit from a concrete example to illustrate the central claim. In revision we will add a brief illustrative scenario (e.g., two delegation assignments producing identical tool-call and log sequences due to dynamic sub-agent spawning). The full manuscript already contains a formal argument for structural underdetermination (including why process trees, resource handles, and timing fail to resolve ambiguity under variable execution sequences), but we will ensure the abstract explicitly summarizes this argument rather than merely asserting the gap.

Revision: yes

Referee: [Abstract] The proposed lightweight gateway and common information model are described only at the architectural level, with no specification of the information model, binding mechanism, or query interface. This leaves open whether the approach avoids the very fragmentation and performance issues it aims to solve.

Authors: The manuscript presents the substrate at an architectural level to focus on the novel delegation-scoped semantics. We acknowledge that additional specification would address the referee's concern. We will revise to include a concise specification of the information model (core context fields), the binding mechanism (gateway-mediated context injection at execution time), and example query patterns. A short discussion of overhead will be added to argue that the design avoids fragmentation by construction, as context is bound directly rather than reconstructed post hoc.

Revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual proposal without derivations or self-referential reductions


The manuscript is a high-level conceptual proposal for an observability substrate. It states the central claim directly (delegation-scoped execution is not identifiable from standard observables) but supplies no equations, fitted parameters, predictions, uniqueness theorems, or ansatzes. No load-bearing step reduces by construction to its own inputs, and no self-citations are invoked to justify the premise. The text therefore contains no instances of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only the abstract is available, so the ledger reflects the high-level description without access to any detailed assumptions or parameters in the full manuscript.

invented entities (1)

agent-aware observability substrate (no independent evidence)
Purpose: binds delegation context at execution time to enable reconstruction.
Introduced as the core solution mechanism without reference to prior independent evidence or validation.

pith-pipeline@v0.9.1-grok · 5676 in / 1082 out tokens · 26997 ms · 2026-06-27T16:18:47.536468+00:00 · methodology


