AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs

Haoyu Wang; Shenao Wang; Xiao Cheng; Xinyi Hou; Yanjie Zhao

arxiv: 2607.01640 · v1 · pith:ANSFEE7Enew · submitted 2026-07-02 · 💻 cs.SE · cs.CR

Shenao Wang, Xinyi Hou, Yanjie Zhao, Xiao Cheng, Haoyu Wang

Pith reviewed 2026-07-03 09:14 UTC · model grok-4.3

classification cs.SE · cs.CR

keywords static analysis · LLM agents · agent dependency graph · software bill of materials · taint analysis · agent frameworks · prompt-to-tool risks · framework semantics


The pith

AgentFlow recovers agent entities and dependencies missed by existing AST-based static analysis through a new graph representation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM agent programs combine host-language code with framework-defined elements such as agent constructors, tool decorators, and handoff declarations. These create dependencies that standard static analysis tools based on abstract syntax trees do not fully capture. AgentFlow builds an Agent Dependency Graph to represent agents, prompts, models, capabilities, memory, and control policies as typed nodes connected by component, control-flow, and data-flow edges. The graph supports generation of dependency-aware Agent Bills of Materials and detection of taint-style prompt-to-tool risks. Evaluation across 5,399 real-world programs from five frameworks shows recovery of richer entities and dependencies along with identification of 238 such risks.

Core claim

AgentFlow constructs an Agent Dependency Graph (ADG) as a framework-agnostic representation where agents, prompts, models, capabilities, memory states, and control policies appear as typed nodes and their component, control-flow, and data-flow relations appear as typed edges. This structure recovers framework-induced semantics from static source code that existing AST-based tools overlook. The resulting graphs enable analyses that produce more complete Agent Bills of Materials and detect prompt-to-tool taint risks. On the AgentZoo corpus of 5,399 programs, the evaluation reports richer recovery than AST-based tools and 238 identified risks.
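A minimal sketch of how such a typed graph could be held in memory. The node and edge kinds follow the claim above; the class names (`ADG`, `Node`, `Edge`), the method names, and the example identifiers are illustrative assumptions, not AgentFlow's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    AGENT = "agent"
    PROMPT = "prompt"
    MODEL = "model"
    CAPABILITY = "capability"
    MEMORY = "memory"
    POLICY = "policy"


class EdgeKind(Enum):
    COMPONENT = "component"
    CONTROL_FLOW = "control_flow"
    DATA_FLOW = "data_flow"


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: NodeKind


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeKind


@dataclass
class ADG:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: set = field(default_factory=set)    # set of Edge

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src, dst, kind):
        # Reject edges whose endpoints were never declared as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        self.edges.add(Edge(src, dst, kind))

    def neighbors(self, node_id, kind):
        # Targets of outgoing edges of one kind, sorted for stable output.
        return sorted(e.dst for e in self.edges if e.src == node_id and e.kind == kind)


# Hypothetical example: one agent bound to a model and a tool.
g = ADG()
g.add_node("triage_agent", NodeKind.AGENT)
g.add_node("gpt-model", NodeKind.MODEL)
g.add_node("search_tool", NodeKind.CAPABILITY)
g.add_edge("triage_agent", "gpt-model", EdgeKind.COMPONENT)
g.add_edge("triage_agent", "search_tool", EdgeKind.COMPONENT)
```

Keeping the edge kind on the edge itself, rather than in separate graphs, lets one query filter by component, control-flow, or data-flow relations without duplicating nodes.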

What carries the argument

The Agent Dependency Graph (ADG), a typed node-and-edge structure that encodes agent components and their framework-induced dependencies to support static recovery and downstream analyses.

If this is right

Agent programs yield more complete Bills of Materials that track agent-specific dependencies.
Taint-style prompt-to-tool risks become detectable through static analysis of existing codebases.
The same graph representation applies across multiple agent frameworks without per-framework runtime instrumentation.
Agent governance and security analyses gain a common intermediate representation for dependency tracking.
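Taint-style detection of the kind listed above can be pictured as reachability over data-flow edges, from an untrusted input to a sensitive tool. The sketch below uses a plain breadth-first search. The function name, the graph encoding, and the node labels are assumptions made for illustration; the paper's own queries may differ.

```python
from collections import deque


def prompt_to_tool_paths(data_flow, sources, sinks):
    """Return one shortest data-flow path from each source to each reachable sink.

    data_flow maps a node to the list of nodes it passes data to.
    """
    findings = []
    for source in sorted(sources):
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in data_flow.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        for sink in sorted(sinks):
            if sink in parent and sink != source:
                # Walk parent links back to the source to rebuild the path.
                path, cur = [], sink
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                findings.append(list(reversed(path)))
    return findings


# Hypothetical graph: user input reaches a prompt, then an agent, then a shell tool.
flows = {
    "user_input": ["prompt:triage"],
    "prompt:triage": ["agent:triage"],
    "agent:triage": ["tool:run_shell", "tool:format_text"],
}
risks = prompt_to_tool_paths(flows, {"user_input"}, {"tool:run_shell"})
```

Reporting the full path, not just the source and sink pair, gives a reviewer the intermediate prompt and agent nodes needed to triage each finding.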

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The ADG could serve as input to dynamic or hybrid analysis tools that validate static recoveries at runtime.
Similar graph constructions might apply to other framework-heavy domains such as workflow engines or plugin systems.
Security scanning pipelines for agents could standardize on ADG outputs for consistent risk reporting.

Load-bearing premise

Framework-induced semantics expressed through constructors, decorators, and handoff declarations can be reliably identified from static source code alone.
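To make the premise concrete, here is a small sketch of recovering two such constructs from source text with Python's standard `ast` module. The decorator name `tool`, the constructor name `Agent`, and the sample program are placeholders chosen for illustration; real frameworks use their own names, and AgentFlow's front-ends are not claimed to work this way.

```python
import ast

# Toy agent program; it is only parsed, never executed.
SOURCE = '''
@tool
def lookup_order(order_id: str) -> str:
    return "..."

support = Agent(name="support", model="some-model", tools=[lookup_order])
'''


def extract_facts(source, decorator_name="tool", constructor_name="Agent"):
    """Collect decorated tool functions and agent constructor bindings."""
    tools, agents = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                # Handle both `@tool` and `@tool(...)`.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id == decorator_name:
                    tools.append(node.name)
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id == constructor_name:
                bound = {}
                for kw in node.keywords:
                    if kw.arg == "tools" and isinstance(kw.value, ast.List):
                        # Keep only tools referenced by a plain name.
                        bound["tools"] = [e.id for e in kw.value.elts if isinstance(e, ast.Name)]
                    elif kw.arg is not None and isinstance(kw.value, ast.Constant):
                        bound[kw.arg] = kw.value.value
                agents.append(bound)
    return tools, agents


tools, agents = extract_facts(SOURCE)
```

The sketch only sees literal names and constants. Aliased imports, tools built in loops, or arguments computed at runtime would be missed, which is exactly where the premise is most exposed.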

What would settle it

A comparison of ADG-derived dependencies and risks against those observed by executing a sample of the evaluated agent programs and inspecting their runtime behavior.
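Such a comparison could be summarized as set overlap between statically recovered edges and edges observed in runtime traces. The sketch below assumes both sides have already been normalized to `(source, target)` pairs; the function name and the sample edges are hypothetical.

```python
def compare_edges(static_edges, runtime_edges):
    """Summarize agreement between static and runtime-observed dependency edges."""
    static_edges, runtime_edges = set(static_edges), set(runtime_edges)
    confirmed = static_edges & runtime_edges
    return {
        # Share of static edges that were also seen at runtime.
        "precision": len(confirmed) / len(static_edges) if static_edges else 0.0,
        # Share of runtime edges that the static analysis recovered.
        "recall": len(confirmed) / len(runtime_edges) if runtime_edges else 0.0,
        "static_only": sorted(static_edges - runtime_edges),
        "runtime_only": sorted(runtime_edges - static_edges),
    }


# Hypothetical edges for one program.
static = [("agent", "tool_a"), ("agent", "tool_b"), ("agent", "tool_c"), ("agent", "tool_d")]
runtime = [("agent", "tool_a"), ("agent", "tool_b"), ("agent", "tool_e")]
report = compare_edges(static, runtime)
```

A runtime trace only covers the paths that were exercised, so an edge in `static_only` is not necessarily a false positive. The `runtime_only` edges are the stronger signal, since they are dependencies the static analysis missed.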

Figures

Figures reproduced from arXiv: 2607.01640 by Haoyu Wang, Shenao Wang, Xiao Cheng, Xinyi Hou, Yanjie Zhao.

**Figure 2.** Motivating example of agent-specific dependencies in an agent program. The left side shows an OpenAI-Agent-SDK…

**Figure 3.** Overview of AgentFlow. Here, A, I, M, C, S, and G denote the abstract domains of agent units, prompt contexts, model units, agent capabilities, memory states, and control policies.

Agent Component Dependency Graph. The ACDG captures static component binding relations. ACDG is an undirected graph, $\mathrm{ACDG}_P = (V, E_{acdg})$, where $E_{acdg}$ contains undirected dependency edges between statically associated entities. Fo…

**Figure 4.** Agent fact extraction. AgentFlow resolves framework object references in an OpenAI-Agent-SDK style code excerpt and normalizes them into entity, component, control, and data facts for ADG construction.

…callable tool, HostedMCPTool creates the MCP capability with an approval policy, and Agent(...) binds models, instructions, and tools to an agent. Second, AgentFlow resolves the alias references to connect e…

**Figure 5.** Queries for Agent BOM and Prompt-to-Tool.

Original abstract

LLM agents are increasingly developed as source-code applications built on agent frameworks. These agent programs combine conventional host-language code with framework-defined semantics for models, prompts, tools, memory, and multi-agent orchestration logic. As a result, their behavior depends not only on traditional control and data flows, but also on a new class of agent dependencies. Such dependencies are often expressed as framework-induced semantics, such as agent constructors, tool decorators, and agent handoff declarations, making them difficult to recover with existing static analysis or dependency tracking tools. In this paper, we present AgentFlow, the first static analysis framework for recovering and analyzing agent dependencies from agent programs. AgentFlow constructs an Agent Dependency Graph (ADG), a framework-agnostic graph representation that represents agents, prompts, models, capabilities, memory states, and control policies as typed nodes, and captures their component-dependency, control-flow, and data-flow dependencies as typed edges. Built on ADGs, AgentFlow supports a range of analyses for agent governance and security, including Agent Bill of Materials (BOM) generation and prompt-to-tool risk detection. We implement AgentFlow for five representative agent frameworks and evaluate it on AgentZoo, a corpus of 5,399 real-world agent programs. Our evaluation shows that AgentFlow recovers richer agent entities and dependencies than existing AST-based agent static analysis tools, generates more dependency-aware Agent BOMs, and uncovers 238 taint-style prompt-to-tool risks in real-world agent programs. These results show that ADG provides a practical foundation for understanding, governing, and securing emerging agent software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AgentFlow offers what the paper describes as the first static approach to recovering framework-specific agent dependencies, such as handoffs and decorators, as a typed graph. The approach is demonstrated across five frameworks on thousands of programs.


The main takeaway is that AgentFlow is presented as the first static analysis framework that recovers agent-level dependencies from source code across multiple LLM agent frameworks, using a new typed graph called the ADG.

The authors define typed nodes for agents, prompts, models, capabilities, memory, and control policies, then add edges for component dependencies, control flow, and data flow. This lets them generate Agent BOMs and detect taint-style risks from prompts to tools. The implementation covers five frameworks and was evaluated on AgentZoo, a corpus of 5,399 programs. The reported results show more entities and dependencies than AST baselines, plus 238 identified risks.
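As a rough illustration of BOM generation from such a graph, the sketch below groups each agent's bound components by node kind. The function name, the input encoding, and the sample entities are assumptions for illustration, not the paper's BOM format.

```python
from collections import defaultdict


def build_agent_bom(node_kinds, component_edges):
    """Group each agent's statically bound components by their node kind.

    node_kinds maps a node id to its kind; component_edges holds (agent, component) pairs.
    """
    bom = defaultdict(lambda: defaultdict(list))
    for agent, component in sorted(component_edges):
        if node_kinds.get(agent) != "agent":
            continue  # only agents get a BOM entry
        bom[agent][node_kinds.get(component, "unknown")].append(component)
    return {agent: dict(parts) for agent, parts in bom.items()}


# Hypothetical entities recovered for one program.
node_kinds = {
    "support": "agent",
    "some-model": "model",
    "lookup_order": "capability",
    "system_prompt": "prompt",
}
edges = [
    ("support", "some-model"),
    ("support", "lookup_order"),
    ("support", "system_prompt"),
]
bom = build_agent_bom(node_kinds, edges)
```

Because the BOM is derived from edges and not from a flat package list, it records which agent uses which model or tool, which is the "dependency-aware" property the letter refers to.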

That evaluation on real code at that scale is the part that stands out. It shows the approach can be applied practically without needing runtime execution for the analysis.

The weaker part is the lack of detailed accuracy metrics. The paper reports the raw counts but does not include precision or recall numbers for the recovered dependencies or the risk detections. This leaves some uncertainty about how complete or accurate the ADGs are, especially if any framework uses patterns not covered by their static front-ends. The concern about missing dynamic behaviors is worth checking in the full implementation, but the paper positions the static method as sufficient for the cases they handle.

Readers working on static analysis for emerging software paradigms or on security tooling for AI agents will find this useful. It is worth bringing to a reading group for discussion on how to analyze non-traditional code.

The paper deserves a serious referee because it tackles a timely problem with a concrete implementation and large-scale experiment. I would recommend sending it to peer review, with feedback focused on adding validation metrics for the analysis results.

Referee Report

2 major / 2 minor

Summary. The paper presents AgentFlow as the first static analysis framework for LLM-based agent programs. It defines an Agent Dependency Graph (ADG) representation with typed nodes for agents, prompts, models, capabilities, memory, and policies, plus typed edges for component, control-flow, and data-flow dependencies. The system implements front-ends for five agent frameworks, constructs ADGs from source, and supports Agent BOM generation and taint-style prompt-to-tool risk detection. On a corpus of 5,399 real-world programs (AgentZoo), it claims richer entity/dependency recovery than existing AST-based tools, more dependency-aware BOMs, and detection of 238 risks.

Significance. If the recovery accuracy holds, the work supplies a practical, framework-agnostic foundation for static governance and security analysis of agent software, which is timely given the rapid adoption of multi-agent systems. The scale of the evaluation corpus and the implementation across five frameworks are concrete strengths that would support adoption if accompanied by validation metrics.

major comments (2)

[Evaluation] Evaluation section (and abstract): the central claims of 'richer agent entities and dependencies' and 'uncovers 238 taint-style prompt-to-tool risks' are stated as direct outputs of running the tool, yet no precision, recall, or ground-truth comparison against manual inspection or dynamic oracles is reported. This is load-bearing because the reported counts cannot be interpreted without knowing the false-positive rate of the static pattern matching for framework semantics.
[Implementation] Implementation and ADG construction (likely §3–4): the recovery of framework-induced nodes/edges (constructors, @tool decorators, handoff declarations) is presented as reliable via static IR traversal, but the manuscript provides no discussion or experiment addressing cases of dynamic registration, conditional instantiation, or metaprogramming that would cause under-approximation. A concrete validation (e.g., manual audit of 50 programs or comparison to runtime traces) is required to support the claim that ADGs are faithful rather than artifacts of the chosen front-ends.
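The under-approximation concern can be reproduced on a toy program. In the sketch below, a decorator scan over the syntax tree finds one tool, while running the same program registers three. The registry, the `tool` function, and the tool names are invented for this illustration and do not come from any real framework.

```python
import ast

SOURCE = '''
REGISTRY = []

def tool(fn):
    REGISTRY.append(fn.__name__)
    return fn

@tool
def search(query):
    return query

def delete_file(path):
    return path

def send_email(to):
    return to

# Dynamic registration: no decorator syntax appears on these functions.
for fn in (delete_file, send_email):
    tool(fn)
'''


def statically_decorated(source, decorator_name="tool"):
    """Names of functions carrying a literal `@tool` decorator."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if any(isinstance(d, ast.Name) and d.id == decorator_name
                   for d in node.decorator_list):
                found.append(node.name)
    return found


static_view = statically_decorated(SOURCE)

namespace = {}
exec(SOURCE, namespace)  # run the toy program to observe real registrations
runtime_view = namespace["REGISTRY"]
missed = sorted(set(runtime_view) - set(static_view))
```

Here the two missed tools are the riskier ones, which shows why counts alone cannot establish that a recovered graph is faithful.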

minor comments (2)

[Evaluation] Define the precise criteria used to declare one ADG 'richer' than an AST baseline (e.g., node/edge counts per category) so that the comparison is reproducible.
[Evaluation] Clarify whether the 238 risks are unique across the corpus or include duplicates, and whether any manual triage was performed.
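Both minor points could be addressed with simple, reproducible definitions. The sketch below shows one possible reading: "richer" as a per-category count difference, and uniqueness as deduplication on a `(program, source, sink)` key. These definitions, the function names, and the sample records are assumptions, not the criteria used in the paper.

```python
from collections import Counter


def richness_delta(adg_items, baseline_items):
    """Per-category count difference (ADG minus baseline).

    Items are (category, identifier) pairs; duplicates are ignored.
    """
    adg = Counter(cat for cat, _ in set(adg_items))
    base = Counter(cat for cat, _ in set(baseline_items))
    return {cat: adg[cat] - base[cat] for cat in sorted(set(adg) | set(base))}


def unique_risks(findings):
    """Deduplicate findings on (program, source, sink), keeping first-seen order."""
    seen, unique = set(), []
    for f in findings:
        key = (f["program"], f["source"], f["sink"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique


# Hypothetical recovered entities and findings.
adg_items = [("agent", "a1"), ("capability", "t1"), ("capability", "t2"), ("memory", "m1")]
baseline_items = [("agent", "a1"), ("capability", "t1")]
delta = richness_delta(adg_items, baseline_items)

findings = [
    {"program": "p1", "source": "user_input", "sink": "run_shell"},
    {"program": "p1", "source": "user_input", "sink": "run_shell"},
    {"program": "p2", "source": "user_input", "sink": "run_shell"},
]
deduped = unique_risks(findings)
```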

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. The two major comments identify important gaps in validation that we will address through targeted revisions. Below we respond point-by-point.


Referee: [Evaluation] Evaluation section (and abstract): the central claims of 'richer agent entities and dependencies' and 'uncovers 238 taint-style prompt-to-tool risks' are stated as direct outputs of running the tool, yet no precision, recall, or ground-truth comparison against manual inspection or dynamic oracles is reported. This is load-bearing because the reported counts cannot be interpreted without knowing the false-positive rate of the static pattern matching for framework semantics.

Authors: We agree that the current evaluation relies on comparative counts (entities/dependencies recovered versus AST baselines) and raw detection counts (238 risks) without reporting precision or recall against ground truth. This limits interpretability of the absolute numbers. In the revision we will add a manual audit of a random sample of 100 programs from AgentZoo, performed by two authors with adjudication, to compute precision for entity/dependency recovery and for the prompt-to-tool taint rules. We will also report the number of programs where dynamic features prevented full recovery. These metrics will be added to §5 and the abstract will be updated to qualify the claims accordingly. revision: yes
Referee: [Implementation] Implementation and ADG construction (likely §3–4): the recovery of framework-induced nodes/edges (constructors, @tool decorators, handoff declarations) is presented as reliable via static IR traversal, but the manuscript provides no discussion or experiment addressing cases of dynamic registration, conditional instantiation, or metaprogramming that would cause under-approximation. A concrete validation (e.g., manual audit of 50 programs or comparison to runtime traces) is required to support the claim that ADGs are faithful rather than artifacts of the chosen front-ends.

Authors: The referee correctly notes the absence of any treatment of dynamic registration, conditional instantiation, and metaprogramming. The current front-ends perform static traversal of constructors, decorators, and handoff calls; any runtime-determined behavior is therefore under-approximated by design. In the revision we will add an explicit “Limitations” subsection in §4 that enumerates these sources of under-approximation and will include the results of the 100-program manual audit (mentioned above) that also flags programs exhibiting dynamic patterns. We will not claim soundness but will characterize the static nature of the analysis. revision: yes
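The proposed audit could be organized along the following lines: draw a reproducible random sample, have two annotators label each recovered item, resolve disagreements by adjudication, and compute precision from the final labels. The function names, the seed, and the label encoding are assumptions; the rebuttal does not specify the procedure at this level of detail.

```python
import random


def draw_audit_sample(program_ids, k=100, seed=0):
    """Draw a reproducible random sample of k program ids."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(program_ids), k))


def precision_with_adjudication(labels_a, labels_b, adjudicated):
    """Compute precision from two annotators' labels (True = recovered item is correct).

    Where the annotators disagree, the adjudicated label is used.
    """
    final = {}
    for item in labels_a:
        if labels_a[item] == labels_b[item]:
            final[item] = labels_a[item]
        else:
            final[item] = adjudicated[item]
    return sum(final.values()) / len(final), final


# Hypothetical program ids sized like the AgentZoo corpus.
corpus = [f"prog-{i:04d}" for i in range(5399)]
sample = draw_audit_sample(corpus)

# Hypothetical labels for four recovered edges in one sampled program.
labels_a = {"e1": True, "e2": True, "e3": False, "e4": True}
labels_b = {"e1": True, "e2": False, "e3": False, "e4": True}
precision, final = precision_with_adjudication(labels_a, labels_b, {"e2": True})
```

Fixing the seed makes the sample auditable by third parties. A sample-based audit of tool output estimates precision only; recall needs a separate pass in which annotators look for entities the tool did not report.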

Circularity Check

0 steps flagged

No circularity detected; empirical tool evaluation is self-contained


The paper presents AgentFlow as a new static analysis framework that builds ADGs from source code and evaluates it empirically on the AgentZoo corpus. Reported outcomes (richer entity recovery than AST baselines, more dependency-aware BOMs, and 238 detected risks) are direct results of executing the implemented front-ends and analyses on real programs, not quantities obtained by fitting parameters to subsets of the same data or by renaming known patterns. No equations, self-definitional constructs, load-bearing self-citations, uniqueness theorems, or smuggled ansatzes appear in the abstract or described methodology. The contribution is an engineering artifact whose claims rest on observable tool outputs rather than any derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that static pattern matching over framework APIs is sufficient to reconstruct agent-level semantics; the paper introduces the ADG as a new modeling artifact without external validation of its completeness.

axioms (1)

domain assumption: Framework-induced agent semantics (constructors, decorators, handoffs) are statically recoverable from source code without runtime information.
Invoked to justify construction of the ADG from static ASTs.

invented entities (1)

Agent Dependency Graph (ADG) · no independent evidence
purpose: Typed graph that represents agents, prompts, models, capabilities, memory, and policies as nodes with component, control, and data-flow edges.
New representation introduced by the paper; no independent evidence supplied that the graph is complete or sound beyond the authors' implementation.

pith-pipeline@v0.9.1-grok · 5828 in / 1442 out tokens · 22421 ms · 2026-07-03T09:14:48.219033+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 47 canonical work pages · 12 internal anchors

[1]

State of agentic ai security and governance,

OW ASP GenAI Security Project, “State of agentic ai security and governance,” https://genai.owasp.org/resource/ state-of-agentic-ai-security-and-governance/, 2026, accessed: 2026-07- 01

2026
[2]

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

X. Jiang, S. Yang, Z. Li, L. Liu, H. Yu, and Y . Liu, “Chaincaps: Composition-safe tool-using agents via monotonic capability attenuation,” 2026. [Online]. Available: https://arxiv.org/abs/2605.26542

work page internal anchor Pith review Pith/arXiv arXiv 2026
[3]

Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

S. Wang, X. Hou, Z. Liu, Y . Zhao, X. Cheng, Q. Zou, X. Zhang, and H. Wang, “Demystifying and detecting agentic workflow injection vulnerabilities in github actions,”CoRR, vol. abs/2605.07135, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2605.07135

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.07135 2026
[4]

Langgraph,

LangChain, “Langgraph,” https://github.com/langchain-ai/langgraph, 2026, accessed: 2026-07-01

2026
[5]

Openai agents sdk,

OpenAI, “Openai agents sdk,” https://github.com/openai/ openai-agents-python, 2026, accessed: 2026-07-01

2026
[6]

CrewAI, “Crewai,” https://github.com/crewAIInc/crewAI, 2026, ac- cessed: 2026-07-01

2026
[7]

Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,

S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y . L. Traon, D. Octeau, and P. D. McDaniel, “Flowdroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” inACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, ...

work page doi:10.1145/2594291.2594299 2014
[8]

Amandroid: A precise and general inter-component data flow analysis framework for security vetting of android apps,

F. Wei, S. Roy, X. Ou, and Robby, “Amandroid: A precise and general inter-component data flow analysis framework for security vetting of android apps,”ACM Trans. Priv. Secur., vol. 21, no. 3, Apr. 2018. [Online]. Available: https://doi.org/10.1145/3183575

work page doi:10.1145/3183575 2018
[9]

Reducing static analysis unsoundness with approximate interpretation,

M. R. Laursen, W. Xu, and A. Møller, “Reducing static analysis unsoundness with approximate interpretation,”Proc. ACM Program. Lang., vol. 8, no. PLDI, Jun. 2024. [Online]. Available: https: //doi.org/10.1145/3656424

work page doi:10.1145/3656424 2024
[10]

Y ASA: scalable multi-language taint analysis on the unified AST at ant group,

Y . Wang, S. Wang, J. Zhao, S. Shi, T. Li, Y . Cheng, L. Bian, K. Yu, Y . Zhao, and H. Wang, “Y ASA: scalable multi-language taint analysis on the unified AST at ant group,”CoRR, vol. abs/2601.17390, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2601.17390

work page doi:10.48550/arxiv.2601.17390 2026
[11]

SVF: interprocedural static value-flow analysis in LLVM,

Y . Sui and J. Xue, “SVF: interprocedural static value-flow analysis in LLVM,” inProceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016, A. Zaks and M. V . Hermenegildo, Eds. ACM, 2016, pp. 265–266. [Online]. Available: https://doi.org/10.1145/2892208.2892235

work page doi:10.1145/2892208.2892235 2016
[12]

"elementary, my dear watson

S. Wang, J. He, Y . Zhao, Y . Wang, K. Yu, and H. Wang, “”elementary, my dear watson.” detecting malicious skills via neuro-symbolic reasoning across heterogeneous artifacts,”CoRR, vol. abs/2603.27204,

work page arXiv
[13]

"elementary, my dear watson

[Online]. Available: https://doi.org/10.48550/arXiv.2603.27204

work page doi:10.48550/arxiv.2603.27204
[14]

Multi-agentic system threat modeling guide v1.0,

OW ASP GenAI Security Project, “Multi-agentic system threat modeling guide v1.0,” https://genai.owasp.org/resource/ multi-agentic-system-threat-modeling-guide-v1-0/, 2025, accessed: 2026-07-01

2025
[15]

Not all ai boms are created equal,

U. Feldman, “Not all ai boms are created equal,” https://www.pillar. security/blog/not-all-ai-boms-are-created-equal, 2025, accessed: 2026- 07-01

2025
[16]

Agentproof: Static verification of agent workflow graphs,

M. Xavier, V . M. A, M. Jolly, and M. Xavier, “Agentproof: Static verification of agent workflow graphs,”CoRR, vol. abs/2603.20356,

work page arXiv
[17]

Agentproof: Static verification of agent workflow graphs,

[Online]. Available: https://doi.org/10.48550/arXiv.2603.20356

work page doi:10.48550/arxiv.2603.20356
[18]

Agent-wiz: A cli tool for threat modeling and visualizing ai agents,

Repello AI, “Agent-wiz: A cli tool for threat modeling and visualizing ai agents,” https://github.com/Repello-AI/Agent-Wiz, 2026, accessed: 2026-07-01

2026
[19]

SPDX 3.0 specification,

SPDX Project, “SPDX 3.0 specification,” https://spdx.github.io/ spdx-spec/v3.0/, 2024, accessed: 2026-07-01

2024
[20]

The minimum elements for a software bill of materials (sbom),

National Telecommunications and Information Administration, “The minimum elements for a software bill of materials (sbom),” https://www. ntia.gov/report/2021/minimum-elements-software-bill-materials-sbom, 2021, accessed: 2026-07-01

2021
[21]

Implementing AI bill of materials (AI BOM) with SPDX 3.0: A comprehensive guide to creating AI and dataset bill of materials,

K. Bennet, G. K. Rajbahadur, A. Suriyawongkul, and K. Stewart, “Implementing AI bill of materials (AI BOM) with SPDX 3.0: A comprehensive guide to creating AI and dataset bill of materials,” CoRR, vol. abs/2504.16743, 2025. [Online]. Available: https://doi.org/ 10.48550/arXiv.2504.16743

work page doi:10.48550/arxiv.2504.16743 2025
[22]

TAIBOM: bringing trustworthiness to ai-enabled systems,

V . Safronov, A. McCaigue, N. Allott, and A. Martin, “TAIBOM: bringing trustworthiness to ai-enabled systems,” inProceedings of the 1st International Workshop on Security and Privacy-Preserving AI/ML co-located with 28th European Conference on Artificial Intelligence (ECAI 2025), Bologna, Italy, October 26th, 2025, ser. CEUR Workshop Proceedings, J. Leich...

2025
[23]

Aibomgen: Generating an AI bill of materials for secure, transparent, and compliant model training,

W. Vandendriessche, J. Thijsman, L. D’hooge, B. V olckaert, and M. Sebrechts, “Aibomgen: Generating an AI bill of materials for secure, transparent, and compliant model training,”CoRR, vol. abs/2601.05703,

work page arXiv
[24]

Aibomgen: Generating an AI bill of materials for secure, transparent, and compliant model training,

[Online]. Available: https://doi.org/10.48550/arXiv.2601.05703

work page doi:10.48550/arxiv.2601.05703
[25]

Towards Security-Auditable LLM Agents: A Unified Graph Representation

C. Li, L. Zhang, J. Zhai, S. Feng, X. Yang, H. Wang, S. Dou, Y . Ji, Y . Hu, Y . Wu, Y . Liu, and D. Zou, “Towards security-auditable LLM agents: A unified graph representation,”CoRR, vol. abs/2605.06812,

work page internal anchor Pith review Pith/arXiv arXiv
[26]

Towards Security-Auditable LLM Agents: A Unified Graph Representation

[Online]. Available: https://doi.org/10.48550/arXiv.2605.06812

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.06812
[27]

AgentRiskBOM: A Risk-Scoping Security Bill of Materials for Agentic AI Systems

S. Dutta and A. K. Moharir, “Agentriskbom: A risk-scoping security bill of materials for agentic ai systems,” 2026. [Online]. Available: https://arxiv.org/abs/2606.21877

work page internal anchor Pith review Pith/arXiv arXiv 2026
[28]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

X. Hou, Y . Zhao, S. Wang, and H. Wang, “Model context protocol (MCP): landscape, security threats, and future research directions,”CoRR, vol. abs/2503.23278, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2503.23278

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.23278 2025
[29]

Mcptox: A benchmark for tool poisoning on real-world MCP servers,

Z. Wang, Y . Gao, Y . Wang, S. Liu, H. Sun, H. Cheng, G. Shi, H. Du, and X. Li, “Mcptox: A benchmark for tool poisoning on real-world MCP servers,” inFortieth AAAI Conference on Artificial Intelligence, Thirty- Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, A...

work page doi:10.1609/aaai.v40i42.40895 2026
[30]

Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

X. Hou, Y . Zhao, and H. Wang, “Unsafe by flow: Uncovering bidirec- tional data-flow risks in MCP ecosystem,”CoRR, vol. abs/2605.07836,

work page internal anchor Pith review Pith/arXiv arXiv
[31]

Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

[Online]. Available: https://doi.org/10.48550/arXiv.2605.07836

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.07836
[32]

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

J. Wu, Y . Nan, Y . Lin, H. Wang, Y . Xiao, S. Wang, and Z. Zheng, “Skillscope: Toward fine-grained least-privilege enforcement for agent skills,”CoRR, vol. abs/2605.05868, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2605.05868

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.05868 2026
[33]

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

W. Guo, W. Zeng, C. Liu, X. Jia, Y . Xu, L. Tang, Y . Fang, and Y . Liu, “Malskillbench: A runtime-verified benchmark of malicious agent skills,” 2026. [Online]. Available: https://arxiv.org/abs/2606.07131

work page internal anchor Pith review Pith/arXiv arXiv 2026
[34]

"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

Y . Liu, Z. Chen, Y . Zhang, G. Deng, Y . Li, J. Ning, and L. Y . Zhang, “Malicious agent skills in the wild: A large-scale security empirical study,”CoRR, vol. abs/2602.06547, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2602.06547

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.06547 2026
[35]

Chainfuzzer: Greybox fuzzing for workflow-level multi-tool vulnerabilities in LLM agents,

J. Wu, Z. Yao, Y . Nan, and Z. Zheng, “Chainfuzzer: Greybox fuzzing for workflow-level multi-tool vulnerabilities in LLM agents,” CoRR, vol. abs/2603.12614, 2026. [Online]. Available: https://doi.org/ 10.48550/arXiv.2603.12614

work page doi:10.48550/arxiv.2603.12614 2026
[36]

Demystifying rce vulnerabilities in llm-integrated apps,

T. Liu, Z. Deng, G. Meng, Y . Li, and K. Chen, “Demystifying rce vulnerabilities in llm-integrated apps,” inProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1716–1730. [Online]. Available: https://doi.org/10.1145/3658644.3690338

work page doi:10.1145/3658644.3690338 2024
[37]

Make agent defeat agent: Automatic detection of taint-style vulnerabilities in llm-based agents,

F. Liu, Y . Zhang, J. Luo, J. Dai, T. Chen, L. Yuan, Z. Yu, Y . Shi, K. Li, C. Zhou, H. Chen, and M. Yang, “Make agent defeat agent: Automatic detection of taint-style vulnerabilities in llm-based agents,” in34th USENIX Security Symposium, USENIX Security 2025, Seattle, WA, USA, August 13-15, 2025, L. Bauer and G. Pellegrino, Eds. USENIX Association, 2025...

2025
[38]

Prompt-to-SQL Injections in LLM-Integrated Web Applications: Risks and Defenses ,

R. Pedro, M. E. Coimbra, D. Castro, P. Carreira, and N. Santos, “ Prompt-to-SQL Injections in LLM-Integrated Web Applications: Risks and Defenses ,” in2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). Los Alamitos, CA, USA: IEEE Computer Society, May 2025, pp. 1768–1780. [Online]. Available: https://doi.ieeecomputersociety.org/10...

work page doi:10.1109/icse55347.2025.00007 2025
[39]

Taintp2x: Detecting taint-style prompt-to-anything injection vulnerabilities in llm-integrated applications,

J. He, S. Wang, Y . Zhao, X. Hou, Z. Liu, Q. Zou, and H. Wang, “Taintp2x: Detecting taint-style prompt-to-anything injection vulnerabilities in llm-integrated applications,” inProceedings of the IEEE/ACM 48th International Conference on Software Engineering,
[40]

Available: https://doi.org/10.1145/3744916.3773199

[Online]. Available: https://doi.org/10.1145/3744916.3773199

work page doi:10.1145/3744916.3773199
[41]

Vision: Identifying affected library versions for open source software vulnerabilities,

C. Yan, R. Ren, M. H. Meng, L. Wan, T. Y . Ooi, and G. Bai, “Exploring chatgpt app ecosystem: Distribution, deployment and security,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1370–1382. [Online]. Available: https://doi.org...

work page doi:10.1145/3691620.3695510 2024
[42]

Understanding and detecting file knowledge leakage in gpt app ecosystem,

C. Yan, B. Guan, Y . Li, M. H. Meng, L. Wan, and G. Bai, “Understanding and detecting file knowledge leakage in gpt app ecosystem,” inProceedings of the ACM on Web Conference 2025, ser. WWW ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 3831–3839. [Online]. Available: https://doi.org/10.1145/3696410.3714755

work page doi:10.1145/3696410.3714755 2025
[43]

On the (In)Security of LLM app stores,

X. Hou, Y . Zhao, and H. Wang, “ On the (In)Security of LLM App Stores ,” in2025 IEEE Symposium on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE Computer Society, May 2025, pp. 317–335. [Online]. Available: https://doi.ieeecomputersociety.org/ 10.1109/SP61157.2025.00117

work page doi:10.1109/sp61157.2025.00117 2025
[44]

Llm app store analysis: A vision and roadmap,

Y . Zhao, X. Hou, S. Wang, and H. Wang, “Llm app store analysis: A vision and roadmap,”ACM Trans. Softw. Eng. Methodol., vol. 34, no. 5, May 2025. [Online]. Available: https://doi.org/10.1145/3708530

work page doi:10.1145/3708530 2025
[45]

Model context protocol,

Anthropic, “Model context protocol,” https://modelcontextprotocol.io/, 2024, accessed: 2026-07-01

2024
[46]

Agent2agent protocol,

Google, “Agent2agent protocol,” https://a2a-protocol.org/, 2025, ac- cessed: 2026-07-01

2025
[47]

Agent skills,

Anthropic, “Agent skills,” https://agentskills.io/, 2025, accessed: 2026- 07-01

2025
[48]

A characterization study of bugs in LLM agent workflow orchestration frameworks,

Z. Xue, Y . Zhao, S. Wang, K. Chen, and H. Wang, “A characterization study of bugs in LLM agent workflow orchestration frameworks,” in40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025, Seoul, Korea, Republic of, November 16-20, 2025. IEEE, 2025, pp. 3369–3380. [Online]. Available: https://doi.org/10.1109/ASE63991.2025.00278

work page doi:10.1109/ase63991.2025.00278 2025
[49]

An empirical study of agent developer practices in AI agent frameworks,

Y . Wang, X. Xu, J. Chen, T. Bi, W. Gu, and Z. Zheng, “An empirical study of agent developer practices in AI agent frameworks,”CoRR, vol. abs/2512.01939, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2512.01939

[50]

Defining and detecting the defects of large language model-based autonomous agents,

K. Ning, J. Chen, J. Zhang, W. Li, Z. Wang, Y. Feng, W. Zhang, and Z. Zheng, “Defining and detecting the defects of large language model-based autonomous agents,” IEEE Trans. Software Eng., vol. 52, no. 3, pp. 1074–1093, 2026. [Online]. Available: https://doi.org/10.1109/TSE.2026.3658554

[51]

Open agent specification: Enabling cross-framework comparison of ai agents,

S. Amini, Y. Benajiba, C. Bernardis, P. Cayet, H. Chafi, A. Fathan, L. Faucon, D. Hilloulin, S. Hong, I. Kossyk, T. Lahiri, T. M. S. Le, R. Patra, S. Ravi, J. Schweizer, J. Singh, S. Singh, W. Sun, K. Talamadupula, and J. Xu, “Open agent specification: Enabling cross-framework comparison of ai agents,” in Proceedings of the ACM Conference on AI and Agenti...

doi:10.1145/3786335.3813130
[52]

Large language model supply chain: A research agenda,

S. Wang, Y. Zhao, X. Hou, and H. Wang, “Large language model supply chain: A research agenda,” ACM Trans. Softw. Eng. Methodol., vol. 34, no. 5, pp. 147:1–147:46, 2025. [Online]. Available: https://doi.org/10.1145/3708531

[53]

Large language model supply chain: Open problems from the security perspective,

Q. Hu, X. Xie, S. Chen, L. Quan, and L. Ma, “Large language model supply chain: Open problems from the security perspective,” in Proceedings of the 34th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA Companion 2025, Clarion Hotel Trondheim, Trondheim, Norway, June 25-28, 2025, M. Papadakis, M. B. Cohen, and P. Tonella, Eds. ACM...

doi:10.1145/3713081.3731747
[54]

Lifting the veil on composition, risks, and mitigations of the large language model supply chain,

K. Huang, B. Chen, Y. Lu, S. Wu, D. Wang, Y. Huang, H. Jiang, Z. Zhou, J. Cao, and X. Peng, “Lifting the veil on composition, risks, and mitigations of the large language model supply chain,” 2025. [Online]. Available: https://arxiv.org/abs/2410.21218

[55]

Security debt in LLM agent applications: A measurement study of vulnerabilities and mitigation trade-offs,

Z. Shen, J. Dai, Y. Zhang, and M. Yang, “Security debt in LLM agent applications: A measurement study of vulnerabilities and mitigation trade-offs,” in 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025, Seoul, Korea, Republic of, November 16-20, 2025. IEEE, 2025, pp. 559–570. [Online]. Available: https://doi.org/10.1109/AS...

doi:10.1109/ase63991.2025.00053
[56]

LLM-Enabled Open-Source Systems in the Wild: An Empirical Study of Vulnerabilities in GitHub Security Advisories

F. T. Shifat, H. Baburaj, C. Zhou, J. Sarker, and M. M. Imran, “Llm-enabled open-source systems in the wild: An empirical study of vulnerabilities in github security advisories,” CoRR, vol. abs/2604.04288, 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2604.04288

[57]

Sok: Understanding vulnerabilities in the large language model supply chain,

S. Wang, Y. Zhao, Z. Liu, Q. Zou, and H. Wang, “Sok: Understanding vulnerabilities in the large language model supply chain,” CoRR, vol. abs/2502.12497, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2502.12497

[58]

Agentic radar: Security scanner for agentic workflows,

SplxAI, “Agentic radar: Security scanner for agentic workflows,” https://github.com/splx-ai/agentic-radar, 2026, accessed: 2026-07-01

[59]

AI-BOM: Ai bill of materials scanner,

Trusera, “AI-BOM: Ai bill of materials scanner,” https://github.com/Trusera/ai-bom, 2026, accessed: 2026-07-01

[60]

Drako Agent BOM,

Drako, “Drako Agent BOM,” https://github.com/DrakoLabs/drako, 2026, accessed: 2026-07-01

[61]

Cisco AI BOM: Ai bill of materials through source code scanning,

Cisco AI Defense, “Cisco AI BOM: Ai bill of materials through source code scanning,” https://github.com/cisco-ai-defense/aibom, 2026, accessed: 2026-07-01

[62]

Iccta: Detecting inter-component privacy leaks in android apps,

L. Li, A. Bartel, T. F. Bissyandé, J. Klein, Y. L. Traon, S. Arzt, S. Rasthofer, E. Bodden, D. Octeau, and P. D. McDaniel, “Iccta: Detecting inter-component privacy leaks in android apps,” in 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, A. Bertolino, G. Canfora, and S. G. Elbaum, ...

doi:10.1109/icse.2015.48
[63]

Jasmine: A static analysis framework for spring core technologies,

M. Chen, T. Tu, H. Zhang, Q. Wen, and W. Wang, “Jasmine: A static analysis framework for spring core technologies,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’22. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3551349.3556910

[64]

Tai-e: A developer-friendly static analysis framework for java by harnessing the good designs of classics,

T. Tan and Y. Li, “Tai-e: A developer-friendly static analysis framework for java by harnessing the good designs of classics,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY, USA: Association for Computing Machinery, 2023, pp. 1093–1105. [Online]. Available: https://doi.org/10...

doi:10.1145/3597926.3598120
[65]

ARGUS: A framework for staged static taint analysis of github workflows and actions,

S. Muralee, I. Koishybayev, A. Nahapetyan, G. Tystahl, B. Reaves, A. Bianchi, W. Enck, A. Kapravelos, and A. Machiry, “ARGUS: A framework for staged static taint analysis of github workflows and actions,” in 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023, J. A. Calandrino and C. Troncoso, Eds. USENIX Association, ...

[66]

Reactappscan: Mining react application vulnerabilities via component graph,

Z. Guo, M. Kang, V. Venkatakrishnan, R. Gjomemo, and Y. Cao, “Reactappscan: Mining react application vulnerabilities via component graph,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 585–599. [Online]. Available: https://doi.or...

doi:10.1145/3658644.3670331
[67]

Taintmini: Detecting flow of sensitive data in mini-programs with static taint analysis,

C. Wang, R. Ko, Y. Zhang, Y. Yang, and Z. Lin, “Taintmini: Detecting flow of sensitive data in mini-programs with static taint analysis,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023, pp. 932–944

[68]

Miniscope: Automated ui exploration and privacy inconsistency detection of miniapps via two-phase iterative hybrid analysis,

S. Wang, Y. Li, K. Wang, Y. Liu, H. Li, Y. Liu, and H. Wang, “Miniscope: Automated ui exploration and privacy inconsistency detection of miniapps via two-phase iterative hybrid analysis,” ACM Trans. Softw. Eng. Methodol., vol. 34, no. 6, Jul. 2025. [Online]. Available: https://doi.org/10.1145/3709351

[69]

Wemint: tainting sensitive data leaks in wechat mini-programs,

S. Meng, L. Wang, S. Wang, K. Wang, X. Xiao, G. Bai, and H. Wang, “Wemint: tainting sensitive data leaks in wechat mini-programs,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 1403–1415

