pith. machine review for the scientific record. sign in

arxiv: 2604.02837 · v1 · submitted 2026-04-03 · 💻 cs.CR · cs.AI

Recognition: no theorem link

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Authors on Pith no claims yet

Pith reviewed 2026-05-13 19:56 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords Agent Skillssecurity analysisthreat taxonomyLLM agentsagent securityframework vulnerabilitiesmarketplace securitysupply chain attacks
0
0 comments X

The pith

Agent Skills frameworks carry structural security threats from missing data-instruction boundaries, single-approval trust, and unchecked marketplaces that incremental patches cannot resolve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps the full lifecycle of an Agent Skill through creation, distribution, deployment, and execution to expose where each phase opens attack surfaces. It organizes those surfaces into a taxonomy of seven categories and seventeen scenarios across three layers, then checks the taxonomy against five confirmed incidents. The central finding is that the framework's built-in choices—no separation between data and instructions, a trust model that approves once and keeps that approval, and marketplaces without required security checks—create risks that stay even after ordinary fixes. A reader should care because Agent Skills already spreads across multiple agent platforms and community stores, so these properties could scale exposure to prompt-style and supply-chain attacks. The work ends by listing defense directions and open challenges that follow from treating the flaws as architectural rather than local.

Core claim

The Agent Skills architecture creates attack surfaces in every phase of its lifecycle, and the resulting threat taxonomy of seven categories and seventeen scenarios shows that the gravest risks come from three structural properties: the absence of a data-instruction boundary, reliance on a single-approval persistent trust model, and the lack of mandatory marketplace security review. These properties are confirmed by analysis of five real incidents and cannot be removed by incremental mitigations alone.

What carries the argument

The four-phase lifecycle analysis that feeds the seven-category, seventeen-scenario threat taxonomy organized into three attack layers.

If this is right

  • Any defense must target the three structural properties directly rather than adding layers around them.
  • Marketplaces will need mandatory security review processes before distribution can be considered safe.
  • Agent platforms must enforce data-instruction separation at execution time to close the largest class of threats.
  • The taxonomy supplies a checklist that stakeholders can use to audit existing and future skills.
  • Research on redesigning the trust and boundary mechanisms is required before the framework can support high-stakes uses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same boundary and trust issues likely appear in other modular skill or tool-packaging systems for LLM agents, suggesting a pattern worth checking across frameworks.
  • Expanding the incident sample beyond the five reported cases could test whether the taxonomy needs additional categories.
  • Requiring marketplace review would trade some speed of skill adoption for lower risk, an explicit cost that future designs must weigh.
  • Without data-instruction separation, skills could serve as persistent vectors for prompt injection across sessions, an implication that extends to any agent that loads external modules.

Load-bearing premise

The architectural analysis captures the complete attack surface and the five incidents are representative enough to confirm the taxonomy covers the main risks.

What would settle it

An implemented mitigation that removes the identified threats while preserving the current data-instruction handling, single-approval trust model, and voluntary marketplace review would show the structural claim is incorrect.

Figures

Figures reproduced from arXiv: 2604.02837 by Jingzheng Wu, Tianyue Luo, Xiang Ling, Xing Cui, Zhiyuan Li.

Figure 1
Figure 1. Figure 1: The Agent Skills architecture. Left: the filesystem layout of a Skill package within the agent’s virtual [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The Agent Skills lifecycle and threat taxonomy. The horizontal axis represents the four lifecycle phases; [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The MedusaLocker Ransomware attack. The user-visible layer shows normal GIF creation behavior, [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗
read the original abstract

Agent Skills is an emerging open standard that defines a modular, filesystem-based packaging format enabling LLM-based agents to acquire domain-specific expertise on demand. Despite rapid adoption across multiple agentic platforms and the emergence of large community marketplaces, the security properties of Agent Skills have not been systematically studied. This paper presents the first comprehensive security analysis of the Agent Skills framework. We define the full lifecycle of an Agent Skill across four phases -- Creation, Distribution, Deployment, and Execution -- and identify the structural attack surface each phase introduces. Building on this lifecycle analysis, we construct a threat taxonomy comprising seven categories and seventeen scenarios organized across three attack layers, grounded in both architectural analysis and real-world evidence. We validate the taxonomy through analysis of five confirmed security incidents in the Agent Skills ecosystem. Based on these findings, we discuss defense directions for each threat category, identify open research challenges, and provide actionable recommendations for stakeholders. Our analysis reveals that the most severe threats arise from structural properties of the framework itself, including the absence of a data-instruction boundary, a single-approval persistent trust model, and the lack of mandatory marketplace security review, and cannot be addressed through incremental mitigations alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents the first comprehensive security analysis of the Agent Skills open standard for LLM-based agents. It defines a four-phase lifecycle (Creation, Distribution, Deployment, Execution), derives a threat taxonomy with seven categories and seventeen scenarios across three attack layers, validates it against five real-world incidents, and concludes that severe threats stem from inherent structural properties such as the absence of a data-instruction boundary and a single-approval trust model, which cannot be mitigated incrementally.

Significance. If the taxonomy holds, this establishes a foundational reference for security in agent skill ecosystems by linking architectural properties directly to threat categories and real incidents. It highlights the limits of incremental defenses and provides concrete recommendations for platforms and marketplaces, filling a gap in the literature on emerging agent standards.

major comments (1)
  1. [Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive comment on the validation section. The feedback correctly identifies an opportunity to strengthen the link between the selected incidents and the broader claim about structural properties. We will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Validation section] Validation section: The taxonomy is validated by mapping five confirmed incidents to the seven categories, but the manuscript provides no explicit argument or enumeration showing why these incidents are representative of the full attack surface (e.g., unexamined marketplace or deployment scenarios). This weakens the load-bearing claim that the identified structural properties produce the most severe threats and cannot be addressed incrementally, as additional vectors could alter that assessment.

    Authors: We agree that an explicit argument for representativeness was not sufficiently articulated. The five incidents were chosen because they collectively instantiate all seven threat categories and touch every phase of the four-phase lifecycle (Creation, Distribution, Deployment, Execution). In the revised version we will insert a new subsection (Validation: Coverage and Representativeness) that (1) provides a table mapping each incident to the specific categories and layers it exercises, (2) enumerates the marketplace and deployment vectors covered (including community marketplaces and production agent platforms), and (3) explains why the core structural weaknesses—absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory review—manifest across both examined and unexamined scenarios. We will also note remaining gaps (e.g., certain proprietary deployment environments) and argue that the structural nature of the threats makes it unlikely that additional vectors would invalidate the conclusion that incremental defenses are insufficient. This revision directly addresses the concern without altering the paper’s central claims. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper constructs its threat taxonomy directly from an independent architectural analysis of the four lifecycle phases (Creation, Distribution, Deployment, Execution) and grounds the seven categories and seventeen scenarios in external real-world incidents. No derivations, equations, or claims reduce by construction to fitted parameters, self-citations, or imported ansatzes; the conclusion that structural properties resist incremental mitigations follows from the defined attack surfaces and observed evidence without self-referential loops or load-bearing prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about the Agent Skills architecture and the representativeness of observed incidents rather than on fitted parameters or new postulated entities.

axioms (2)
  • domain assumption Agent Skills defines a modular filesystem-based packaging format with a four-phase lifecycle of Creation, Distribution, Deployment, and Execution.
    This lifecycle is used to identify the structural attack surface in each phase.
  • domain assumption The framework exhibits absence of a data-instruction boundary, single-approval persistent trust, and lack of mandatory marketplace security review.
    These properties are identified as the root causes of the most severe threats.

pith-pipeline@v0.9.0 · 5511 in / 1401 out tokens · 48363 ms · 2026-05-13T19:56:57.198911+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sealing the Audit-Runtime Gap for LLM Skills

    cs.CR 2026-05 unverdicted novelty 7.0

    SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

  2. AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

    cs.CR 2026-05 conditional novelty 6.0

    AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.

  3. SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

    cs.CR 2026-05 unverdicted novelty 6.0

    SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.

  4. From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

    cs.CL 2026-04 unverdicted novelty 6.0

    SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 4 Pith papers · 5 internal anchors

  1. [1]

    The rise and potential of large language model based agents: A survey,

    Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhouet al., “The rise and potential of large language model based agents: A survey, ”Science China Information Sciences, vol. 68, no. 2, p. 121101, 2025

  2. [2]

    A survey on large language model based autonomous agents,

    L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Linet al., “A survey on large language model based autonomous agents, ”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

  3. [3]

    ChatGPT plugins,

    OpenAI, “ChatGPT plugins, ” https://openai.com/blog/chatgpt-plugins, 2023

  4. [4]

    Model context protocol (mcp): Landscape, security threats, and future research directions,

    X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research directions, ”ACM Transactions on Software Engineering and Methodology, 2025

  5. [5]

    Agent Skills: Claude code documentation,

    Anthropic, “Agent Skills: Claude code documentation, ” https://docs.anthropic.com/en/docs/claude-code/skills, 2025

  6. [6]

    Agent Skills | Cursor docs,

    Cursor, “Agent Skills | Cursor docs, ” https://cursor.com/docs/context/skills, 2025

  7. [7]

    About agent Skills — GitHub Copilot documentation,

    GitHub, “About agent Skills — GitHub Copilot documentation, ” https://docs.github.com/en/copilot/concepts/agents/ about-agent-skills, 2025

  8. [8]

    Agent Skills | Gemini CLI documentation,

    Google, “Agent Skills | Gemini CLI documentation, ” https://geminicli.com/docs/cli/skills/, 2026

  9. [9]

    Cato CTRL threat research: From productivity boost to ransomware nightmare — weaponizing Claude Skills with MedusaLocker,

    I. Cherny, “Cato CTRL threat research: From productivity boost to ransomware nightmare — weaponizing Claude Skills with MedusaLocker, ” https://www.catonetworks.com/blog/cato-ctrl-weaponizing-claude-skills-with-medusalocker/, 2025

  10. [10]

    Malicious agent skills in the wild: A large-scale security empirical study.arXiv preprint arXiv:2602.06547, 2026

    Y. Liu, Z. Chen, Y. Zhang, G. Deng, Y. Li, J. Ning, and L. Y. Zhang, “Malicious agent skills in the wild: A large-scale security empirical study, ”arXiv preprint arXiv:2602.06547, 2026

  11. [11]

    Snyk Security Research, “Snyk finds prompt injection in 36

  12. [12]

    Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

    Y. Liu, W. Wang, R. Feng, Y. Zhang, G. Xu, G. Deng, Y. Li, and L. Zhang, “Agent skills in the wild: An empirical study of security vulnerabilities at scale, ”arXiv preprint arXiv:2601.10338, 2026

  13. [13]

    Agent skills enable a new class of realistic and trivially simple prompt injections,

    D. Schmotz, S. Abdelnabi, and M. Andriushchenko, “Agent skills enable a new class of realistic and trivially simple prompt injections, ”arXiv preprint arXiv:2510.26328, 2025

  14. [14]

    Skill-inject: Measuring agent vulnerability to skill file attacks.arXiv preprint arXiv:2602.20156, 2026

    D. Schmotz, L. Beurer-Kellner, S. Abdelnabi, and M. Andriushchenko, “Skill-inject: Measuring agent vulnerability to skill file attacks, ”arXiv preprint arXiv:2602.20156, 2026

  15. [15]

    Use skills in claude,

    Anthropic, “Use skills in claude, ” https://support.claude.com/en/articles/12512180-use-skills-in-claude, 2025

  16. [16]

    ChatGPT plugin review: Lessons learned,

    CustomGPT, “ChatGPT plugin review: Lessons learned, ” https://customgpt.ai/chatgpt-plugin-review/, 2023

  17. [17]

    ChatGPT plugins are no more,

    DataCamp, “ChatGPT plugins are no more, ” https://www.datacamp.com/blog/best-chat-gpt-plugins, 2024

  18. [18]

    Introducing the model context protocol,

    Anthropic, “Introducing the model context protocol, ” https://www.anthropic.com/news/model-context-protocol, 2024

  19. [19]

    AI agent supply chain risk: Silent codebase exfiltration via skills,

    Mitiga Labs, “AI agent supply chain risk: Silent codebase exfiltration via skills, ” https://www.mitiga.io/blog/ai-agent- supply-chain-risk-silent-codebase-exfiltration-via-skills, 2026

  20. [20]

    OpenClaw’s 230 malicious skills: What agentic AI supply chains teach us about the need to evolve identity security,

    Authmind, “OpenClaw’s 230 malicious skills: What agentic AI supply chains teach us about the need to evolve identity security, ” https://www.authmind.com/blogs/openclaw-malicious-skills-agentic-ai-supply-chain, 2026

  21. [21]

    Agent skills threat model,

    SafeDep Team, “Agent skills threat model, ” https://safedep.io/agent-skills-threat-model, 2026

  22. [22]

    Sok: Taxonomy of attacks on open-source software supply chains,

    P. Ladisa, H. Plate, M. Martinez, and O. Barais, “Sok: Taxonomy of attacks on open-source software supply chains, ” in 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023, pp. 1509–1526

  23. [23]

    Agent skills spreading hallucinated npx commands,

    Aikido Security, “Agent skills spreading hallucinated npx commands, ” https://www.aikido.dev/blog/agent-skills- spreading-hallucinated-npx-commands, 2026

  24. [24]

    OWASP top 10 for LLM apps & gen AI agentic security initiative,

    OWASP GenAI Security Project, “OWASP top 10 for LLM apps & gen AI agentic security initiative, ” https://genai. owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/, 2025

  25. [25]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection,

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection, ” inProceedings of the 16th ACM workshop on J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. 111:26 Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui, an...

  26. [26]

    Agent skills: Explore security threats and controls,

    Red Hat Developer, “Agent skills: Explore security threats and controls, ” https://developers.redhat.com/articles/2026/ 03/10/agent-skills-explore-security-threats-and-controls, 2026

  27. [27]

    Skill scanner: Security analysis of AI agent skills,

    Cisco AI Defense, “Skill scanner: Security analysis of AI agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026

  28. [28]

    Caught in the hook: RCE and API token exfiltration through Claude Code project files,

    A. Donenfeld and O. Vanunu, “Caught in the hook: RCE and API token exfiltration through Claude Code project files, ” https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve- 2025-59536/, 2026

  29. [29]

    Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

    Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases, ”Advances in Neural Information Processing Systems, vol. 37, pp. 130 185–130 213, 2024

  30. [30]

    Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

    D. Lee and M. Tiwari, “Prompt infection: Llm-to-llm prompt injection within multi-agent systems, ”arXiv preprint arXiv:2410.07283, 2024

  31. [31]

    Trends and lessons from three years fighting malicious extensions,

    N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas, “Trends and lessons from three years fighting malicious extensions, ” in24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 579–593

  32. [32]

    {StruQ}: Defending against prompt injection with structured queries,

    S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “{StruQ}: Defending against prompt injection with structured queries, ” in34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 2383–2400

  33. [33]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The instruction hierarchy: Training llms to prioritize privileged instructions, ”arXiv preprint arXiv:2404.13208, 2024

  34. [34]

    A survey on trustworthy llm agents: Threats and countermeasures,

    M. Yu, F. Meng, X. Zhou, S. Wang, J. Mao, L. Pan, T. Chen, K. Wang, X. Li, Y. Zhanget al., “A survey on trustworthy llm agents: Threats and countermeasures, ” inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 6216–6226

  35. [35]

    in-toto: Providing farm-to-table guarantees for bits and bytes,

    S. Torres-Arias, H. Afzali, T. K. Kuppusamy, R. Curtmola, and J. Cappos, “in-toto: Providing farm-to-table guarantees for bits and bytes, ” in28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 1393–1410

  36. [36]

    Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills,

    Snyk, “Snyk Agent Scan: Security scanner for AI agents, MCP servers and agent skills, ” https://github.com/snyk/agent- scan, 2026

  37. [37]

    Skill scanner: Security scanner for agent skills,

    Cisco AI Defense, “Skill scanner: Security scanner for agent skills, ” https://github.com/cisco-ai-defense/skill-scanner, 2026

  38. [38]

    Jailbroken: How does llm safety training fail?

    A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does llm safety training fail?”Advances in neural information processing systems, vol. 36, pp. 80 079–80 110, 2023

  39. [39]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and transferable adversarial attacks on aligned language models, ”arXiv preprint arXiv:2307.15043, 2023

  40. [40]

    Jailbreaking black box large language models in twenty queries,

    P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries, ” in2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2025, pp. 23–42

  41. [41]

    Tree of attacks: Jailbreaking black-box llms automatically,

    A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms automatically, ”Advances in Neural Information Processing Systems, vol. 37, pp. 61 065–61 105, 2024

  42. [42]

    Ignore previous prompt: Attack techniques for language models,

    F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models, ” inNeurIPS ML Safety Workshop, 2022

  43. [43]

    Promptlocate: Localizing prompt injection attacks,

    Y. Jia, Y. Liu, Z. Shao, J. Jia, and N. Gong, “Promptlocate: Localizing prompt injection attacks, ”arXiv preprint arXiv:2510.12252, 2025

  44. [44]

    Extracting training data from large language models,

    N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson et al., “Extracting training data from large language models, ” in30th USENIX security symposium (USENIX Security 21), 2021, pp. 2633–2650

  45. [45]

    Hulk: Eliciting malicious behavior in browser extensions,

    A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson, “Hulk: Eliciting malicious behavior in browser extensions, ” in23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 641–654

  46. [46]

    Developers are victims too: A comprehensive analysis of the vs code extension ecosystem,

    S. Edirimannage, C. Elvitigala, A. K. K. Don, W. Daluwatta, P. Wijesekara, and I. Khalil, “Developers are victims too: A comprehensive analysis of the vs code extension ecosystem, ”arXiv preprint arXiv:2411.07479, 2024

  47. [47]

    Backstabber’s knife collection: A review of open source software supply chain attacks,

    M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife collection: A review of open source software supply chain attacks, ” inInternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2020, pp. 23–43

  48. [48]

    Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with SUN- BURST backdoor,

    Mandiant, “Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with SUN- BURST backdoor, ” https://cloud.google.com/blog/topics/threat-intelligence/evasive-attacker-leverages-solarwinds- supply-chain-compromises-with-sunburst-backdoor, dec 2020

  49. [49]

    backdoor in upstream xz/liblzma leading to ssh server compromise,

    A. Freund, “backdoor in upstream xz/liblzma leading to ssh server compromise, ” https://www.openwall.com/lists/oss- security/2024/03/29/4, mar 2024

  50. [50]

    Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies,

    A. Birsan, “Dependency confusion: How I hacked into Apple, Microsoft and dozens of other companies, ”Medium, 2021

  51. [51]

    Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents.arXiv preprintarXiv:2411.09523, 2024

    Y. Gan, Y. Yang, Z. Ma, P. He, R. Zeng, Y. Wang, Q. Li, C. Zhou, S. Li, T. Wanget al., “Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents, ”arXiv preprint arXiv:2411.09523, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Secu...

  52. [52]

    The emerged security and privacy of llm agent: A survey with case studies,

    F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of llm agent: A survey with case studies, ”ACM Computing Surveys, vol. 58, no. 6, pp. 1–36, 2025

  53. [53]

    Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

    S. Datta, S. K. Nahin, A. Chhabra, and P. Mohapatra, “Agentic ai security: Threats, defenses, evaluation, and open challenges, ”arXiv preprint arXiv:2510.23883, 2025

  54. [54]

    Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents,

    Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents, ” inFindings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10 471–10 506

  55. [55]

    Formalizing and benchmarking prompt injection attacks and defenses,

    Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses, ” in 33rd USENIX Security Symposium (USENIX Security 24), 2024, pp. 1831–1847

  56. [56]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents,

    E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 82 895–82 920, 2024

  57. [57]

    Watch out for your agents! investigating backdoor threats to llm-based agents,

    W. Yang, X. Bi, Y. Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! investigating backdoor threats to llm-based agents, ”Advances in Neural Information Processing Systems, vol. 37, pp. 100 938–100 964, 2024. J. ACM, Vol. 37, No. 4, Article 111. Publication date: August 2026