"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors
Pith reviewed 2026-05-18 13:16 UTC · model grok-4.3
The pith
Prompt injection attacks can hijack agentic AI coding editors by poisoning external resources to execute malicious commands.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By poisoning external resources with hidden instructions, attackers can remotely hijack the AI agents inside high-privilege coding editors, turning them into shells that execute malicious commands; large-scale tests with AIShellJack confirm this works at rates up to 84 percent across initial access, discovery, credential theft, and exfiltration objectives on GitHub Copilot and Cursor.
What carries the argument
AIShellJack, an automated testing framework that supplies 314 unique prompt injection payloads covering 70 MITRE ATT&CK techniques to evaluate how external resources can influence AI agent behavior in coding editors.
If this is right
- Attackers gain remote initial access to development environments through the compromised AI without needing direct interaction.
- System discovery, credential theft, and data exfiltration become achievable objectives via the hijacked agent.
- The attacks succeed against real editors that grant terminal and system-level privileges for coding tasks.
- Common external resources such as code files or documentation can serve as vectors for the injection.
Where Pith is reading between the lines
- Developers using these editors might reduce risk by reviewing AI-proposed actions before execution or restricting access to untrusted external files.
- Similar prompt injection risks could appear in other agentic AI tools that load external inputs and then perform autonomous actions.
- Adding input validation or prompt isolation layers in future editor versions could block this class of attack.
Load-bearing premise
The AI agents will read and act on instructions placed inside external resources without any built-in checks that prevent harmful command execution.
What would settle it
Loading a deliberately poisoned external file into Cursor or Copilot and observing that the AI agent neither executes the embedded malicious command nor takes any action based on it.
Figures
read the original abstract
Agentic AI coding editors driven by large language models have recently become more popular due to their ability to improve developer productivity during software development. Modern editors such as Cursor are designed not just for code completion, but also with more system privileges for complex coding tasks (e.g., run commands in the terminal, access development environments, and interact with external systems). While this brings us closer to the "fully automated programming" dream, it also raises new security concerns. In this study, we present the first empirical analysis of prompt injection attacks targeting these high-privilege agentic AI coding editors. We show how attackers can remotely exploit these systems by poisoning external development resources with malicious instructions, effectively hijacking AI agents to run malicious commands, turning "your AI" into "attacker's shell". To perform this analysis, we implement AIShellJack, an automated testing framework for assessing prompt injection vulnerabilities in agentic AI coding editors. AIShellJack contains 314 unique attack payloads that cover 70 techniques from the MITRE ATT&CK framework. Using AIShellJack, we conduct a large-scale evaluation on GitHub Copilot and Cursor, and our evaluation results show that attack success rates can reach as high as 84% for executing malicious commands. Moreover, these attacks are proven effective across a wide range of objectives, ranging from initial access and system discovery to credential theft and data exfiltration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to conduct the first empirical analysis of prompt injection attacks on agentic AI coding editors like GitHub Copilot and Cursor. It introduces the AIShellJack framework containing 314 unique attack payloads covering 70 MITRE ATT&CK techniques and reports that these attacks achieve success rates up to 84% in executing malicious commands through poisoning external resources.
Significance. If the results are reproducible and the evaluation accounts for realistic agent behaviors, this study would significantly contribute to the understanding of security risks in emerging AI coding tools with system privileges. The systematic coverage of attack techniques from MITRE ATT&CK is a notable strength, providing a comprehensive view of potential threats.
major comments (2)
- The abstract and evaluation section report an 84% attack success rate, but the manuscript does not provide details on the total number of experiments conducted, the criteria for determining success (e.g., actual command execution verification), or controls for agent context selection mechanisms that might filter injected prompts from files like READMEs or configs.
- While 314 payloads are mentioned, there is insufficient description of how these payloads were designed to bypass potential safety layers in the AI agents, and whether the evaluation tested scenarios where the agent summarizes or ignores external content.
minor comments (2)
- Some sentences could be clarified regarding the distinction between traditional prompt injection and the specific context of agentic editors with terminal access.
- The paper would benefit from including raw data or anonymized logs as supplementary material to support the reported success rates.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review of our manuscript. We appreciate the opportunity to clarify aspects of our experimental methodology and will incorporate revisions to address the concerns raised.
read point-by-point responses
-
Referee: The abstract and evaluation section report an 84% attack success rate, but the manuscript does not provide details on the total number of experiments conducted, the criteria for determining success (e.g., actual command execution verification), or controls for agent context selection mechanisms that might filter injected prompts from files like READMEs or configs.
Authors: We agree that these methodological details should be more explicitly documented to support reproducibility. In the revised manuscript, we will add a dedicated subsection in the Evaluation section specifying the total number of experiments performed, the success criteria (defined as verified execution of the malicious command via terminal output logging and environment state checks), and our approach to agent context handling. Our tests included direct poisoning of files such as READMEs and configuration files that agents were instructed to read in full, with observations that context selection did not systematically filter the injected prompts in the evaluated setups. revision: yes
-
Referee: While 314 payloads are mentioned, there is insufficient description of how these payloads were designed to bypass potential safety layers in the AI agents, and whether the evaluation tested scenarios where the agent summarizes or ignores external content.
Authors: We acknowledge that the payload design process merits expanded explanation. The 314 payloads were constructed by adapting established prompt injection patterns to the coding agent context and mapping them to 70 MITRE ATT&CK techniques, with specific elements such as indirect instruction overriding and role assumption included to navigate safety alignments. We will revise the AIShellJack framework description to provide concrete examples and rationale for these bypass strategies. Our evaluation did encompass scenarios in which agents were prompted to summarize or process external content (including cases where summarization occurred), and attack success was measured in those conditions as well; we will report these results separately in the revised evaluation to make this coverage explicit. revision: yes
Circularity Check
No circularity: purely empirical measurement study with observed outcomes
full rationale
This paper conducts an empirical evaluation of prompt injection attacks by implementing the AIShellJack testing framework and measuring attack success rates (up to 84%) on GitHub Copilot and Cursor through direct experimentation with 314 payloads. No equations, derivations, predictions, or first-principles results are present. There are no self-definitional elements, fitted inputs renamed as predictions, or load-bearing self-citations that reduce claims to inputs by construction. The central results are observed experimental outcomes from running attacks, making the study self-contained against external benchmarks such as actual editor executions. This is the expected finding for a measurement paper with no mathematical chain to inspect.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption AI agents in coding editors will interpret and act on instructions found in external development resources as part of completing coding tasks.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We implement AIShellJack, an automated testing framework … 314 unique attack payloads that cover 70 techniques from the MITRE ATT&CK framework … attack success rates can reach as high as 84 %
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
-
Heimdallr: Characterizing and Detecting LLM-Induced Security Risks in GitHub CI Workflows
Heimdallr detects LLM-induced security risks in GitHub CI workflows by normalizing them into an LLM-Workflow Property Graph and combining triggerability analysis with LLM-assisted dataflow summarization, achieving ove...
-
LogJack: Indirect Prompt Injection Through Cloud Logs Against LLM Debugging Agents
LogJack shows indirect prompt injection via cloud logs succeeds in making LLM agents execute remote code on 6 of 8 models, with most cloud guardrails failing to detect the attacks.
-
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.
Reference graph
Works this paper leans on
-
[1]
Lepton AI. 2025. search_with_lepton. https://github.com/leptonai/search_with_lepton. Accessed: July 15, 2025
work page 2025
-
[2]
Amazon. 2025. Amazon Q Developer. https://aws.amazon.com/q/developer/. Accessed: July 19, 2025
work page 2025
-
[3]
Anonymous. 2025. Reproduction package. https://doi.org/10.6084/m9.figshare.30111988. Accessed: June 25, 2025
-
[4]
Divyansh Bhatia. 2025. The AI Model Race: Claude 4 vs GPT-4.1 vs Gemini 2.5 Pro. https://medium.com/ @divyanshbhatiajm19/the-ai-model-race-claude-4-vs-gpt-4-1-vs-gemini-2-5-pro-dab5db064f3e. Accessed: June 25, 2025
work page 2025
-
[5]
Elena Cross. 2025. The "S" in MCP Stands for Security. https://news.ycombinator.com/item?id=43600192
work page 2025
-
[6]
Cursor. 2024. Rules. https://docs.cursor.com/en/context/rules. Accessed: July 19, 2025
work page 2024
-
[7]
Cursor. 2025. Available models in Cursor. https://docs.cursor.com/models. Accessed: July 15, 2025
work page 2025
-
[8]
Cursor. 2025. Cursor - The AI Code Editor. https://cursor.com/. Accessed: July 19, 2025
work page 2025
-
[9]
Cursor Forum. 2025. Always run all commands without user confirmation. https://forum.cursor.com/t/always-run-all- commands-without-user-confirmation/31199. Accessed: August 25, 2025
work page 2025
-
[10]
Cursor Forum. 2025. Always run command. https://forum.cursor.com/t/always-run-command/29737. Accessed: August 25, 2025
work page 2025
-
[11]
Cursor Forum. 2025. Cursor tried to wipe my computer. https://forum.cursor.com/t/cursor-tried-to-wipe-my- computer/107142. Accessed: August 25, 2025
work page 2025
-
[12]
Badhan Chandra Das, M Hadi Amini, and Yanzhao Wu. 2025. Security and privacy challenges of large language models: A survey.Comput. Surveys57, 6 (2025), 1–39
work page 2025
-
[13]
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.Advances in Neural Information Processing Systems37 (2024), 82895–82920
work page 2024
-
[14]
Gabe Ragland. 2025. chatgpt-chrome-extension. https://github.com/gragland/chatgpt-chrome-extension. Accessed: July 15, 2025
work page 2025
-
[15]
GitHub. 2025. GitHub Copilot. https://github.com/features/copilot. Accessed: July 19, 2025
work page 2025
-
[16]
GitHub. 2025. Start and track GitHub Copilot coding agent sessions from Visual Studio Code. https://github.blog/ changelog/2025-07-14-start-and-track-github-copilot-coding-agent-sessions-from-visual-studio-code/. Accessed: July 19, 2025
work page 2025
-
[17]
Google Cloud. 2025. What is Vibe Coding? https://cloud.google.com/discover/what-is-vibe-coding. Accessed: August 25, 2025
work page 2025
-
[18]
Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model context protocol (mcp): Landscape, security threats, and future research directions.arXiv preprint arXiv:2503.23278(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[19]
Yizhan Huang, Yichen Li, Weibin Wu, Jianping Zhang, and Michael R Lyu. 2024. Your code secret belongs to me: Neural code completion tools can memorize hard-coded credentials.Proceedings of the ACM on Software Engineering1, FSE (2024), 2515–2537
work page 2024
-
[20]
Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. 2024. Pleak: Prompt leaking attacks against large language model applications. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 3600–3614. Proc. ACM Softw. Eng., Vol. 1, No. 1, Article 1. Publication date: October 2025. 1:20 Yue Liu, Yanjie Zhao, Y...
work page 2024
- [21]
-
[22]
Jordan Novet. 2025. Microsoft introduces GitHub AI agent that can code for you. https://www.cnbc.com/2025/05/19/ microsoft-ai-github.html. Accessed: July 19, 2025
work page 2025
-
[23]
Jan H Klemmer, Stefan Albert Horstmann, Nikhil Patnaik, Cordelia Ludden, Cordell Burton Jr, Carson Powers, Fabio Massacci, Akond Rahman, Daniel Votipka, Heather Richter Lipford, et al . 2024. Using ai assistants in software development: A qualitative study on security practices and concerns. InProceedings of the 2024 on ACM SIGSAC Conference on Computer a...
work page 2024
-
[24]
Ravie Lakshmanan. 2025. New ’Rules File Backdoor’ Attack Lets Hackers Inject Malicious Code via AI Code Editors. https://thehackernews.com/2025/03/new-rules-file-backdoor-attack-lets.html. Accessed: 2025-05-17
work page 2025
-
[25]
Elizabeth Lin, Igibek Koishybayev, Trevor Dunlap, William Enck, and Alexandros Kapravelos. 2024. Untrustide: Exploiting weaknesses in vs code extensions. InProceedings of the ISOC Network and Distributed Systems Symposium (NDSS). Internet Society
work page 2024
-
[26]
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24). 1831–1847
work page 2024
-
[27]
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. 2025. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks. In2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2190–2208
work page 2025
-
[28]
Yue Liu, Chakkrit Tantithamthavorn, and Li Li. 2025. Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions. In2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 551–562
work page 2025
-
[29]
Ludic. 2025. ludic. https://github.com/getludic/ludic. Accessed: July 15, 2025
work page 2025
-
[30]
My productivity is boosted, but
Yunbo Lyu, Zhou Yang, Jieke Shi, Jianming Chang, Yue Liu, and David Lo. 2025. " My productivity is boosted, but... " Demystifying Users’ Perception on AI Coding Assistants.arXiv preprint arXiv:2508.12285(2025)
-
[31]
Vahid Majdinasab, Michael Joshua Bishop, Shawn Rasheed, Arghavan Moradidakhel, Amjed Tahir, and Foutse Khomh
-
[32]
In2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Assessing the security of github copilot’s generated code-a targeted replication study. In2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 435–444
-
[33]
N64Recomp. 2025. N64Recomp. https://github.com/N64Recomp/N64Recomp. Accessed: July 15, 2025
work page 2025
-
[34]
National Institute of Standards and Technology. 2025. CVE-2025-54135 Detail. https://nvd.nist.gov/vuln/detail/CVE- 2025-54135. Accessed: July 19, 2025
work page 2025
-
[35]
OWASP. 2025. OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for- large-language-model-applications/. Accessed: August 25, 2025
work page 2025
-
[36]
PatrickJS. 2025. CursorRules: A New Way to Inject Malicious Code into AI Code Editors. https://github.com/PatrickJS/ awesome-cursorrules. Accessed: 2025-05-17
work page 2025
-
[37]
Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. 2025. Asleep at the keyboard? assessing the security of github copilot’s code contributions.Commun. ACM68, 2 (2025), 96–105
work page 2025
-
[38]
Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. 2023. Do users write more insecure code with ai assistants?. InProceedings of the 2023 ACM SIGSAC conference on computer and communications security. 2785–2799
work page 2023
-
[39]
PyTorch Labs. 2025. gpt-fast. https://github.com/pytorch-labs/gpt-fast. Accessed: July 15, 2025
work page 2025
-
[40]
Qodo AI. 2025. 2025 StateofAI code quality. https://www.qodo.ai/reports/state-of-ai-code-quality/. Accessed: August 25, 2025
work page 2025
-
[41]
Red Canary. 2025. Atomic Red Team. https://github.com/redcanaryco/atomic-red-team. Accessed: June 10, 2025
work page 2025
-
[42]
Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, and Brendan Dolan-Gavitt. 2023. Lost at c: A user study on the security implications of large language model code assistants. In32nd USENIX Security Symposium (USENIX Security 23). 2205–2222
work page 2023
-
[43]
Stack Overflow. 2025. Stack Overflow Developer Survey 2025. https://survey.stackoverflow.co/2025/ai. Accessed: July 19, 2025
work page 2025
-
[44]
SurveyMonkey. 2025. Sample size calculator. https://www.surveymonkey.com/mp/sample-size-calculator/. Accessed: July 19, 2025
work page 2025
-
[45]
SWEbench. 2024. SWEbench: The Software Engineering Benchmark for AI Models. https://www.swebench.com/. Accessed: June 25, 2025
work page 2024
-
[46]
Tap Twice Digital. 2025. 10 Cursor Statistics (2025). https://taptwicedigital.com/cursor. Accessed: July 19, 2025
work page 2025
-
[47]
The MITRE Corporation. 2025. MITRE ATT&CK. https://attack.mitre.org/. Accessed: June 10, 2025
work page 2025
-
[48]
Anthony J Viera, Joanne M Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic.Fam med 37, 5 (2005), 360–363
work page 2005
-
[49]
Visual Studio Code. 2025. Use MCP servers in VS Code. https://code.visualstudio.com/docs/copilot/chat/mcp-servers. Accessed: July 19, 2025
work page 2025
-
[50]
Wikipedia. 2024. GPT-4o. https://en.wikipedia.org/wiki/GPT-4o. Accessed: June 25, 2025. Proc. ACM Softw. Eng., Vol. 1, No. 1, Article 1. Publication date: October 2025. “Your AI, My Shell”: Demystifying Prompt Injection Attacks on Agentic AI Coding Editors 1:21
work page 2024
-
[51]
Simon Willison. 2025. Model Context Protocol has prompt injection security problems. https://simonwillison.net/ 2025/Apr/9/mcp-prompt-injection/
work page 2025
-
[52]
Jiaqi Xue, Mengxin Zheng, Ting Hua, Yilin Shen, Yepeng Liu, Ladislau Bölöni, and Qian Lou. 2023. Trojllm: A black- box trojan prompt attack on large language models.Advances in Neural Information Processing Systems36 (2023), 65665–65677
work page 2023
-
[53]
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and defending against indirect prompt injection attacks on large language models. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1809–1820
work page 2025
-
[54]
Albert Ziegler, Eirini Kalliamvakou, X Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian. 2024. Measuring github copilot’s impact on productivity.Commun. ACM67, 3 (2024), 54–63. Proc. ACM Softw. Eng., Vol. 1, No. 1, Article 1. Publication date: October 2025
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.