ShareLock: A Stealthy Multi-Tool Threshold Poisoning Attack Against MCP

Liwei Liu; Na Ruan; Tianzhu Han; Zijian Liu; Zishu Dong

arxiv: 2606.27027 · v1 · pith:6FTGK7YXnew · submitted 2026-06-25 · 💻 cs.CR · cs.AI


Pith reviewed 2026-06-26 04:02 UTC · model grok-4.3


keywords: tool poisoning attack · MCP · secret sharing · LLM agents · threshold scheme · multi-tool attack · stealthy poisoning · model context protocol


The pith

ShareLock splits malicious instructions into secret shares across multiple tool descriptions to poison MCP agents while evading detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShareLock as a multi-tool threshold poisoning attack on the Model Context Protocol that connects LLMs to external tools. It shows how to disperse a hidden instruction using Shamir's scheme so each tool description looks benign on its own. A trigger planted during server updates lets the shares recombine inside the LLM during normal use, causing unauthorized actions. The work demonstrates this approach beats single-tool poisoning on detection resistance while keeping attack success above 90 percent across tested models and clients. A sympathetic reader would see this as evidence that current MCP security, which checks tools individually, leaves room for coordinated cross-tool attacks.

Core claim

ShareLock applies Shamir's threshold scheme to split a malicious instruction into multiple secret shares that are embedded as seemingly normal content in separate tool descriptions. A covert reconstruction trigger is inserted during a server update. When the LLM later aggregates the tool descriptions, the shares recombine to reveal the original instruction, which the model then executes, producing breaches of system assets or private data.

What carries the argument

Shamir's threshold secret sharing scheme that disperses the attack payload across tool descriptions and reconstructs it only when a planted trigger activates during LLM processing.
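
For readers unfamiliar with the primitive, the following is a minimal sketch of textbook Shamir (t, n) sharing over a prime field, applied to a harmless integer. It is not the paper's construction: the field size `PRIME`, the function names `split_secret` and `reconstruct`, and the example values are assumptions chosen for illustration.

```python
import secrets

# Assumption: a Mersenne prime chosen as the field size for this sketch.
PRIME = 2**127 - 1


def split_secret(secret: int, t: int, n: int, prime: int = PRIME) -> list[tuple[int, int]]:
    """Split `secret` into n shares so that any t of them reconstruct it."""
    if not (1 <= t <= n < prime):
        raise ValueError("need 1 <= t <= n < prime")
    if not (0 <= secret < prime):
        raise ValueError("secret must be a field element")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo the prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]], prime: int = PRIME) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


shares = split_secret(123456789, t=3, n=5)
recovered = reconstruct(shares[:3])  # any 3 of the 5 shares suffice
```

Any three of the five shares recover the value, which mirrors the (t = 3, n = 5) setting named in the Figure 4 caption. Fewer than three shares interpolate to an unrelated field element.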

If this is right

The attack achieves information-theoretic secrecy so individual shares reveal nothing about the payload.
It maintains robustness against moderate auditing because no single tool contains the full instruction.
Attack success exceeds 90 percent on average across mainstream LLMs and two MCP clients.
It outperforms existing single-tool poisoning methods on tool-description detection benchmarks.
The framework works in four distinct multi-tool scenarios without requiring changes to the core MCP protocol.
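
The secrecy property in the first item can be stated formally. The block below is the standard statement for Shamir's scheme over a prime field, not a formula taken from the paper. The symbols $S$ (the secret as a random variable), $a_j$ (uniform random coefficients), $p$ (the field prime), and $s_i$ (the share held by tool $i$) are notation assumed here.

```latex
\begin{align*}
  f(x) &= s + a_1 x + a_2 x^2 + \dots + a_{t-1} x^{t-1} \pmod{p},
  \qquad s_i = f(i), \quad i = 1, \dots, n, \\
  \Pr\bigl[S = s \mid s_{i_1}, \dots, s_{i_{t-1}}\bigr] &= \Pr[S = s]
  \qquad \text{for every set of } t-1 \text{ shares.}
\end{align*}
```

The guarantee covers the raw field elements $s_i$ only. It says nothing about whether the text that carries a share looks natural to a reviewer or a detector.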

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

MCP security reviews may need to add cross-tool correlation checks rather than examining descriptions in isolation.
Protocol designers could add optional share-reconstruction detectors at the client side before feeding tools to the LLM.
The same dispersal technique might apply to other agent-tool interfaces that lack coordinated validation.
Server update processes become higher-risk vectors if they allow insertion of reconstruction triggers without extra verification.
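
The cross-tool correlation check in the first item could be prototyped along the following lines. This is a defensive sketch under stated assumptions, not a method from the paper or from any MCP client: the 16-character cutoff, the grouping by token length, the `min_tools` threshold, and the names `opaque_tokens` and `flag_correlated_tools` are all hypothetical. The sample descriptions are invented and use the checksum-style disguise mentioned in the Figure 10 caption.

```python
import re
from collections import defaultdict

# Assumption: an "opaque token" is a run of 16+ alphanumerics mixing letters and digits.
_RUN = re.compile(r"[A-Za-z0-9]{16,}")


def opaque_tokens(text: str) -> list[str]:
    """Return long alphanumeric runs that contain both digits and letters."""
    return [
        tok for tok in _RUN.findall(text)
        if any(c.isdigit() for c in tok) and any(c.isalpha() for c in tok)
    ]


def flag_correlated_tools(descriptions: dict[str, str], min_tools: int = 3) -> dict[int, list[str]]:
    """Group tools by the length of opaque tokens they carry.

    A length shared by `min_tools` or more tools is reported, since several
    tools carrying same-shaped opaque fields is worth a manual look.
    """
    by_length: dict[int, set[str]] = defaultdict(set)
    for name, text in descriptions.items():
        for tok in opaque_tokens(text):
            by_length[len(tok)].add(name)
    return {
        length: sorted(names)
        for length, names in by_length.items()
        if len(names) >= min_tools
    }


# Invented sample data for illustration only.
descriptions = {
    "get_weather": "Returns the forecast for a city. checksum: 9f3a1c7be2d44a08b6c1",
    "get_flights": "Lists flights between two airports. checksum: 1b7de09a44c2f3a5e8d0",
    "get_hotels": "Finds hotels near a location. checksum: c4e81f02a9b37d65f0a1",
    "convert_currency": "Converts an amount between currencies.",
}
flagged = flag_correlated_tools(descriptions)
```

A check like this looks at the tool set as a whole, which per-description scanning cannot do. It would need tuning against real servers, where legitimate tools may also share identifier formats.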

Load-bearing premise

The shares placed in separate tool descriptions will stay undetected by inspection or detectors and the LLM will reliably reconstruct and follow the hidden instruction once the shares are combined.

What would settle it

Deploy ShareLock on a live MCP server, run standard manual review and automated detectors on the tool descriptions, then observe whether any share is flagged and whether the attack still triggers with a high success rate.

Figures

Figures reproduced from arXiv: 2606.27027 by Liwei Liu, Na Ruan, Tianzhu Han, Zijian Liu, Zishu Dong.

**Figure 1.** Overview of MCP workflow and tool poisoning attack mechanism.


**Figure 2.** Overview of ShareLock, a multi-tool threshold poisoning attack. For instance, consider a scenario where a user queries a weather assistant to plan a trip to Oakland and the adversary attempts to steal the user's api_key or other data. The adversary generates benign-looking secret shares through initial encoding and a (t, n)-share generator, disguising each share as tool_id and tool_seq inside a normal tool in order to evade the po…

**Figure 3.** Threat scores of the four MCP poisoning attack methods in the security classification task across GPT-5, ClaudeSonnet-4.5 and Gemini-2.5-Flash. A higher score indicates that the model perceives a greater threat in the corresponding hazard category. We follow Llama Guard’s Hazard categories, which can be found in Appendix B. The top three categories with the highest scores are #2: Non-Violent Crimes, #7: P…

**Figure 4.** Ablation study on the robustness of the ShareLock attack with a (t = 3, n = 5) scheme. (Left) The ASR remains high as long as the number of available tools k ≥ t, but drops to 0% when k < t, confirming the attack’s robustness. (Right) The TCR largely mirrors the ASR when the deterministic failure of the reconstruction step halts the agent’s workflow.

**Figure 5.** Impact of temperature on ShareLock. (Left) ASR peaks at low-to-moderate temperatures, degrading as generation randomness increases. (Right) TCR trends vary by model. Robust models (e.g., Claude) maintain high task completion despite attack failures, whereas others experience complete workflow collapse.

**Figure 6.** Token consumption overhead incurred by different poisoning strategies (TPA, Puppet, Encode-Only, ShareLock). The baseline token cost for normal user task execution is depicted by the gray hatched area. Overlaid colored segments illustrate the overhead imposed by the attacks.

**Figure 7.** Example of safety classification.

**Figure 8.** Poisoned Tool Example of TPA.

**Figure 9.** Poisoned Tool Example of Puppet Attack.

**Figure 10.** Poisoned Tool Example of Enc-Only Attack.

**Figure 11.** Trigger Tool Example of ShareLock.

**Figure 12.** Disguised Tool Example of ShareLock.

Original abstract

With the rapid evolution of LLM-driven agents, Model Context Protocol (MCP), an open protocol bridging LLMs with external tools, has quickly become foundational to modern agent ecosystems. However, the expanding adoption of MCP has also introduced novel security concerns such as Tool Poisoning Attack (TPA), which exploit LLM-server interactions to inject malicious prompts. Existing poisoning schemes typically adopt a monolithic plaintext embedding paradigm, which fails to withstand manual inspection or automated detectors. Current research still lacks a systematic analysis on multi-tool poisoning, where multiple tools can be exploited cooperatively to disperse detection risk. In this paper, we introduce ShareLock, a multi-tool threshold poisoning framework that utilizes Shamir's threshold scheme to ensure exceptional stealth and fault tolerance. ShareLock distributes the malicious instruction as benign-looking secret shares across multiple tool descriptions, achieving both information-theoretic secrecy and attack robustness against moderate auditing. After a covert reconstruction trigger is planted during server update, the aggregated shares reconstruct the hidden instruction, resulting in critical breaches of system assets or private data. To evaluate the realistic threat of ShareLock, we constructed a comprehensive benchmark encompassing four multi-tool scenarios and conducted extensive experiments across mainstream LLMs on two distinct MCP clients. Our results demonstrate that ShareLock significantly outperforms existing single-tool poisoning strategies in tool description-based detection while maintaining an average attack success rate exceeding 90%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ShareLock is the first to frame multi-tool poisoning via Shamir threshold sharing, but the information-theoretic secrecy claim does not survive the need to embed random shares into readable tool text.


The paper's main contribution is applying Shamir's threshold scheme to spread a malicious instruction across several tool descriptions in the MCP protocol so that any single description looks harmless. Prior work stayed with single-tool plaintext injections, so this multi-tool dispersal plus the reconstruction trigger during server updates is new. The authors also built a benchmark covering four multi-tool scenarios and ran tests on mainstream LLMs with two MCP clients, reporting average success above 90 percent and improved resistance to description-based detection.

That framework and the benchmark setup are the parts that hold up. Dispersing risk across tools makes sense for fault tolerance, and the abstract correctly notes the lack of systematic multi-tool analysis before.

The soft spot is the embedding step required for the secrecy claim. Shamir shares are uniform random field elements. Turning them into benign-looking natural-language tool descriptions requires some mapping, such as base64 in comments, synonym substitution, or template filling. Any such mapping is either statistically detectable or deterministic enough to lose the information-theoretic guarantee that fewer than t shares reveal nothing. The abstract asserts both IT secrecy and stealth without describing the construction or proving that the mapping preserves uniformity, so this concern applies directly to the abstract's headline claim.
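
One of the simplest statistical signals a detector might compute is character-level Shannon entropy. The sketch below is an illustrative assumption about what such a test could look like, not a detector evaluated in the paper. The function name `char_entropy` is hypothetical.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy, in bits per character, of the empirical character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform_four = char_entropy("abcd")   # four equally likely symbols
two_symbols = char_entropy("aabb")    # two equally likely symbols
single_symbol = char_entropy("aaaa")  # no uncertainty at all
```

Raw character entropy is a crude signal on its own. Hex-encoded data, for example, is capped at 4 bits per character, which is close to ordinary English prose. Any threshold would therefore have to be calibrated against a corpus of genuine tool descriptions, and n-gram or LLM-likelihood tests would likely be needed alongside it.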

Experiments are reported at summary level only. No error bars, no concrete detection baselines, and no ablation on how the LLM actually aggregates and executes the reconstructed instruction. The weakest assumption—that the shares stay hidden from manual or automated inspection while still triggering reliably—remains untested in the provided details.

This is for researchers working on LLM agent tool security and protocol threats. A reader who wants to see how threshold methods might extend poisoning attacks will find the framing and benchmark useful. The work shows clear thinking on the gap and does not contain internal contradictions, so it deserves a serious referee even though the encoding and experimental controls need tightening.

Referee Report

3 major / 0 minor

Summary. The manuscript introduces ShareLock, a multi-tool threshold poisoning attack on the Model Context Protocol (MCP) used by LLM agents. It applies Shamir's secret sharing to split a malicious instruction into shares that are embedded as benign-looking text within multiple tool descriptions. A covert reconstruction trigger is inserted during a server update; when shares are aggregated at runtime the LLM reconstructs and executes the hidden instruction, producing asset or data breaches. Experiments across four multi-tool scenarios, mainstream LLMs, and two MCP clients are reported to yield average attack success rates above 90 percent while evading tool-description-based detection better than single-tool baselines.

Significance. If the embedding construction preserves information-theoretic secrecy and the reconstruction trigger functions reliably, the work would establish a concrete, threshold-based multi-tool attack vector that disperses detection risk and improves robustness over monolithic poisoning. The use of an established secret-sharing primitive for stealth in an open protocol is a clear technical contribution; the multi-scenario benchmark supplies practical evidence of the threat model.

major comments (3)

[Abstract and method description] The central claim of 'information-theoretic secrecy' is unsupported. Shamir shares are uniform random field elements, yet any mapping into readable natural-language tool descriptions (base64, synonym substitution, template filling, etc.) is necessarily deterministic or low-entropy and therefore statistically distinguishable from genuine documentation by entropy, n-gram, or LLM-likelihood tests. No encoding construction, uniformity proof, or leakage analysis is supplied; this directly undermines the stealth and IT-secrecy guarantees that the attack's novelty rests upon.
[Experimental evaluation] The reported average success rate exceeding 90 percent is presented without the number of trials per scenario, standard deviations or error bars, explicit baselines, or controls for prompt variability and LLM stochasticity. Because the central claim is that ShareLock 'significantly outperforms' single-tool strategies while remaining undetected, the absence of these statistics prevents assessment of whether the improvement is statistically reliable or merely an artifact of the chosen prompts and models.
[Reconstruction trigger and LLM execution] The manuscript assumes that once shares are aggregated the LLM will both correctly reconstruct the secret and then reliably act on the malicious instruction. No ablation or failure-mode analysis is given for cases in which the reconstructed prompt is only partially recovered or is refused by the model’s safety alignment; this assumption is load-bearing for the claimed breach outcomes.
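
To make the second comment concrete, a success rate reported with its trial count can be turned into an interval estimate. The sketch below uses the Wilson score interval for a binomial proportion. The trial counts are hypothetical and are not taken from the paper, and `wilson_interval` is an illustrative name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 46 successful attacks out of 50 trials (92%).
low, high = wilson_interval(46, 50)
```

With only 50 trials, a 92 percent point estimate is compatible with a true rate several points below 90 percent, which is why the per-scenario trial count matters for the headline figure.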

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments on the information-theoretic secrecy claim, experimental reporting, and reconstruction analysis are helpful for improving the manuscript. We address each major comment below and will make the necessary revisions.


Referee: [Abstract and method description] The central claim of 'information-theoretic secrecy' is unsupported. Shamir shares are uniform random field elements, yet any mapping into readable natural-language tool descriptions (base64, synonym substitution, template filling, etc.) is necessarily deterministic or low-entropy and therefore statistically distinguishable from genuine documentation by entropy, n-gram, or LLM-likelihood tests. No encoding construction, uniformity proof, or leakage analysis is supplied; this directly undermines the stealth and IT-secrecy guarantees that the attack's novelty rests upon.

Authors: The information-theoretic secrecy claim in the paper specifically refers to the core property of Shamir's threshold scheme: any collection of fewer than the threshold number of shares reveals zero information about the secret (in the information-theoretic sense). This holds independently of how the shares are subsequently encoded or presented. However, we agree that the manuscript does not supply an explicit encoding construction, uniformity argument, or statistical leakage analysis for the natural-language embedding step. In the revised version we will add a dedicated subsection describing the encoding procedure, together with an entropy/n-gram/LLM-likelihood leakage evaluation against genuine tool descriptions. This will clarify the distinction between the IT secrecy of the secret-sharing layer and the empirical stealth of the embedding. revision: yes
Referee: [Experimental evaluation] The reported average success rate exceeding 90 percent is presented without the number of trials per scenario, standard deviations or error bars, explicit baselines, or controls for prompt variability and LLM stochasticity. Because the central claim is that ShareLock 'significantly outperforms' single-tool strategies while remaining undetected, the absence of these statistics prevents assessment of whether the improvement is statistically reliable or merely an artifact of the chosen prompts and models.

Authors: We acknowledge that the current experimental section lacks the statistical detail required to support the performance claims rigorously. In the revision we will report the exact number of independent trials per scenario, include standard deviations and error bars, add explicit single-tool baselines, and describe controls for prompt variability and LLM temperature stochasticity. These additions will allow readers to assess the statistical significance of the reported >90% success rate and the claimed improvement over single-tool poisoning. revision: yes
Referee: [Reconstruction trigger and LLM execution] The manuscript assumes that once shares are aggregated the LLM will both correctly reconstruct the secret and then reliably act on the malicious instruction. No ablation or failure-mode analysis is given for cases in which the reconstructed prompt is only partially recovered or is refused by the model’s safety alignment; this assumption is load-bearing for the claimed breach outcomes.

Authors: The referee correctly identifies that the paper provides no ablation or failure-mode analysis for partial reconstruction or safety refusals. We will add a new subsection containing such an analysis, including experiments that deliberately degrade share quality or trigger safety filters, and report the resulting success/failure rates. This will substantiate the reliability assumptions underlying the attack outcomes. revision: yes

Circularity Check

0 steps flagged

No circularity detected; construction relies on external Shamir scheme


The paper's central construction applies the standard Shamir threshold scheme to distribute shares across tool descriptions, claiming information-theoretic secrecy and robustness as direct consequences of that scheme. No equations or steps reduce the result to a self-defined quantity, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The embedding of shares into natural-language descriptions is presented as an implementation choice without any derivation that loops back to its own inputs. The attack success claims rest on experimental evaluation rather than any tautological redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract was available; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.


Reference graph

Works this paper leans on

37 extracted references · 4 linked inside Pith

[1] Meta AI. 2024. Prompt Guard 86M Model. https://huggingface.co/meta-llama/Prompt-Guard-86M

[2] Anthropic. 2025. Introduction to Model Context Protocol. https://modelcontextprotocol.io/introduction

[3] CSA. 2025. Agentic ai threat modeling framework: Maestro. https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro Online

[4] Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, and Soheil Feizi. 2025. Tool Preferences in Agentic LLMs are Unreliable. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 20965–20980

[5] Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. 2025. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows. ICT Express (2025)

[6] GoogleCloud. 2025. Prompt engineering: overview and guide. https://cloud.google.com/discover/what-is-prompt-engineering?

[7] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

[8] Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. 2025. Systematic analysis of mcp security. arXiv preprint arXiv:2508.12538 (2025)

[9] John Halloran. 2025. MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment. arXiv preprint arXiv:2505.23634 (2025)

[10] Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. arXiv preprint arXiv:2312.06674 (2023)

[11] InvariantLabs. 2025. MCP Security Notifications: Tool Poisoning Attacks. https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks Online

[12] Huihao Jing, Haoran Li, Wenbin Hu, Qi Hu, Xu Heli, Tianshu Chu, Peizhao Hu, and Yangqiu Song. 2025. Mcip: Protecting mcp safety via model contextual integrity protocol. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 1177–1194

[13] Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. Mcp guardian: A security-first layer for safeguarding mcp-based ai system. arXiv preprint arXiv:2504.12757 (2025)

[14] Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, and Dacheng Tao. 2026. Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography. In Proceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS)

[15] Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, and Weijie Zhao. 2025. Uniguardian: A unified defense for detecting prompt injection, backdoor attacks and adversarial attacks in large language models. arXiv preprint arXiv:2502.13141 (2025)

[16] Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In 33rd USENIX Security Symposium (USENIX Security 24). 1831–1847

[17] Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. 2025. Datasentinel: A game-theoretic detection of prompt injection attacks. In 2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2190–2208

[18] Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, et al. 2026. Safety at scale: A comprehensive survey of large model and agent safety. Foundations and Trends in Privacy and Security 8, 3-4 (2026), 1–240

[19] Yingning Ma. 2025. Realsafe: Quantifying safety risks of language agents in real-world. In Proceedings of the 31st International Conference on Computational Linguistics. 9586–9617

[20] Vineeth Sai Narajala and Idan Habler. 2025. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies. arXiv preprint arXiv:2504.08623 (2025)

[21] OpenAI et al. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]. doi:10.48550/arXiv.2303.08774

[22] Chetan Pathade. 2025. Red teaming the mind of the machine: A systematic evaluation of prompt injection and jailbreak vulnerabilities in llms. arXiv preprint arXiv:2505.04806 (2025)

[23] Brandon Radosevich and John Halloran. 2025. Mcp safety audit: Llms with the model context protocol allow major security exploits. arXiv preprint arXiv:2504.03767 (2025)

[24] Adi Shamir. 1979. How to share a secret. Commun. ACM 22, 11 (1979), 612–613

[25] Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, and Cong Wang. 2025. Quantifying Conversation Drift in MCP via Latent Polytope. arXiv preprint arXiv:2508.06418 (2025)

[26] Smithery.ai. 2025. Introduction to Smithery. https://smithery.ai/docs

[27] Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, and Jiachi Chen. 2025. Beyond the protocol: Unveiling attack vectors in the model context protocol ecosystem. arXiv preprint arXiv:2506.02040 (2025)

[28] Zihan Wang, Rui Zhang, Yu Liu, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Hongwei Li, and Guowen Xu. 2026. Mpma: Preference manipulation attack against model context protocol. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 35838–35846

[29] Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How Does LLM Safety Training Fail?. In Advances in Neural Information Processing Systems, Vol. 36. 80079–80110

[30] Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, and Dongdong She. 2025. Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment. arXiv preprint arXiv:2509.05755 (2025)

[31] Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, and Meng Han. 2025. MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications. arXiv preprint arXiv:2508.10991 (2025)

[32] Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, and Lingyao Li. 2025. Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models. arXiv preprint arXiv:2505.16957 (2025)

[33] Shuli Zhao, Qinsheng Hou, Zihan Zhan, Yanhao Wang, Yuchong Xie, Yu Guo, Libo Chen, Shenghong Li, and Zhi Xue. 2025. Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem. arXiv preprint arXiv:2509.06572 (2025)

[1] [1]

Meta AI. 2024. Prompt Guard 86M Model. https://huggingface.co/meta-llama/ Prompt-Guard-86M

2024

[2] [2]

Anthropic. 2025. Introduction to Model Context Protocol. https:// modelcontextprotocol.io/introduction

2025

[3] [3]

CSA. 2025. Agentic ai threat modeling framework: Maestro. https: //cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling- framework-maestro Online

2025

[4] [4]

Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, and Soheil Feizi. 2025. Tool Preferences in Agentic LLMs are Unreliable. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 20965–20980

2025

[5] [5]

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. 2025. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express (2025)

2025

[6] [6]

GoogleCloud. 2025. Prompt engineering: overview and guide. https://cloud. google.com/discover/what-is-prompt-engineering?

2025

[7] [7]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

2023

[8] [8]

Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. 2025. Systematic analysis of mcp security.arXiv preprint arXiv:2508.12538(2025)

Pith/arXiv arXiv 2025

[9] [9]

John Halloran. 2025. MCP Safety Training: Learning to Refuse Falsely Be- nign MCP Exploits using Improved Preference Alignment.arXiv preprint arXiv:2505.23634(2025)

arXiv 2025

[10] [10]

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations.arXiv preprint arXiv: 2312.06674(2023)

Pith/arXiv arXiv 2023

[11] [11]

InvariantLabs. 2025. MCP Security Notifications: Tool Poisoning Attacks. https: //invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks Online

2025

[12] [12]

Huihao Jing, Haoran Li, Wenbin Hu, Qi Hu, Xu Heli, Tianshu Chu, Peizhao Hu, and Yangqiu Song. 2025. Mcip: Protecting mcp safety via model contextual integrity protocol. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 1177–1194

2025

[13] [13]

Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. Mcp guardian: A security-first layer for safeguarding mcp-based ai system.arXiv preprint arXiv:2504.12757(2025)

[14]

Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, and Dacheng Tao. 2026. Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography. In Proceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS).

[15]

Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, and Weijie Zhao. 2025. Uniguardian: A unified defense for detecting prompt injection, backdoor attacks and adversarial attacks in large language models. arXiv preprint arXiv:2502.13141 (2025).

[16]

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In 33rd USENIX Security Symposium (USENIX Security 24). 1831–1847.

[17]

Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. 2025. Datasentinel: A game-theoretic detection of prompt injection attacks. In 2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2190–2208.

[18]

Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, et al. 2026. Safety at scale: A comprehensive survey of large model and agent safety. Foundations and Trends in Privacy and Security 8, 3-4 (2026), 1–240.

[19]

Yingning Ma. 2025. Realsafe: Quantifying safety risks of language agents in real-world. In Proceedings of the 31st International Conference on Computational Linguistics. 9586–9617.

[20]

Vineeth Sai Narajala and Idan Habler. 2025. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies. arXiv preprint arXiv:2504.08623 (2025).

[21]

OpenAI et al. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]. doi:10.48550/arXiv.2303.08774

[22]

Chetan Pathade. 2025. Red teaming the mind of the machine: A systematic evaluation of prompt injection and jailbreak vulnerabilities in llms. arXiv preprint arXiv:2505.04806 (2025).

[23]

Brandon Radosevich and John Halloran. 2025. Mcp safety audit: Llms with the model context protocol allow major security exploits. arXiv preprint arXiv:2504.03767 (2025).

[24]

Adi Shamir. 1979. How to share a secret. Commun. ACM 22, 11 (1979), 612–613.

[25]

Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, and Cong Wang. 2025. Quantifying Conversation Drift in MCP via Latent Polytope. arXiv preprint arXiv:2508.06418 (2025).

[26]

Smithery.ai. 2025. Introduction to Smithery. https://smithery.ai/docs

[27]

Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, and Jiachi Chen. 2025. Beyond the protocol: Unveiling attack vectors in the model context protocol ecosystem. arXiv preprint arXiv:2506.02040 (2025).

[28]

Zihan Wang, Rui Zhang, Yu Liu, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Hongwei Li, and Guowen Xu. 2026. Mpma: Preference manipulation attack against model context protocol. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 35838–35846.

[29]

Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How Does LLM Safety Training Fail? In Advances in Neural Information Processing Systems, Vol. 36. 80079–80110.

[30]

Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, and Dongdong She. 2025. Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment. arXiv preprint arXiv:2509.05755 (2025).

[31]

Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, and Meng Han. 2025. MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications. arXiv preprint arXiv:2508.10991 (2025).

[32]

Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, and Lingyao Li. 2025. Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models. arXiv preprint arXiv:2505.16957 (2025).

[33]

Shuli Zhao, Qinsheng Hou, Zihan Zhan, Yanhao Wang, Yuchong Xie, Yu Guo, Libo Chen, Shenghong Li, and Zhi Xue. 2025. Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem. arXiv preprint arXiv:2509.06572 (2025).

A Ethical Considerations

All MCP poisoning attack scenarios evaluated in this work were executed solely on the au...

[34]

TPA (Direct Injection). The unencoded malicious payload is blatantly exposed in the target tool’s description.

[35]

Figure 8: Poisoned Tool Example of TPA

Puppet Attack (Indirect Hijacking). A benign-looking target tool is manipulated by a payload injected into a separate, attacker-controlled tool (e.g., EnvSetup).

Figure 9: Poisoned Tool Example of Puppet Attack

[36]

Encode-Only Attack (Semantic Obfuscation). The payload is obfuscated via ASCII, requiring the agent to perform explicit decoding.

Figure 10: Poisoned Tool Example of Enc-Only Attack

[37]

Do you authorize modifying the API_KEY file before proceeding?

ShareLock. The malicious payload is fragmented and cryptographically disguised as standard metadata (checksum), accompanied by a plausible compliance policy to trigger the reconstruction silently.

E.3 Ablation Study Settings and Observations

To maximize the evaluation coverage of state-of-the-art LLMs while maintaining computational resource efficiency...
