ShareLock: A Stealthy Multi-Tool Threshold Poisoning Attack Against MCP
Pith reviewed 2026-06-26 04:02 UTC · model grok-4.3
The pith
ShareLock splits malicious instructions into secret shares across multiple tool descriptions to poison MCP agents while evading detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShareLock applies Shamir's threshold scheme to split a malicious instruction into multiple secret shares that are embedded as seemingly normal content in separate tool descriptions. A covert reconstruction trigger is inserted during a server update. When the LLM later aggregates the tool descriptions, the shares recombine to reveal the original instruction, which the model then executes, producing breaches of system assets or private data.
What carries the argument
Shamir's threshold secret sharing scheme that disperses the attack payload across tool descriptions and reconstructs it only when a planted trigger activates during LLM processing.
If this is right
- The attack achieves information-theoretic secrecy so individual shares reveal nothing about the payload.
- It maintains robustness against moderate auditing because no single tool contains the full instruction.
- Attack success exceeds 90 percent on average across mainstream LLMs and two MCP clients.
- It outperforms existing single-tool poisoning methods on tool-description detection benchmarks.
- The framework works in four distinct multi-tool scenarios without requiring changes to the core MCP protocol.
Where Pith is reading between the lines
- MCP security reviews may need to add cross-tool correlation checks rather than examining descriptions in isolation.
- Protocol designers could add optional share-reconstruction detectors at the client side before feeding tools to the LLM.
- The same dispersal technique might apply to other agent-tool interfaces that lack coordinated validation.
- Server update processes become higher-risk vectors if they allow insertion of reconstruction triggers without extra verification.
Load-bearing premise
The shares placed in separate tool descriptions will stay undetected by inspection or detectors and the LLM will reliably reconstruct and follow the hidden instruction once the shares are combined.
What would settle it
Deploy ShareLock on a live MCP server, run standard manual review and automated detectors on the tool descriptions, then observe whether any share is flagged and whether the attack still triggers with high success rate.
Figures
read the original abstract
With the rapid evolution of LLM-driven agents, Model Context Protocol (MCP), an open protocol bridging LLMs with external tools, has quickly become foundational to modern agent ecosystems. However, the expanding adoption of MCP has also introduced novel security concerns such as Tool Poisoning Attack (TPA), which exploit LLM-server interactions to inject malicious prompts. Existing poisoning schemes typically adopt a monolithic plaintext embedding paradigm, which fails to withstand manual inspection or automated detectors. Current research still lacks a systematic analysis on multi-tool poisoning, where multiple tools can be exploited cooperatively to disperse detection risk. In this paper, we introduce ShareLock, a multi-tool threshold poisoning framework that utilizes Shamir's threshold scheme to ensure exceptional stealth and fault tolerance. ShareLock distributes the malicious instruction as benign-looking secret shares across multiple tool descriptions, achieving both information-theoretic secrecy and attack robustness against moderate auditing. After a covert reconstruction trigger is planted during server update, the aggregated shares reconstruct the hidden instruction, resulting in critical breaches of system assets or private data. To evaluate the realistic threat of ShareLock, we constructed a comprehensive benchmark encompassing four multi-tool scenarios and conducted extensive experiments across mainstream LLMs on two distinct MCP clients. Our results demonstrate that ShareLock significantly outperforms existing single-tool poisoning strategies in tool description-based detection while maintaining an average attack success rate exceeding 90%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ShareLock, a multi-tool threshold poisoning attack on the Model Context Protocol (MCP) used by LLM agents. It applies Shamir's secret sharing to split a malicious instruction into shares that are embedded as benign-looking text within multiple tool descriptions. A covert reconstruction trigger is inserted during a server update; when shares are aggregated at runtime the LLM reconstructs and executes the hidden instruction, producing asset or data breaches. Experiments across four multi-tool scenarios, mainstream LLMs, and two MCP clients are reported to yield average attack success rates above 90 percent while evading tool-description-based detection better than single-tool baselines.
Significance. If the embedding construction preserves information-theoretic secrecy and the reconstruction trigger functions reliably, the work would establish a concrete, threshold-based multi-tool attack vector that disperses detection risk and improves robustness over monolithic poisoning. The use of an established secret-sharing primitive for stealth in an open protocol is a clear technical contribution; the multi-scenario benchmark supplies practical evidence of the threat model.
major comments (3)
- [Abstract and method description] Abstract and method description: the central claim of 'information-theoretic secrecy' is unsupported. Shamir shares are uniform random field elements, yet any mapping into readable natural-language tool descriptions (base64, synonym substitution, template filling, etc.) is necessarily deterministic or low-entropy and therefore statistically distinguishable from genuine documentation by entropy, n-gram, or LLM-likelihood tests. No encoding construction, uniformity proof, or leakage analysis is supplied; this directly undermines the stealth and IT-secrecy guarantees that the attack's novelty rests upon.
- [Experimental evaluation] Experimental evaluation: the reported average success rate exceeding 90 percent is presented without the number of trials per scenario, standard deviations or error bars, explicit baselines, or controls for prompt variability and LLM stochasticity. Because the central claim is that ShareLock 'significantly outperforms' single-tool strategies while remaining undetected, the absence of these statistics prevents assessment of whether the improvement is statistically reliable or merely an artifact of the chosen prompts and models.
- [Reconstruction trigger and LLM execution] Reconstruction trigger and LLM execution: the manuscript assumes that once shares are aggregated the LLM will both correctly reconstruct the secret and then reliably act on the malicious instruction. No ablation or failure-mode analysis is given for cases in which the reconstructed prompt is only partially recovered or is refused by the model’s safety alignment; this assumption is load-bearing for the claimed breach outcomes.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments on the information-theoretic secrecy claim, experimental reporting, and reconstruction analysis are helpful for improving the manuscript. We address each major comment below and will make the necessary revisions.
read point-by-point responses
-
Referee: [Abstract and method description] Abstract and method description: the central claim of 'information-theoretic secrecy' is unsupported. Shamir shares are uniform random field elements, yet any mapping into readable natural-language tool descriptions (base64, synonym substitution, template filling, etc.) is necessarily deterministic or low-entropy and therefore statistically distinguishable from genuine documentation by entropy, n-gram, or LLM-likelihood tests. No encoding construction, uniformity proof, or leakage analysis is supplied; this directly undermines the stealth and IT-secrecy guarantees that the attack's novelty rests upon.
Authors: The information-theoretic secrecy claim in the paper specifically refers to the core property of Shamir's threshold scheme: any collection of fewer than the threshold number of shares reveals zero information about the secret (in the information-theoretic sense). This holds independently of how the shares are subsequently encoded or presented. However, we agree that the manuscript does not supply an explicit encoding construction, uniformity argument, or statistical leakage analysis for the natural-language embedding step. In the revised version we will add a dedicated subsection describing the encoding procedure, together with an entropy/n-gram/LLM-likelihood leakage evaluation against genuine tool descriptions. This will clarify the distinction between the IT secrecy of the secret-sharing layer and the empirical stealth of the embedding. revision: yes
-
Referee: [Experimental evaluation] Experimental evaluation: the reported average success rate exceeding 90 percent is presented without the number of trials per scenario, standard deviations or error bars, explicit baselines, or controls for prompt variability and LLM stochasticity. Because the central claim is that ShareLock 'significantly outperforms' single-tool strategies while remaining undetected, the absence of these statistics prevents assessment of whether the improvement is statistically reliable or merely an artifact of the chosen prompts and models.
Authors: We acknowledge that the current experimental section lacks the statistical detail required to support the performance claims rigorously. In the revision we will report the exact number of independent trials per scenario, include standard deviations and error bars, add explicit single-tool baselines, and describe controls for prompt variability and LLM temperature stochasticity. These additions will allow readers to assess the statistical significance of the reported >90% success rate and the claimed improvement over single-tool poisoning. revision: yes
-
Referee: [Reconstruction trigger and LLM execution] Reconstruction trigger and LLM execution: the manuscript assumes that once shares are aggregated the LLM will both correctly reconstruct the secret and then reliably act on the malicious instruction. No ablation or failure-mode analysis is given for cases in which the reconstructed prompt is only partially recovered or is refused by the model’s safety alignment; this assumption is load-bearing for the claimed breach outcomes.
Authors: The referee correctly identifies that the paper provides no ablation or failure-mode analysis for partial reconstruction or safety refusals. We will add a new subsection containing such an analysis, including experiments that deliberately degrade share quality or trigger safety filters, and report the resulting success/failure rates. This will substantiate the reliability assumptions underlying the attack outcomes. revision: yes
Circularity Check
No circularity detected; construction relies on external Shamir scheme
full rationale
The paper's central construction applies the standard Shamir threshold scheme to distribute shares across tool descriptions, claiming information-theoretic secrecy and robustness as direct consequences of that scheme. No equations or steps reduce the result to a self-defined quantity, a fitted parameter renamed as prediction, or a load-bearing self-citation chain. The embedding of shares into natural-language descriptions is presented as an implementation choice without any derivation that loops back to its own inputs. The attack success claims rest on experimental evaluation rather than any tautological redefinition.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Meta AI. 2024. Prompt Guard 86M Model. https://huggingface.co/meta-llama/ Prompt-Guard-86M
2024
-
[2]
Anthropic. 2025. Introduction to Model Context Protocol. https:// modelcontextprotocol.io/introduction
2025
-
[3]
CSA. 2025. Agentic ai threat modeling framework: Maestro. https: //cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling- framework-maestro Online
2025
-
[4]
Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, and Soheil Feizi. 2025. Tool Preferences in Agentic LLMs are Unreliable. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 20965–20980
2025
-
[5]
Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. 2025. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express (2025)
2025
-
[6]
GoogleCloud. 2025. Prompt engineering: overview and guide. https://cloud. google.com/discover/what-is-prompt-engineering?
2025
-
[7]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90
2023
-
[8]
Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. 2025. Systematic analysis of mcp security.arXiv preprint arXiv:2508.12538(2025)
Pith/arXiv arXiv 2025
-
[9]
John Halloran. 2025. MCP Safety Training: Learning to Refuse Falsely Be- nign MCP Exploits using Improved Preference Alignment.arXiv preprint arXiv:2505.23634(2025)
arXiv 2025
-
[10]
Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations.arXiv preprint arXiv: 2312.06674(2023)
Pith/arXiv arXiv 2023
-
[11]
InvariantLabs. 2025. MCP Security Notifications: Tool Poisoning Attacks. https: //invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks Online
2025
-
[12]
Huihao Jing, Haoran Li, Wenbin Hu, Qi Hu, Xu Heli, Tianshu Chu, Peizhao Hu, and Yangqiu Song. 2025. Mcip: Protecting mcp safety via model contextual integrity protocol. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 1177–1194
2025
-
[13]
Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. Mcp guardian: A security-first layer for safeguarding mcp-based ai system.arXiv preprint arXiv:2504.12757(2025)
arXiv 2025
-
[14]
Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, and Dacheng Tao. 2026. Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography. InProceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS)
2026
-
[15]
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, and Weijie Zhao. 2025. Uniguardian: A unified defense for detecting prompt injection, backdoor attacks and adversarial attacks in large language models.arXiv preprint arXiv:2502.13141(2025)
arXiv 2025
-
[16]
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24). 1831–1847
2024
-
[17]
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. 2025. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP). IEEE, 2190–2208
2025
-
[18]
Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, et al. 2026. Safety at scale: A comprehensive survey of large model and agent safety.Foundations and Trends in Privacy and Security8, 3-4 (2026), 1–240
2026
-
[19]
Yingning Ma. 2025. Realsafe: Quantifying safety risks of language agents in real-world. InProceedings of the 31st International Conference on Computational Linguistics. 9586–9617
2025
-
[20]
Vineeth Sai Narajala and Idan Habler. 2025. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies.arXiv preprint arXiv:2504.08623(2025)
arXiv 2025
-
[21]
OpenAI et al. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]. doi:10. 48550/arXiv.2303.08774
Pith/arXiv arXiv 2023
-
[22]
Chetan Pathade. 2025. Red teaming the mind of the machine: A systematic evaluation of prompt injection and jailbreak vulnerabilities in llms.arXiv preprint arXiv:2505.04806(2025)
arXiv 2025
-
[23]
Brandon Radosevich and John Halloran. 2025. Mcp safety audit: Llms with the model context protocol allow major security exploits.arXiv preprint arXiv:2504.03767(2025)
arXiv 2025
-
[24]
Adi Shamir. 1979. How to share a secret.Commun. ACM22, 11 (1979), 612–613
1979
-
[25]
Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, and Cong Wang. 2025. Quantifying Conversation Drift in MCP via Latent Polytope. arXiv preprint arXiv:2508.06418(2025)
arXiv 2025
-
[26]
Smithery.ai. 2025. Introduction to Smithery. https://smithery.ai/docs
2025
-
[27]
Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, and Jiachi Chen. 2025. Beyond the protocol: Un- veiling attack vectors in the model context protocol ecosystem.arXiv preprint arXiv:2506.02040(2025)
arXiv 2025
-
[28]
Zihan Wang, Rui Zhang, Yu Liu, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Hongwei Li, and Guowen Xu. 2026. Mpma: Preference manipulation attack against model context protocol. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 35838–35846
2026
-
[29]
Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. 2023. Jailbroken: How Does LLM Safety Training Fail?. InAdvances in Neural Information Processing Systems, Vol. 36. 80079–80110
2023
-
[30]
Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, and Dongdong She. 2025. Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security As- sessment.arXiv preprint arXiv:2509.05755(2025)
arXiv 2025
-
[31]
Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, and Meng Han. 2025. MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications.arXiv preprint arXiv:2508.10991(2025)
arXiv 2025
-
[32]
Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, and Lingyao Li. 2025. Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models.arXiv preprint arXiv:2505.16957(2025)
arXiv 2025
-
[33]
Shuli Zhao, Qinsheng Hou, Zihan Zhan, Yanhao Wang, Yuchong Xie, Yu Guo, Libo Chen, Shenghong Li, and Zhi Xue. 2025. Mind Your Server: A Systematic Study of Parasitic Toolchain Attacks on the MCP Ecosystem.arXiv preprint arXiv:2509.06572(2025). A Ethical Considerations All MCP poisoning attack scenarios evaluated in this work were executed solely on the au...
Pith/arXiv arXiv 2025
-
[34]
TPA (Direct Injection) The unencoded malicious payload is blatantly exposed in the target tool’s description
-
[35]
Figure 8:Poisoned Tool Example of TPA
Puppet Attack (Indirect Hijacking) A benign-looking target tool is manipulated by a payload injected into a separate, attacker-controlled tool (e.g.,EnvSetup). Figure 8:Poisoned Tool Example of TPA. Figure 9:Poisoned Tool Example of Puppet Attack
-
[36]
Figure 10:Poisoned Tool Example of Enc-Only Attack
Encode-Only Attack (Semantic Obfuscation) The payload is obfuscated via ASCII, requiring the agent to perform explicit decoding. Figure 10:Poisoned Tool Example of Enc-Only Attack
-
[37]
Do you authorize modi- fying the API_KEY file before proceeding?
ShareLock The malicious payload is fragmented and cryptographically dis- guised as standard metadata (checksum), accompanied by a plausi- ble compliance policy to trigger the reconstruction silently. E.3 Ablation Study Settings and Observations To maximize the evaluation coverage of state-of-the-art LLMs while maintaining computational resource efficiency...
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.