Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

Arghavan Moradi Dakhel; Foutse Khomh; Kawser Wazed Nafi; Saeid Jamshidi

arxiv: 2512.06556 · v2 · pith:RJOB5SJXnew · submitted 2025-12-06 · 💻 cs.CR · cs.AI

Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation

Saeid Jamshidi , Arghavan Moradi Dakhel , Kawser Wazed Nafi , Foutse Khomh This is my paper

Pith reviewed 2026-05-22 12:43 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords semantic attackstool-augmented LLMsModel Context Protocoldescriptor manipulationLLM securitytool poisoningadversarial robustness

0 comments

The pith

Tool descriptor manipulation in the Model Context Protocol can steer LLMs toward unsafe tool selections up to 36 percent of the time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that the Model Context Protocol allows LLMs to call external tools by feeding their descriptions straight into the reasoning context, yet those descriptions are accepted without verification. This setup lets attackers rewrite the metadata to favor dangerous choices through three defined attack patterns. Tests on multiple models and prompting styles confirm that baseline systems suffer high rates of unsafe tool use. A defense built from metadata checks, review by a separate model, and runtime rules cuts the unsafe rate while blocking most attempts, all without retraining the main model. The result matters because tool-augmented LLMs are moving into real decision-making roles where bad tool choices carry direct costs.

Core claim

Descriptor manipulation in the Model Context Protocol creates a semantic attack surface that biases LLM tool selection. The work defines three attack classes: Tool Poisoning, Shadowing, and Rug Pull. A full-stack mitigation using descriptor integrity verification, auxiliary-LLM semantic vetting before context insertion, and lightweight runtime guardrails reduces unsafe invocations from as high as 36 percent to 15 percent and raises the block rate to 74 percent across GPT-5.3, DeepSeek-V3, and LLaMA-3.5 in controlled adversarial scenarios.

What carries the argument

Model Context Protocol tool descriptors as the attack vector, defended by a three-layer stack of integrity verification, pre-context auxiliary-LLM semantic vetting, and runtime guardrails.

If this is right

Tool selection behavior in LLMs becomes more predictable once descriptor metadata receives integrity checks and semantic review.
Robustness varies across model families and prompting styles, so defenses must be tuned per architecture.
Secure tool use can be added to existing LLMs without retraining or architectural changes.
Descriptor attacks form a threat category distinct from prompt injection that requires metadata-specific controls.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Descriptor manipulation risks likely appear in any agent framework that supplies capability or tool metadata to the model context.
Chaining the auxiliary vetting model itself could create new attack paths worth testing in multi-stage setups.
Future tool-calling standards may need built-in security fields in descriptors to reduce reliance on post-hoc vetting.

Load-bearing premise

Controlled lab scenarios that manually alter tool metadata accurately capture the capabilities and goals of realistic attackers.

What would settle it

Measure unsafe tool invocation rates when the same descriptor changes are applied inside a live production tool-calling system instead of simulated tests.

Figures

Figures reproduced from arXiv: 2512.06556 by Arghavan Moradi Dakhel, Foutse Khomh, Kawser Wazed Nafi, Saeid Jamshidi.

**Figure 2.** Figure 2: Threat model for Tool Poisoning, Shadowing, and Rug Pull attacks in MCP-based LLM. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Average Prompt Length vs. Unsafe Tool Invocation Across Strategies. [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗

**Figure 4.** Figure 4: Mean tool invocation latency across prompting strategies. [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗

**Figure 5.** Figure 5: Latency distribution across LLMs [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Prompting strategy usage distribution across models. [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Latency distributions per prompting strategy across models. [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

**Figure 8.** Figure 8: Block Rate (%) of LLMs across benign and adversarial tool settings. [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗

**Figure 9.** Figure 9: Case study flow for tool misuse in an MCP-based email assistant. [PITH_FULL_IMAGE:figures/full_fig_p026_9.png] view at source ↗

read the original abstract

The Model Context Protocol (MCP) enables Large Language Models (LLMs) to interact with external tools via tool descriptors, thereby extending their capabilities for task execution, autonomous decision-making, and multi-agent coordination. Existing MCP deployments treat tool descriptors as trusted metadata, despite their direct integration into the LLM reasoning context. This introduces a previously underexplored semantic attack surface. Current defenses primarily target prompt injection, neglecting descriptor-level manipulation that can bias tool selection and downstream reasoning. To address this gap, we formalize three descriptor-driven attack classes: Tool Poisoning, Shadowing, and Rug Pull. We propose a layered defense solution that integrates descriptor integrity verification, pre-context semantic vetting with an auxiliary LLM, and lightweight runtime guardrails, without requiring model retraining. We evaluate GPT-5.3, DeepSeek-V3, and LLaMA-3.5 across eight prompting strategies in controlled, adversarial MCP scenarios in which tool metadata is manipulated to simulate realistic attacks. Results demonstrate that descriptor manipulation can substantially alter tool-selection behavior, producing unsafe tool invocations in up to 36% of trials under baseline configurations. The proposed full-stack mitigation reduces unsafe invocations to 15% while increasing the block rate to 74%, demonstrating substantial improvement in resistance to descriptor-driven attacks. Cross-model analysis further reveals significant differences in robustness, latency, and sensitivity to descriptor-level manipulation across LLM architectures and prompting strategies. This study provides a controlled cross-model evaluation of descriptor-level threats and mitigation strategies in tool-calling LLM systems, establishing an empirical foundation for deploying secure and resilient tool-augmented LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Descriptor manipulation can push tool-augmented LLMs into unsafe calls, and the layered defense cuts the rate from 36% to 15% in their tests, but the attacks assume full attacker control over metadata that real systems may not grant.

read the letter

The main thing to know is that this paper flags descriptor-level changes as a way to bias tool selection in MCP setups, and their combined defenses reduce unsafe invocations from 36% down to 15% while hitting a 74% block rate across the models they tested. They name three attack patterns—Tool Poisoning, Shadowing, and Rug Pull—and treat them as distinct from classic prompt injection. That framing is new enough to be worth noting, since prior work has not zeroed in on metadata edits inside the tool descriptor itself. The evaluation runs the same scenarios on GPT-5.3, DeepSeek-V3, and LLaMA-3.5 with eight prompting strategies, and the defense stack (integrity checks plus auxiliary-LLM vetting plus runtime guardrails) avoids any retraining, which keeps it deployable. Those pieces are the parts that hold up cleanly on the evidence given. The softer part is the attack realism. The experiments give the adversary direct write access to every descriptor field before it enters context. In actual MCP deployments, descriptors often arrive through registries or authenticated registration flows, so reaching the same edits would require supply-chain compromise or equivalent privilege. The paper calls the scenarios “realistic,” but it does not show how an external attacker would achieve those edits under normal constraints. That gap means the reported deltas are best read as upper-bound effects inside a controlled simulation rather than direct measures of field risk. The abstract also skips trial counts, labeling rules for unsafe calls, and any statistical tests, so the percentages are harder to weight than they should be. This is useful reading for anyone building or securing tool-calling agents and multi-agent systems. The cross-model data and the concrete defense recipe give practitioners something to try, even if the threat model needs tighter grounding. It is solid enough on structure and scope to go to a serious referee rather than a desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that descriptor-level manipulations in the Model Context Protocol enable semantic attacks (Tool Poisoning, Shadowing, and Rug Pull) that can induce unsafe tool invocations in LLMs at rates up to 36% under baseline conditions. It proposes a full-stack mitigation combining descriptor integrity verification, auxiliary-LLM semantic vetting, and runtime guardrails that reduces unsafe invocations to 15% and raises the block rate to 74%. The evaluation covers GPT-5.3, DeepSeek-V3, and LLaMA-3.5 across eight prompting strategies in controlled adversarial scenarios with manually edited tool metadata, plus cross-model analysis of robustness and latency.

Significance. If the quantitative results hold after fuller experimental reporting and the attack scenarios are shown to be realizable under realistic deployment constraints, the work would be significant for LLM security. It identifies a previously neglected attack surface in tool-augmented systems and offers practical, training-free defenses. The cross-model comparison supplies useful data on architectural differences in susceptibility, which could guide secure MCP design.

major comments (3)

[Evaluation] The abstract and evaluation report concrete figures (36% baseline unsafe invocations, reduction to 15%, 74% block rate) but supply no trial counts, statistical significance tests, confidence intervals, or explicit criteria for labeling an invocation unsafe. This information is required to evaluate reproducibility and reliability of the central empirical claims.
[Threat Model and Attack Classes] The threat model and attack implementation grant the adversary complete, direct control over every field in the tool descriptor. The manuscript does not discuss or validate how an attacker would obtain this level of access in typical MCP deployments (authenticated registration, versioned registries, or external fetches), which is load-bearing for interpreting the reported deltas as realistic attack success rates rather than simulation artifacts.
[Proposed Defense] The semantic-vetting layer relies on an auxiliary LLM whose own robustness to descriptor manipulation or prompt injection is not evaluated. If this auxiliary model can be compromised or biased, the mitigation stack's effectiveness would be undermined; this assumption is central to the defense claims.

minor comments (1)

[Abstract] The eight prompting strategies are referenced but not enumerated or described in the abstract or early sections; a short list or pointer to the relevant subsection would aid readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments help clarify the presentation of our empirical results, threat model assumptions, and defense assumptions. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Evaluation] The abstract and evaluation report concrete figures (36% baseline unsafe invocations, reduction to 15%, 74% block rate) but supply no trial counts, statistical significance tests, confidence intervals, or explicit criteria for labeling an invocation unsafe. This information is required to evaluate reproducibility and reliability of the central empirical claims.

Authors: We agree that greater statistical transparency is needed. The full evaluation section describes experiments across GPT-5.3, DeepSeek-V3, and LLaMA-3.5 with eight prompting strategies, but we will revise to explicitly state the trial count per configuration (100 trials), report 95% confidence intervals, apply appropriate significance tests (e.g., McNemar’s test for paired comparisons), and provide a precise definition of “unsafe invocation” based on policy violation or match to adversarial tool signatures. These additions will be placed in a new subsection on experimental methodology. revision: yes
Referee: [Threat Model and Attack Classes] The threat model and attack implementation grant the adversary complete, direct control over every field in the tool descriptor. The manuscript does not discuss or validate how an attacker would obtain this level of access in typical MCP deployments (authenticated registration, versioned registries, or external fetches), which is load-bearing for interpreting the reported deltas as realistic attack success rates rather than simulation artifacts.

Authors: The threat model is intentionally scoped to descriptor-level manipulation once an attacker has write access to the tool metadata presented to the LLM. We will expand the threat-model section with a dedicated paragraph on realizability, citing realistic vectors such as compromised tool registries, malicious third-party tool providers in open ecosystems, and supply-chain attacks on external descriptor fetches. This will clarify that the reported success rates apply to deployments lacking strong authentication or integrity enforcement on tool registration, while noting that stronger registry controls would raise the bar for the attacker. revision: yes
Referee: [Proposed Defense] The semantic-vetting layer relies on an auxiliary LLM whose own robustness to descriptor manipulation or prompt injection is not evaluated. If this auxiliary model can be compromised or biased, the mitigation stack's effectiveness would be undermined; this assumption is central to the defense claims.

Authors: We acknowledge that the auxiliary LLM’s own susceptibility was not directly tested. In revision we will add a short analysis subsection that (a) evaluates the auxiliary model on the same descriptor-manipulation corpus and (b) discusses the layered design: even partial compromise of the auxiliary layer is mitigated by the preceding descriptor-integrity check and the subsequent runtime guardrails. We will also note that the auxiliary model can itself be hardened or replaced with a smaller, fine-tuned classifier if desired. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical attack/mitigation rates are direct measurements

full rationale

The paper formalizes three attack classes (Tool Poisoning, Shadowing, Rug Pull) via controlled metadata edits, implements a layered defense (integrity verification + auxiliary LLM vetting + guardrails), and reports measured outcomes (unsafe invocations 36% baseline to 15%, block rate 74%) across GPT-5.3, DeepSeek-V3, and LLaMA-3.5 under eight prompting strategies. These percentages are obtained from explicit trial runs on manipulated descriptors; they are not obtained by fitting parameters to a subset and relabeling the fit as a prediction, nor by any self-referential definition or self-citation chain that would make the result equivalent to its inputs by construction. The evaluation setup is self-contained against the stated benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard assumptions about LLM tool-selection behavior and the trustworthiness of an auxiliary model for vetting; no free parameters are fitted to produce the headline percentages, and no new entities are postulated.

axioms (2)

domain assumption Tool descriptors are directly incorporated into the LLM reasoning context and treated as trusted metadata.
Stated in the opening paragraph of the abstract as the premise that creates the attack surface.
domain assumption An auxiliary LLM can reliably detect semantic manipulation in tool descriptors without itself being compromised.
Implicit in the description of the pre-context semantic vetting layer.

pith-pipeline@v0.9.0 · 5842 in / 1448 out tokens · 44013 ms · 2026-05-22T12:43:08.792819+00:00 · methodology

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sealing the Audit-Runtime Gap for LLM Skills
cs.CR 2026-05 unverdicted novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security
cs.CR 2026-04 conditional novelty 7.0

MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
cs.CR 2026-04 unverdicted novelty 6.0

MCPSHIELD offers a threat taxonomy of 23 attack vectors, a labeled transition system verification model, and a defense-in-depth architecture claiming 91% coverage for MCP-based AI agents.
Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
cs.CR 2026-02 unverdicted novelty 5.0

The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.
CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems
cs.CR 2026-04 unverdicted novelty 4.0

CASCADE is a cascaded hybrid detector that combines fast regex/entropy filtering, BGE embeddings with local LLM fallback, and output pattern checks to achieve 95.85% precision and 6.06% false-positive rate against pro...

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · cited by 5 Pith papers · 2 internal anchors

[1]

Server Tools — Model Context Protocol (MCP) Specification (Draft)

2024. Server Tools — Model Context Protocol (MCP) Specification (Draft). Online documentation. https: //modelcontextprotocol.info/specification/draft/server/tools/ Accessed on 2025-09-05

work page 2024
[2]

Samuel Aidoo and AML Int Dip. 2025. Cryptocurrency and Financial Crime: Emerging Risks and Regulatory Responses. (2025)

work page 2025
[3]

Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, and Zining Zhu. 2024. LLM-generated black-box explanations can be adversarially helpful.arXiv preprint arXiv:2405.06800(2024)

work page arXiv 2024
[4]

S Akheel. 2025. Guardrails for large language models: A review of techniques and challenges.J Artif Intell Mach Learn & Data Sci3, 1 (2025), 2504–2512. J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025. Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks 111:31

work page 2025
[5]

Anthropic. 2025. Our Framework for Developing Safe and Trustworthy Agents. Online article. https://www.anthropic. com/news/our-framework-for-developing-safe-and-trustworthy-agents

work page 2025
[6]

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. 2025. Design patterns for securing llm agents against prompt injections.arXiv preprint arXiv:2506.08837(2025)

work page arXiv 2025
[7]

Manish Bhatt, Vineeth Sai Narajala, and Idan Habler. 2025. Etdi: Mitigating tool squatting and rug pull attacks in model context protocol (mcp) by using oauth-enhanced tool definitions and policy-based access control.arXiv preprint arXiv:2506.01333(2025)

work page arXiv 2025
[8]

Gordon Owusu Boateng, Hani Sami, Ahmed Alagha, Hanae Elmekki, Ahmad Hammoud, Rabeb Mizouni, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Sami Muhaidat, et al. 2025. A survey on large language models for communication, network, and service management: Application insights, challenges, and future directions.IEEE Communications Surveys & Tutorials(2025)

work page 2025
[9]

Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, et al . 2024. When large language models meet personalization: Perspectives of challenges and opportunities.World Wide Web27, 4 (2024), 42

work page 2024
[10]

Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Yan Meng, Shaofeng Li, Zhen Liu, and Haojin Zhu. 2023. The philosopher’s stone: Trojaning plugins of large language models.arXiv preprint arXiv:2312.00374(2023)

work page arXiv 2023
[11]

Xiang Fei, Xiawu Zheng, and Hao Feng. 2025. MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch.arXiv preprint arXiv:2506.01056(2025)

work page arXiv 2025
[12]

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, and Merouane Debbah. 2025. From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows.arXiv preprint arXiv:2506.23260 (2025)

work page arXiv 2025
[13]

Florencio Cano Gabarda. 2025. Model Context Protocol (MCP): Understanding Security Risks and Controls. https: //www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls. Accessed: 2025-08-04

work page 2025
[14]

John Halloran. 2025. MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment.arXiv preprint arXiv:2505.23634(2025)

work page arXiv 2025
[15]

Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, and Ahmed E Hassan

work page
[16]

Model context protocol (mcp) at first glance: Studying the security and maintainability of mcp servers.arXiv preprint arXiv:2506.13538(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Mahbub Hassan, Md Emtiaz Kabir, Muzammil Jusoh, Hong Ki An, Michael Negnevitsky, and Chengjiang Li. 2025. Large Language Models in Transportation: A Comprehensive Bibliometric Analysis of Emerging Trends, Challenges and Future Research.IEEE Access(2025)

work page 2025
[19]

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model context protocol (mcp): Landscape, security threats, and future research directions.arXiv preprint arXiv:2503.23278(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[20]

Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Zeyang Sha, Yuyuan Li, Changting Lin, et al. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures.arXiv preprint arXiv:2506.19676(2025)

work page arXiv 2025
[21]

2024.Automotive security solution using hardware security module (HSM)

Arvind Kumar, Ashish Gholve, and Kedar Kotalwar. 2024.Automotive security solution using hardware security module (HSM). Technical Report. SAE Technical Paper

work page 2024
[22]

Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. Mcp guardian: A security-first layer for safeguarding mcp-based ai system.arXiv preprint arXiv:2504.12757(2025)

work page arXiv 2025
[23]

Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H Abdi, Yuqing Yang, and Lili Qiu. 2025. SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression.arXiv preprint arXiv:2506.12707(2025)

work page arXiv 2025
[24]

Zichuan Li, Jian Cui, Xiaojing Liao, and Luyi Xing. 2025. Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents.arXiv preprint arXiv:2504.03111(2025)

work page arXiv 2025
[25]

Anne Lott and Jerome P Reiter. 2020. Wilson confidence intervals for binomial proportions with multiple imputation for missing data.The American Statistician74, 2 (2020), 109–115

work page 2020
[26]

Weiqin Ma, Pu Duan, Sanmin Liu, Guofei Gu, and Jyh-Charn Liu. 2012. Shadow attacks: automatically evading system-call-behavior based malware detection.Journal in Computer Virology8, 1 (2012), 1–13

work page 2012
[27]

Shreekant Mandvikar. 2023. Augmenting intelligent document processing (IDP) workflows with contemporary large language models (LLMs).International Journal of Computer Trends and Technology71, 10 (2023), 80–91

work page 2023
[28]

Jeremy McHugh, Kristina Šekrst, and Jon Cefalu. 2025. Prompt Injection 2.0: Hybrid AI Threats.arXiv preprint arXiv:2507.13169(2025). J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025. 111:32 Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, and Naser Ezzati-Jivan

work page arXiv 2025
[29]

Vineeth Sai Narajala and Idan Habler. 2025. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies.arXiv preprint arXiv:2504.08623(2025)

work page arXiv 2025
[30]

Thanh Toan Nguyen, Nguyen Quoc Viet Hung, Thanh Tam Nguyen, Thanh Trung Huynh, Thanh Thi Nguyen, Matthias Weidlich, and Hongzhi Yin. 2024. Manipulating recommender systems: A survey of poisoning attacks and countermeasures.Comput. Surveys57, 1 (2024), 1–39

work page 2024
[31]

Esezi Isaac Obilor and Eric Chikweru Amadi. 2018. Test for significance of Pearson’s correlation coefficient.International Journal of Innovative Mathematics, Statistics & Energy Policies6, 1 (2018), 11–23

work page 2018
[32]

János Pintz. 2007. Cramér vs. Cramér. On Cramér’s probabilistic model for primes.Functiones et Approximatio Commentarii Mathematici37, 2 (2007), 361–376

work page 2007
[33]

Brandon Radosevich and John Halloran. 2025. Mcp safety audit: Llms with the model context protocol allow major security exploits.arXiv preprint arXiv:2504.03767(2025)

work page arXiv 2025
[34]

Partha Pratim Ray. 2025. A survey on model context protocol: Architecture, state-of-the-art, challenges and future directions.Authorea Preprints(2025)

work page 2025
[35]

Anjana Sarkar and Soumyendu Sarkar. 2025. Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review.arXiv preprint arXiv:2506.05364(2025)

work page arXiv 2025
[36]

Oleksii I Sheremet, Oleksandr V Sadovoi, Kateryna S Sheremet, and Yuliia V Sokhina. 2024. Effective documentation practices for enhancing user interaction through GPT-powered conversational interfaces.Applied Aspects of Information Technology7, 2 (2024), 135–150

work page 2024
[37]

Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. 2025. A survey of the model context protocol (mcp): Standardizing context to enhance large language models (llms). (2025)

work page 2025
[38]

Lars St, Svante Wold, et al. 1989. Analysis of variance (ANOVA).Chemometrics and intelligent laboratory systems6, 4 (1989), 259–272

work page 1989
[39]

Tal Shapira / Reco.ai. 2025. MCP Security: Key Risks, Controls & Best Practices Explained. Online article. https: //www.reco.ai/learn/mcp-security Updated August 7, 2025; accessed September 5, 2025

work page 2025
[40]

Zhibo Wang, Jingjing Ma, Xue Wang, Jiahui Hu, Zhan Qin, and Kui Ren. 2022. Threats to training: A survey of poisoning attacks and defenses on machine learning systems.Comput. Surveys55, 7 (2022), 1–36

work page 2022
[41]

Zhiqiang Wang, Junyang Zhang, Guanquan Shi, HaoRan Cheng, Yunhao Yao, Kaiwen Guo, Haohua Du, and Xiang-Yang Li. 2025. MindGuard: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph.arXiv preprint arXiv:2508.20412(2025)

work page arXiv 2025
[42]

2012.Experimentation in software engineering

Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, and Anders Wesslén. 2012.Experimentation in software engineering. Springer Science & Business Media

work page 2012
[43]

2016.An extensible component & connector architecture description infrastructure for multi-platform modeling

Andreas Wortmann. 2016.An extensible component & connector architecture description infrastructure for multi-platform modeling. Vol. 25. Shaker Verlag GmbH. J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025

work page 2016

[1] [1]

Server Tools — Model Context Protocol (MCP) Specification (Draft)

2024. Server Tools — Model Context Protocol (MCP) Specification (Draft). Online documentation. https: //modelcontextprotocol.info/specification/draft/server/tools/ Accessed on 2025-09-05

work page 2024

[2] [2]

Samuel Aidoo and AML Int Dip. 2025. Cryptocurrency and Financial Crime: Emerging Risks and Regulatory Responses. (2025)

work page 2025

[3] [3]

Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, and Zining Zhu. 2024. LLM-generated black-box explanations can be adversarially helpful.arXiv preprint arXiv:2405.06800(2024)

work page arXiv 2024

[4] [4]

S Akheel. 2025. Guardrails for large language models: A review of techniques and challenges.J Artif Intell Mach Learn & Data Sci3, 1 (2025), 2504–2512. J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025. Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks 111:31

work page 2025

[5] [5]

Anthropic. 2025. Our Framework for Developing Safe and Trustworthy Agents. Online article. https://www.anthropic. com/news/our-framework-for-developing-safe-and-trustworthy-agents

work page 2025

[6] [6]

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. 2025. Design patterns for securing llm agents against prompt injections.arXiv preprint arXiv:2506.08837(2025)

work page arXiv 2025

[7] [7]

Manish Bhatt, Vineeth Sai Narajala, and Idan Habler. 2025. Etdi: Mitigating tool squatting and rug pull attacks in model context protocol (mcp) by using oauth-enhanced tool definitions and policy-based access control.arXiv preprint arXiv:2506.01333(2025)

work page arXiv 2025

[8] [8]

Gordon Owusu Boateng, Hani Sami, Ahmed Alagha, Hanae Elmekki, Ahmad Hammoud, Rabeb Mizouni, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Sami Muhaidat, et al. 2025. A survey on large language models for communication, network, and service management: Application insights, challenges, and future directions.IEEE Communications Surveys & Tutorials(2025)

work page 2025

[9] [9]

Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, et al . 2024. When large language models meet personalization: Perspectives of challenges and opportunities.World Wide Web27, 4 (2024), 42

work page 2024

[10] [10]

Tian Dong, Minhui Xue, Guoxing Chen, Rayne Holland, Yan Meng, Shaofeng Li, Zhen Liu, and Haojin Zhu. 2023. The philosopher’s stone: Trojaning plugins of large language models.arXiv preprint arXiv:2312.00374(2023)

work page arXiv 2023

[11] [11]

Xiang Fei, Xiawu Zheng, and Hao Feng. 2025. MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch.arXiv preprint arXiv:2506.01056(2025)

work page arXiv 2025

[12] [12]

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, and Merouane Debbah. 2025. From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows.arXiv preprint arXiv:2506.23260 (2025)

work page arXiv 2025

[13] [13]

Florencio Cano Gabarda. 2025. Model Context Protocol (MCP): Understanding Security Risks and Controls. https: //www.redhat.com/en/blog/model-context-protocol-mcp-understanding-security-risks-and-controls. Accessed: 2025-08-04

work page 2025

[14] [14]

John Halloran. 2025. MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment.arXiv preprint arXiv:2505.23634(2025)

work page arXiv 2025

[15] [15]

Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, and Ahmed E Hassan

work page

[16] [16]

Model context protocol (mcp) at first glance: Studying the security and maintainability of mcp servers.arXiv preprint arXiv:2506.13538(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

Mahbub Hassan, Md Emtiaz Kabir, Muzammil Jusoh, Hong Ki An, Michael Negnevitsky, and Chengjiang Li. 2025. Large Language Models in Transportation: A Comprehensive Bibliometric Analysis of Emerging Trends, Challenges and Future Research.IEEE Access(2025)

work page 2025

[18] [19]

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model context protocol (mcp): Landscape, security threats, and future research directions.arXiv preprint arXiv:2503.23278(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [20]

Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Zeyang Sha, Yuyuan Li, Changting Lin, et al. 2025. A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures.arXiv preprint arXiv:2506.19676(2025)

work page arXiv 2025

[20] [21]

2024.Automotive security solution using hardware security module (HSM)

Arvind Kumar, Ashish Gholve, and Kedar Kotalwar. 2024.Automotive security solution using hardware security module (HSM). Technical Report. SAE Technical Paper

work page 2024

[21] [22]

Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. Mcp guardian: A security-first layer for safeguarding mcp-based ai system.arXiv preprint arXiv:2504.12757(2025)

work page arXiv 2025

[22] [23]

Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H Abdi, Yuqing Yang, and Lili Qiu. 2025. SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression.arXiv preprint arXiv:2506.12707(2025)

work page arXiv 2025

[23] [24]

Zichuan Li, Jian Cui, Xiaojing Liao, and Luyi Xing. 2025. Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents.arXiv preprint arXiv:2504.03111(2025)

work page arXiv 2025

[24] [25]

Anne Lott and Jerome P Reiter. 2020. Wilson confidence intervals for binomial proportions with multiple imputation for missing data.The American Statistician74, 2 (2020), 109–115

work page 2020

[25] [26]

Weiqin Ma, Pu Duan, Sanmin Liu, Guofei Gu, and Jyh-Charn Liu. 2012. Shadow attacks: automatically evading system-call-behavior based malware detection.Journal in Computer Virology8, 1 (2012), 1–13

work page 2012

[26] [27]

Shreekant Mandvikar. 2023. Augmenting intelligent document processing (IDP) workflows with contemporary large language models (LLMs).International Journal of Computer Trends and Technology71, 10 (2023), 80–91

work page 2023

[27] [28]

Jeremy McHugh, Kristina Šekrst, and Jon Cefalu. 2025. Prompt Injection 2.0: Hybrid AI Threats.arXiv preprint arXiv:2507.13169(2025). J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025. 111:32 Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, and Naser Ezzati-Jivan

work page arXiv 2025

[28] [29]

Vineeth Sai Narajala and Idan Habler. 2025. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies.arXiv preprint arXiv:2504.08623(2025)

work page arXiv 2025

[29] [30]

Thanh Toan Nguyen, Nguyen Quoc Viet Hung, Thanh Tam Nguyen, Thanh Trung Huynh, Thanh Thi Nguyen, Matthias Weidlich, and Hongzhi Yin. 2024. Manipulating recommender systems: A survey of poisoning attacks and countermeasures.Comput. Surveys57, 1 (2024), 1–39

work page 2024

[30] [31]

Esezi Isaac Obilor and Eric Chikweru Amadi. 2018. Test for significance of Pearson’s correlation coefficient.International Journal of Innovative Mathematics, Statistics & Energy Policies6, 1 (2018), 11–23

work page 2018

[31] [32]

János Pintz. 2007. Cramér vs. Cramér. On Cramér’s probabilistic model for primes.Functiones et Approximatio Commentarii Mathematici37, 2 (2007), 361–376

work page 2007

[32] [33]

Brandon Radosevich and John Halloran. 2025. Mcp safety audit: Llms with the model context protocol allow major security exploits.arXiv preprint arXiv:2504.03767(2025)

work page arXiv 2025

[33] [34]

Partha Pratim Ray. 2025. A survey on model context protocol: Architecture, state-of-the-art, challenges and future directions.Authorea Preprints(2025)

work page 2025

[34] [35]

Anjana Sarkar and Soumyendu Sarkar. 2025. Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review.arXiv preprint arXiv:2506.05364(2025)

work page arXiv 2025

[35] [36]

Oleksii I Sheremet, Oleksandr V Sadovoi, Kateryna S Sheremet, and Yuliia V Sokhina. 2024. Effective documentation practices for enhancing user interaction through GPT-powered conversational interfaces.Applied Aspects of Information Technology7, 2 (2024), 135–150

work page 2024

[36] [37]

Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei. 2025. A survey of the model context protocol (mcp): Standardizing context to enhance large language models (llms). (2025)

work page 2025

[37] [38]

Lars St, Svante Wold, et al. 1989. Analysis of variance (ANOVA).Chemometrics and intelligent laboratory systems6, 4 (1989), 259–272

work page 1989

[38] [39]

Tal Shapira / Reco.ai. 2025. MCP Security: Key Risks, Controls & Best Practices Explained. Online article. https: //www.reco.ai/learn/mcp-security Updated August 7, 2025; accessed September 5, 2025

work page 2025

[39] [40]

Zhibo Wang, Jingjing Ma, Xue Wang, Jiahui Hu, Zhan Qin, and Kui Ren. 2022. Threats to training: A survey of poisoning attacks and defenses on machine learning systems.Comput. Surveys55, 7 (2022), 1–36

work page 2022

[40] [41]

Zhiqiang Wang, Junyang Zhang, Guanquan Shi, HaoRan Cheng, Yunhao Yao, Kaiwen Guo, Haohua Du, and Xiang-Yang Li. 2025. MindGuard: Tracking, Detecting, and Attributing MCP Tool Poisoning Attack via Decision Dependence Graph.arXiv preprint arXiv:2508.20412(2025)

work page arXiv 2025

[41] [42]

2012.Experimentation in software engineering

Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, and Anders Wesslén. 2012.Experimentation in software engineering. Springer Science & Business Media

work page 2012

[42] [43]

2016.An extensible component & connector architecture description infrastructure for multi-platform modeling

Andreas Wortmann. 2016.An extensible component & connector architecture description infrastructure for multi-platform modeling. Vol. 25. Shaker Verlag GmbH. J. ACM, Vol. 37, No. 4, Article 111. Publication date: November 2025

work page 2016