pith. machine review for the scientific record.

arxiv: 2503.23278 · v3 · submitted 2025-03-30 · 💻 cs.CR · cs.AI

Recognition: 2 theorem links

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

Pith reviewed 2026-05-13 08:58 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords: Model Context Protocol, MCP, threat taxonomy, security threats, AI interoperability, lifecycle analysis, security safeguards, tool-augmented AI

The pith

MCP, an emerging standard for AI-tool communication, carries 16 security threat scenarios across four attacker types, organized by a new lifecycle-based taxonomy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a systematic study of the Model Context Protocol, an open standard meant to improve interoperability between AI models and external tools. It defines a four-phase lifecycle for MCP servers, broken into 16 activities, and builds a threat taxonomy of 16 scenarios attributed to four attacker types: malicious developers, external attackers, malicious users, and security flaws. This matters for safe adoption of tool-augmented AI systems, because unaddressed risks could expose vulnerabilities in dynamic discovery and communication. The authors support the taxonomy with real-world case studies, propose phase-specific safeguards, analyze current industry adoption, and suggest future directions.

Core claim

The central claim is that a comprehensive threat taxonomy for the Model Context Protocol categorizes security and privacy risks across four major attacker types into 16 distinct threat scenarios, built upon a full lifecycle analysis of MCP servers consisting of four phases and 16 key activities, validated through real-world case studies, and accompanied by tailored security safeguards.

What carries the argument

The threat taxonomy, which organizes risks by four attacker types and 16 scenarios derived from the MCP server lifecycle phases of creation, deployment, operation, and maintenance.
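
The two axes of the taxonomy can be written down as a small data model. The sketch below is illustrative only: the phase and attacker-type names come from the abstract, but the `ThreatScenario` fields, the `group_by_attacker` helper, and the scenario labels `T1` and `T2` are hypothetical placeholders, not scenarios taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """The four MCP server lifecycle phases named in the abstract."""
    CREATION = "creation"
    DEPLOYMENT = "deployment"
    OPERATION = "operation"
    MAINTENANCE = "maintenance"


class AttackerType(Enum):
    """The four attacker types named in the abstract."""
    MALICIOUS_DEVELOPER = "malicious developer"
    EXTERNAL_ATTACKER = "external attacker"
    MALICIOUS_USER = "malicious user"
    SECURITY_FLAW = "security flaw"


@dataclass(frozen=True)
class ThreatScenario:
    scenario_id: str        # placeholder label such as "T1"; not from the paper
    attacker: AttackerType  # which attacker type the scenario is filed under
    phases: frozenset       # assumed: lifecycle phases where it can arise


def group_by_attacker(scenarios):
    """Return {AttackerType: [scenario_id, ...]}, listing every attacker type."""
    grouped = {attacker: [] for attacker in AttackerType}
    for scenario in scenarios:
        grouped[scenario.attacker].append(scenario.scenario_id)
    return grouped


# Hypothetical entries for illustration only.
example = [
    ThreatScenario("T1", AttackerType.MALICIOUS_DEVELOPER,
                   frozenset({Phase.CREATION})),
    ThreatScenario("T2", AttackerType.EXTERNAL_ATTACKER,
                   frozenset({Phase.DEPLOYMENT, Phase.OPERATION})),
]
grouped = group_by_attacker(example)
```

Keeping every attacker type as a key, even when its list is empty, makes an unpopulated category visible rather than silently absent.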

If this is right

  • Adopting MCP securely requires implementing the proposed safeguards tailored to each lifecycle phase and threat category.
  • Current limitations in MCP's standardization and trust boundaries must be addressed to enable broader deployment.
  • Industry integrations of MCP need to account for the identified threat scenarios to prevent concrete attacks.
  • Future development of tool-augmented AI systems should incorporate the taxonomy to strengthen security.
  • Analysis of the MCP landscape highlights strengths in interoperability but gaps that constrain sustainable growth.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar threat taxonomies could be developed for other emerging AI communication protocols.
  • Automated tools for detecting the 16 threat scenarios might improve practical security in MCP implementations.
  • Extending the case studies to more diverse MCP deployments could reveal additional risks.
  • Integration patterns in the landscape suggest that early security focus in the creation phase could mitigate many downstream threats.

Load-bearing premise

The 16 threat scenarios and real-world case studies are representative of the main risks in actual MCP implementations, with the four attacker types covering the primary threat surface.

What would settle it

A large-scale deployment of MCP where none of the 16 threat scenarios occur despite significant usage, or the discovery of a major vulnerability that does not fit into any of the four attacker types.

Original abstract

The Model Context Protocol (MCP) is an emerging open standard that defines a unified, bi-directional communication and dynamic discovery protocol between AI models and external tools or resources, aiming to enhance interoperability and reduce fragmentation across diverse systems. This paper presents a systematic study of MCP from both architectural and security perspectives. We first define the full lifecycle of an MCP server, comprising four phases (creation, deployment, operation, and maintenance), further decomposed into 16 key activities that capture its functional evolution. Building on this lifecycle analysis, we construct a comprehensive threat taxonomy that categorizes security and privacy risks across four major attacker types: malicious developers, external attackers, malicious users, and security flaws, encompassing 16 distinct threat scenarios. To validate these risks, we develop and analyze real-world case studies that demonstrate concrete attack surfaces and vulnerability manifestations within MCP implementations. Based on these findings, the paper proposes a set of fine-grained, actionable security safeguards tailored to each lifecycle phase and threat category, offering practical guidance for secure MCP adoption. We also analyze the current MCP landscape, covering industry adoption, integration patterns, and supporting tools, to identify its technological strengths as well as existing limitations that constrain broader deployment. Finally, we outline future research and development directions aimed at strengthening MCP's standardization, trust boundaries, and sustainable growth within the evolving ecosystem of tool-augmented AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to offer a systematic study of the Model Context Protocol (MCP) by defining its full lifecycle in four phases—creation, deployment, operation, and maintenance—broken down into 16 key activities. From this, it builds a threat taxonomy categorizing risks by four attacker types (malicious developers, external attackers, malicious users, and security flaws) into 16 distinct scenarios. These are validated via real-world case studies showing concrete vulnerabilities in MCP implementations. The work also proposes phase-specific security safeguards, analyzes industry adoption and tools, and suggests future research directions for secure MCP development.

Significance. If the taxonomy proves comprehensive and the case studies represent actual MCP deployments, this paper would be significant for the field of AI security by providing the first structured analysis of threats in this emerging interoperability protocol. It could guide developers and standardize security practices in tool-augmented AI systems, addressing a timely gap as MCP gains adoption.

major comments (2)
  1. [Threat Taxonomy Construction] The derivation of the 16 threat scenarios from the 4-phase/16-activity lifecycle is central to the paper's claim of comprehensiveness. However, the manuscript lacks an explicit mapping or table linking each lifecycle activity to the corresponding threat scenarios, making it difficult to verify that all risks are covered without gaps or overlaps.
  2. [Case Studies for Validation] The validation of the taxonomy relies on real-world case studies demonstrating attack surfaces. It is unclear from the description whether these cases are based on documented incidents from deployed MCP servers (with specific implementation details) or are hypothetical scenarios extrapolated from other protocols. This distinction is load-bearing for assessing the taxonomy's practical relevance.
minor comments (2)
  1. [Landscape Analysis] The section on industry adoption and integration patterns would benefit from more quantitative data, such as adoption statistics or specific tool examples, to strengthen the analysis of technological strengths and limitations.
  2. [Future Directions] The outlined future research directions are high-level; providing more concrete research questions or proposed methodologies would enhance their actionability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and transparency.

Point-by-point responses
  1. Referee: [Threat Taxonomy Construction] The derivation of the 16 threat scenarios from the 4-phase/16-activity lifecycle is central to the paper's claim of comprehensiveness. However, the manuscript lacks an explicit mapping or table linking each lifecycle activity to the corresponding threat scenarios, making it difficult to verify that all risks are covered without gaps or overlaps.

    Authors: We agree that an explicit mapping table is needed to demonstrate the systematic derivation. In the revised manuscript, we will add a dedicated table (in the threat taxonomy section) that maps each of the 16 lifecycle activities to the corresponding threat scenarios, explicitly showing coverage and confirming the absence of gaps or overlaps. revision: yes

  2. Referee: [Case Studies for Validation] The validation of the taxonomy relies on real-world case studies demonstrating attack surfaces. It is unclear from the description whether these cases are based on documented incidents from deployed MCP servers (with specific implementation details) or are hypothetical scenarios extrapolated from other protocols. This distinction is load-bearing for assessing the taxonomy's practical relevance.

    Authors: The case studies are grounded in documented incidents from deployed MCP servers, including specific implementation details from open-source projects and reported vulnerabilities. We will revise the validation section to explicitly state the real-world sources, add implementation specifics, and distinguish them from hypothetical scenarios to confirm practical relevance. revision: yes
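
The coverage question raised in the first major comment can be checked mechanically once an activity-to-scenario mapping exists. The following is a minimal sketch of such a check, not the authors' method: the function `check_coverage`, the identifiers `A1` to `A3` and `T1` to `T3`, and the mapping itself are hypothetical stand-ins, since the page does not list the paper's 16 activities or 16 scenarios.

```python
def check_coverage(activities, scenarios, mapping):
    """Report gaps and overlaps in an activity -> scenarios mapping.

    activities: iterable of activity ids
    scenarios:  iterable of scenario ids
    mapping:    dict of activity id -> set of scenario ids linked to it
    """
    # Activities that have no threat scenario attached.
    gaps = sorted(a for a in activities if not mapping.get(a))

    # Scenarios that no activity points to.
    covered = set().union(*mapping.values())
    unmapped = sorted(s for s in scenarios if s not in covered)

    # Scenarios linked to more than one activity.
    counts = {}
    for linked in mapping.values():
        for s in linked:
            counts[s] = counts.get(s, 0) + 1
    overlaps = sorted(s for s, n in counts.items() if n > 1)

    return {
        "activities_without_threats": gaps,
        "scenarios_without_activity": unmapped,
        "scenarios_linked_to_multiple_activities": overlaps,
    }


# Hypothetical three-by-three example; the real table would be 16 by 16.
report = check_coverage(
    activities=["A1", "A2", "A3"],
    scenarios=["T1", "T2", "T3"],
    mapping={"A1": {"T1"}, "A2": {"T1", "T2"}},
)
```

In this toy input, `A3` has no linked scenario, `T3` is not reachable from any activity, and `T1` is shared by two activities. Whether a shared scenario counts as a problematic overlap is a judgment the mapping table would have to state.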

Circularity Check

0 steps flagged

No significant circularity; taxonomy is an independent analytical construction

Full rationale

The paper defines an MCP server lifecycle (four phases decomposed into 16 activities) as an explicit analytical framework and then constructs a threat taxonomy (four attacker types with 16 scenarios) on top of that framework, validating it via separate real-world case studies. No equations, fitted parameters, or derivations appear anywhere in the provided text. The taxonomy does not reduce to its inputs by construction, nor does it rely on self-citation chains or imported uniqueness theorems; the 16 scenarios are presented as categorizations supported by external examples rather than tautological mappings. This is a standard survey-style contribution whose central claims remain independent of any self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on the domain assumption that MCP is an emerging open standard whose lifecycle and risks can be systematically enumerated from current implementations; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: MCP defines a unified bi-directional communication and dynamic discovery protocol between AI models and external tools.
    The entire study presupposes the existence and basic structure of MCP as stated in the abstract.

pith-pipeline@v0.9.0 · 5551 in / 1210 out tokens · 39166 ms · 2026-05-13T08:58:04.322685+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 30 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

    cs.CR 2026-04 unverdicted novelty 8.0

    Malicious LLM API routers actively perform payload injection and secret exfiltration, with 9 of 428 tested routers showing malicious behavior and further poisoning risks from leaked credentials.

  2. Grid-Orch: An LLM-Powered Orchestrator for Distribution Grid Simulation and Analytics

    eess.SY 2026-05 conditional novelty 7.0

    Grid-Orch is an LLM-orchestrated system with 36 tools that lets users perform distribution grid simulations and optimizations through conversation, matching direct scripting results.

  3. Five Attacks on x402 Agentic Payment Protocol

    cs.CR 2026-05 conditional novelty 7.0

    Five practical attacks on the x402 agentic payment protocol are demonstrated across authorization, binding, replay protection, and web handling, validated on local chains, Base Sepolia, live endpoints, and three open-...

  4. DADL: A Declarative Description Language for Enterprise Tool Libraries in LLM Agent Systems

    cs.SE 2026-05 unverdicted novelty 7.0

    DADL is a declarative YAML format that lets a single runtime handle many REST API tools for LLM agents, cutting tool advertisement context cost by 142x from 142,000 to 1,000 tokens on a catalog of 1,833 definitions.

  5. Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

    cs.CR 2026-04 unverdicted novelty 7.0

    A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.

  6. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  7. From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

    cs.AI 2026-04 unverdicted novelty 7.0

    OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).

  8. Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

    cs.CR 2026-04 unverdicted novelty 7.0

    A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.

  9. AgileLog: A Forkable Shared Log for Agents on Data Streams

    cs.DC 2026-04 unverdicted novelty 7.0

    AgileLog introduces forkable shared logs with cheap forking and isolation to support AI agents on data streams.

  10. Listening Alone, Understanding Together: Collaborative Context Recovery for Privacy-Aware AI

    cs.AI 2026-04 conditional novelty 7.0

    CONCORD recovers conversation context in privacy-preserving AI assistants via spatio-temporal resolution, gap detection, and minimal relationship-aware A2A exchanges, achieving 91.4% gap recall, 96% relationship accur...

  11. Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System

    cs.AI 2026-04 unverdicted novelty 7.0

    A planner-executor multi-agent system using gpt-oss-120b and Parsl orchestrates scalable high-throughput MOF screening on the Aurora supercomputer with low overhead.

  12. MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

    cs.CR 2026-04 conditional novelty 7.0

    MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.

  13. Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

    cs.CR 2026-04 accept novelty 7.0

    Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.

  14. OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research

    cond-mat.mtrl-sci 2026-05 unverdicted novelty 6.0

    OpenAaaS is a hierarchical agent-as-a-service system that enables secure multi-agent collaboration for materials informatics by moving code to data rather than data to code.

  15. ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox

    cs.AI 2026-05 unverdicted novelty 6.0

    ComplexMCP benchmark shows current LLM agents achieve at most 60% success on interdependent tool tasks versus 90% for humans, due to tool retrieval saturation, over-confidence, and strategic defeatism.

  16. EvidenT: An Evidence-Preserving Framework for Iterative System-Level Package Repair

    cs.SE 2026-05 unverdicted novelty 6.0

    EvidenT repairs 53.88% of real-world RISC-V system-level package build failures by preserving repair history and build artifacts in a closed-loop validation system, outperforming baselines by a wide margin.

  17. When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

    cs.CR 2026-05 unverdicted novelty 6.0

    Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

  18. Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem

    cs.SE 2026-05 unverdicted novelty 6.0

    MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.

  19. Augmenting Interface Usability Heuristics for Reliable Computer-Use Agents

    cs.HC 2026-05 unverdicted novelty 6.0

    Augmented Nielsen heuristics improve computer-use agent task completion on varied interfaces while preserving human usability, as shown in UI-Verse experiments and human studies.

  20. Less Is More: Measuring How LLM Involvement affects Chatbot Accuracy in Static Analysis

    cs.SE 2026-04 unverdicted novelty 6.0

    A structured JSON intermediate representation for LLM-generated static analysis queries outperforms both direct generation and agentic tool use, with gains of 15-25 percentage points on large models.

  21. Diagnosing CFG Interpretation in LLMs

    cs.AI 2026-04 unverdicted novelty 6.0

    LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.

  22. How Adversarial Environments Mislead Agentic AI?

    cs.AI 2026-04 unverdicted novelty 6.0

    Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.

  23. QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance

    cs.MA 2026-04 unverdicted novelty 6.0

    QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.

  24. Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations

    physics.comp-ph 2026-03 unverdicted novelty 6.0

    QMP-Bench supplies a realistic test set for AI on quantum many-body problems while PhysVEC uses integrated verifiers to turn unreliable LLM generations into code that passes both syntax and physics checks, outperformi...

  25. Beyond Individual Mimicry: Constructing Human-Like Social network with Graph-Augmented LLM Agents

    cs.SI 2026-03 unverdicted novelty 6.0

    GraphMind equips LLM agents with graph awareness to construct human-like social networks, producing botnets that substantially degrade performance of both text-based and graph-based detectors.

  26. Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol

    cs.SE 2026-03 unverdicted novelty 6.0

    The paper proposes Context-Aware Broker Protocol, Adaptive Timeout Budget Allocation, and Structured Error Recovery Framework to address gaps in identity, budgeting, and error handling for production AI agent deployme...

  27. A Prompt-Aware Structuring Framework for Reliable Reuse of AI-Generated Content in the Agentic Web

    cs.AI 2026-05 unverdicted novelty 5.0

    A framework structures AI-generated content with prompt-aware metadata and verifiable credentials to support reliable assessment and reuse by agents.

  28. Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    A neurocognitive governance model formalizes a Pre-Action Governance Reasoning Loop that consults global, workflow, agent, and situational rules before each action, yielding 95% compliance accuracy with zero false esc...

  29. From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

    cs.AI 2025-04 accept novelty 4.0

    A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

  30. Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains

    cs.AI 2026-04 unverdicted novelty 3.0

    Flowr is an agentic AI framework that decomposes retail supply chain workflows into coordinated LLM-based agents with human-in-the-loop oversight to automate operations in large supermarket chains.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · cited by 30 Pith papers · 3 internal anchors

  1. [1]

    ahujasid. 2025. BlenderMCP - Blender Model Context Protocol Integration. https://github.com/ahujasid/blender-mcp

  2. [2]

    MCP Server

    Alipay. 2025.Alipay Launches “MCP Server” for Integrated Payment Automation. Ant Group Co., Ltd. https://www. alipay.com

  3. [3]

    Anthropic. 2024. For Claude Desktop Users. https://modelcontextprotocol.io/quickstart/user

  4. [4]

    Anthropic. 2024. Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol

  5. [5]

    2025.Baidu Create 2025 Developer Conference Establishes MCP Ecosystem Forum

    Baidu. 2025.Baidu Create 2025 Developer Conference Establishes MCP Ecosystem Forum. Baidu Inc. https://www.aibase. com/news/17525

  6. [6]

    Manish Bhatt, Vineeth Sai Narajala, and Idan Habler. 2025. ETDI: Mitigating Tool Squatting and Rug Pull Attacks in Model Context Protocol (MCP) by Using OAuth-Enhanced Tool Definitions and Policy-Based Access Control.arXiv preprint arXiv:2506.01333(2025). https://arxiv.org/abs/2506.01333

  7. [7]

    Ivo Brett. 2025. Simplified and Secure MCP Gateways for Enterprise AI Integration.arXiv preprint arXiv:2504.19997 (2025). https://arxiv.org/abs/2504.19997

  8. [8]

    ByteDance. 2024. Coze plugin store. https://www.coze.com/store/plugin. , Vol. 1, No. 1, Article . Publication date: October 2025. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions 35

  9. [9]

    Jizhou Chen and Samuel Lee Cong. 2025. AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration.CoRRabs/2502.09809 (2025). https://doi.org/10.48550/ARXIV.2502.09809 arXiv:2502.09809

  10. [10]

    Zhi-Yuan Chen, Shiqi Shen, Guangyao Shen, Gong Zhi, Xu Chen, and Yankai Lin. 2024. Towards Tool Use Alignment of Large Language Models. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 1382–1400

  11. [11]

    Cline. 2025. Cline. https://github.com/cline/cline

  12. [12]

    2025.Bailian AI Platform Introduces Complete Lifecycle MCP Service

    Alibaba Cloud. 2025.Bailian AI Platform Introduces Complete Lifecycle MCP Service. Alibaba Cloud Intelligence. https://bailian.console.aliyun.com/mcp-market

  13. [13]

    2025.Huawei Cloud Launches AI-Native Application Platform with Built-in MCP Integration

    Huawei Cloud. 2025.Huawei Cloud Launches AI-Native Application Platform with Built-in MCP Integration. Huawei Technologies Co., Ltd. https://support.huaweicloud.com/intl/en-us/usermanual-pangulm/pangulm_04_0483.html

  14. [14]

    2025.Tencent Cloud AI SDK Announced MCP Integration and Plugin Transport Standard

    Tencent Cloud. 2025.Tencent Cloud AI SDK Announced MCP Integration and Plugin Transport Standard. Tencent Cloud Computing (Beijing) Co., Ltd. https://cloud.tencent.com/developer/mcp

  15. [15]

    Cloudflare. 2025. Cloudflare. https://www.cloudflare.com

  16. [16]

    Codeium. 2025. Codeium. https://codeium.com

  17. [17]

    Sourcegraph Cody. 2025. Cody supports additional context through Anthropic’s Model Context Protocol. https: //sourcegraph.com/blog/cody-supports-anthropic-model-context-protocol

  18. [18]

    Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. The Power of Noise: Redefining Retrieval for RAG Systems.CoRRabs/2401.14887 (2024). https://doi.org/10.48550/ARXIV.2401.14887 arXiv:2401.14887

  19. [19]

    Cursor. 2025. Learn how to add and use custom MCP tools within Cursor. https://docs.cursor.com/context/model- context-protocol

  20. [20]

    2025.Gemini Adds Support for Model Context Protocol (MCP)

    Google DeepMind. 2025.Gemini Adds Support for Model Context Protocol (MCP). DeepMind Technologies Limited. https://deepmind.google

  21. [21]

    Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. 2024. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways.CoRRabs/2406.02630 (2024). https://doi.org/10.48550/ARXIV.2406.02630 arXiv:2406.02630

  22. [22]

    Windsurf Editor. 2025. Windsurf Editor. https://windsurf.com

  23. [23]

    Antanavicius et al. 2025. PulseMCP. https://www.pulsemcp.com

  24. [24]

    Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024, Barcelona, Spain, August 25-29, 2024, Ricardo Baeza-Yates and Francesc...

  25. [25]

    Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

    Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

  26. [26]

    Gupta, Taylor Berg- Kirkpatrick, and Earlence Fernandes

    Imprompter: Tricking LLM Agents into Improper Tool Use.CoRRabs/2410.14923 (2024). https://doi.org/10. 48550/ARXIV.2410.14923 arXiv:2410.14923

  27. [27]

    Yuyou Gan, Yong Yang, Zhe Ma, Ping He, Rui Zeng, Yiming Wang, Qingming Li, Chunyi Zhou, Songze Li, Ting Wang, Yunjun Gao, Yingcai Wu, and Shouling Ji. 2024. Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents.CoRRabs/2411.09523 (2024). https://doi.org/10.48550/ARXIV.2411.09523 arXiv:2411.09523

  28. [28]

    gching. 2025. Toolbase. https://gettoolbase.ai

  29. [29]

    glama.ai. 2025. Glama MCP Servers. https://glama.ai/mcp/servers

  30. [30]

    MCP Zone

    Ant Group. 2025.Ant Group Launches “MCP Zone” for Unified Service Deployment and Invocation. Ant Group Co., Ltd. tbox.alipay.com/about

  31. [31]

    2021.Language Server Protocol and Implementation

    Nadeeshaan Gunasinghe and Nipuna Marcus. 2021.Language Server Protocol and Implementation. Springer

  32. [32]

    Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, and Ahmed E. Hassan

  33. [33]

    Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

    Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers.arXiv preprint arXiv:2506.13538(2025). https://arxiv.org/abs/2506.13538

  34. [34]

    Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruifeng Xu, and Qun Liu. 2024. Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios.CoRRabs/2401.17167 (2024). https://doi.org/10. 48550/ARXIV.2401.17167 arXiv:...

  35. [35]

    JetBrains. 2025. JetBrains MCP Server. https://plugins.jetbrains.com/plugin/26071-mcp-server

  36. [36]

    2025.OWASP Top 10 for LLM Apps & Gen AI Agentic Security Initiative

    Sotiropoulos John, Rosario Ron F Del, Kokuykin Evgeniy, Oakley Helen, Habler Idan, Underkoffler Kayla, Huang Ken, Steffensen Peter, Aralimatti Rakshith, Bitton Ron, et al. 2025.OWASP Top 10 for LLM Apps & Gen AI Agentic Security Initiative. Ph. D. Dissertation. OWASP

  37. [37]

    Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. 2025. MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System.arXiv preprint arXiv:2504.12757(2025). https://arxiv.org/abs/2504.12757 , Vol. 1, No. 1, Article . Publication date: October 2025. 36 X Hou, Y Zhao, S Wang, and H Wang

  38. [38]

    LangChain. 2022. LangChain: Framework for developing applications powered by language models. https://github. com/langchain-ai/langchain

  39. [39]

    Xinzhe Li. 2025. A Review of Prominent Paradigms for LLM-Based Agents: Tool Use, Planning (Including RAG), and Feedback Learning. InProceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025, Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, and Steven Sc...

  40. [40]

    LibreChat. 2025. LibreChat. https://librechat.ai

  41. [41]

    Jerry Liu. 2022. LlamaIndex: A data framework for LLM applications. https://github.com/run-llama/llama_index

  42. [42]

    Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, and Ruoming Pang. 2024. ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities.CoRRabs/2408.04682 (2024). https://doi.org/10.48550/ARXIV.2408.04682 arXiv:2408.04682

  43. [43]

    Baidu Maps. 2025. Baidu Maps MCP Servers. https://lbs.baidu.com/faq/api?title=mcpserver/base

  44. [44]

    Emacs MCP. 2025. Emacs MCP. https://github.com/lizqwerscott/mcp.el

  45. [45]

    Tripo3D MCP. 2025. Tripo3D MCP. https://blender-mcp.com/

  46. [46]

    mcp dockmater. 2025. Dockmaster. https://mcp-dockmaster.com

  47. [47]

    mcp.so. 2025. MCP.so. https://mcp.so/

  48. [48]

    Ivan Milev, Mislav Balunović, Maximilian Baader, and Martin Vechev. 2025. ToolFuzz–Automated Agent Tool Testing. arXiv preprint arXiv:2503.04479(2025)

  49. [49]

    OpenAI. 2023. ChatGPT plugins. https://openai.com/index/chatgpt-plugins/

  50. [50]

    OpenAI. 2023. Funcation Calling. https://platform.openai.com/docs/guides/function-calling?api-mode=responses

  51. [51]

    OpenAI. 2025. OpenAI Agents SDK. https://openai.github.io/openai-agents-python/

  52. [52]

    OpenAI. 2025. OpenAI Agents SDK - Model context protocol (MCP). https://openai.github.io/openai-agents-python/ mcp/

  53. [53]

    OpenSumi. 2025. OpenSumi. https://github.com/opensumi/core

  54. [54]

    Model Context Protocol. 2024. GitHub MCP Server. https://github.com/modelcontextprotocol/servers/tree/main/src/github

  55. [55]

    Model Context Protocol. 2024. Slack MCP Server. https://github.com/modelcontextprotocol/servers/tree/main/src/slack

  56. [56]

    Brandon Radosevich and John Halloran. 2025. MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits. arXiv preprint arXiv:2504.03767 (2025). https://arxiv.org/abs/2504.03767

  57. [57]

    Replit. 2025. Replit. https://replit.com

  58. [58]

    Zhuocheng Shen. 2024. LLM With Tools: A Survey. CoRR abs/2409.18807 (2024). https://doi.org/10.48550/ARXIV.2409.18807 arXiv:2409.18807

  59. [59]

    Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, and Zhaochun Ren. 2024. Chain of Tools: Large Language Model is an Automatic Multi-tool Learner. CoRR abs/2405.16533 (2024). https://doi.org/10.48550/ARXIV.2405.16533 arXiv:2405.16533

  60. [60]

    Zhengliang Shi, Shen Gao, Lingyong Yan, Yue Feng, Xiuyi Chen, Zhumin Chen, Dawei Yin, Suzan Verberne, and Zhaochun Ren. 2025. Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents. In The Web Conference 2025. https://openreview.net/forum?id=T4wMdeFEjX

  61. [61]

    Slack Security Team. 2022. Passwordless Persistence and Privilege Escalation in Azure. https://posts.specterops.io/passwordless-persistence-and-privilege-escalation-in-azure-98a01310be3f

  62. [62]

    Block (Square). 2025. Block (Square). https://glama.ai/mcp/servers/atblock/square-mcp/tools/team

  63. [63]

    Stripe. 2025. Stripe. https://stripe.com

  64. [64]

    Microsoft Copilot Studio. 2025. Introducing Model Context Protocol (MCP) in Copilot Studio: Simplified Integration with AI Apps and Agents. https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/introducing-model-context-protocol-mcp-in-copilot-studio-simplified-integration-with-ai-apps-and-agents/

  65. [65]

    Tencent. 2024. Tencent plugin shop. https://yuanqi.tencent.com/plugin-shop

  66. [66]

    Apify MCP Tester. 2025. Apify MCP Tester. https://apify.com/jiri.spilka/tester-mcp-client

  67. [67]

    TheiaAI/TheiaIDE. 2025. TheiaAI/TheiaIDE. https://theia-ide.org/docs/user_ai/

  68. [69]

    Zihan Wang, Hongwei Li, Rui Zhang, Yu Liu, Wenbo Jiang, Wenshu Fan, Qingchuan Zhao, and Guowen Xu. 2025. MPMA: Preference Manipulation Attack Against Model Context Protocol. arXiv preprint arXiv:2505.11154 (2025). https://arxiv.org/abs/2505.11154

  69. [70]

    Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelovic, and Jakob Nikolas Kather. 2025. LLM Agents Making Agent Tools. CoRR abs/2502.11705 (2025). https://doi.org/10.48550/ARXIV.2502.11705 arXiv:2502.11705

  70. [71]

    Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, and James Y. Zou. 2024. AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurI...

  71. [72]

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, a...

  72. [73]

    Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, and Meng Han. 2025. MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications. arXiv preprint arXiv:2508.10991 (2025). https://arxiv.org/abs/2508.10991

  73. [74]

    Konstantin Yakovlev, Sergey I. Nikolenko, and Andrey Bout. 2024. Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option. CoRR abs/2410.12004 (2024). https://doi.org/10.48550/ARXIV.2410.12004 arXiv:2410.12004

  74. [75]

    Yixuan Yang, Daoyuan Wu, and Yufan Chen. 2025. MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols. arXiv preprint arXiv:2508.13220 (2025). https://arxiv.org/abs/2508.13220

  75. [76]

    Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong Wu, Qi Zhang, Tao Gui, and Xuanjing Huang. 2024. ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages. CoRR abs/2402.10753 (2024). https://doi.org/10.48550/ARXIV.2402.10753 arXiv:2402.10753

  77. [78]

    Miao Yu, Fanci Meng, Xinyun Zhou, Shilong Wang, Junyuan Mao, Linsey Pang, Tianlong Chen, Kun Wang, Xinfeng Li, Yongfeng Zhang, et al. 2025. A Survey on Trustworthy LLM Agents: Threats and Countermeasures. arXiv preprint arXiv:2503.09648 (2025).

  78. [79]

    Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Kan Ren, Dongsheng Li, and Deqing Yang. 2024. EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction. CoRR abs/2401.06201 (2024). https://doi.org/10.48550/ARXIV.2401.06201 arXiv:2401.06201

  79. [80]

    Zed. 2025. Zed - Model Context Protocol. https://zed.dev/docs/assistant/model-context-protocol

  80. [81]

    Shuli Zhao, Qingsheng Hou, Zihan Zhan, Yanhao Wang, Yuchong Xie, Yu Guo, Libo Chen, Shenghong Li, and Zhi Xue

Showing first 80 references.