
arxiv: 2605.24248 · v2 · pith:RID3HGS7new · submitted 2026-05-22 · 💻 cs.CR · cs.AI · cs.SE

Attested Tool-Server Admission: A Security Extension to the Model Context Protocol

Pith reviewed 2026-06-30 15:28 UTC · model grok-4.3

classification 💻 cs.CR · cs.AI · cs.SE
keywords Model Context Protocol · MCP · tool server admission · attested clearance · LLM agent security · per-server allowlist · audit log · trust extension

The pith

MCP hosts can verify a server's offline-signed clearance at a well-known URI, apply per-server tool allowlists, and enforce decisions via audit logs to admit external tool servers safely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Model Context Protocol standardizes message exchange between an LLM agent and external tool servers but supplies no trust controls, so a host accepts any self-declared tool list and dispatches calls without limits. This paper supplies three additive mechanisms that close the gap while leaving the base protocol and existing APIs unchanged. A server publishes a compact offline-signed clearance assertion at a standard URI; the host verifies it against a pinned trust root before any dispatch. A deny-by-default allowlist then restricts which of the server's tools may actually be invoked. A flavor-gated mode converts the checks into hard denials and records every decision in a tamper-evident log. An unmodified host simply ignores the new document and behaves exactly as before.

Core claim

The security gap that makes an unmediated third-party MCP connection unsafe can be closed by three mechanisms without modifying the protocol: an offline-signed clearance assertion published at a well-known URI and verified against a pinned trust root, a deny-by-default per-server tool allowlist, and a flavor-gated enforcement mode that produces hard denials and writes every decision to a tamper-evident audit log.

What carries the argument

The attested clearance assertion: a small offline-signed document a server publishes at a well-known URI that a host verifies against a pinned trust root before dispatching any tool call.
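
The host-side check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's wire format: the `payload` and `signature` field names, the `server` and `tools` keys, and the example server name are all hypothetical. HMAC-SHA256 from the Python standard library stands in for the offline signature so the sketch runs without third-party packages; a real deployment would presumably verify a public-key signature, with the pinned trust root being the signer's public key.

```python
import hashlib
import hmac
import json

# Hypothetical pinned trust root. HMAC is only a stand-in for a real signature scheme.
PINNED_ROOT = b"demo-trust-root-key"


def canonical(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_offline(payload: dict, key: bytes) -> dict:
    """Stand-in for the offline signing step done by the trust-root holder."""
    sig = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}


def verify_clearance(doc, server_id: str, key: bytes = PINNED_ROOT) -> bool:
    """Return True only if the document is present, well formed, signed by the
    pinned root, and issued for this server. Every other case fails closed."""
    if not isinstance(doc, dict):
        return False  # missing or malformed document
    payload, sig = doc.get("payload"), doc.get("signature")
    if not isinstance(payload, dict) or not isinstance(sig, str):
        return False
    expected = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), sig.encode()):
        return False  # signature does not chain to the pinned root
    return payload.get("server") == server_id


claims = {"server": "mail.example", "tools": ["search_messages"]}
good = sign_offline(claims, PINNED_ROOT)
forged = sign_offline(claims, b"attacker-key")
```

The check runs before any tool dispatch, so a missing document and a forged one are both rejected by the same code path rather than by separate special cases.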

If this is right

  • Admitting a server no longer implies trusting every tool it offers, because the allowlist bounds the set that may be invoked.
  • Every admission and tool-dispatch decision is written to a tamper-evident audit log for later inspection.
  • The extension can be published as a normative MCP addendum because the schema, verification rules, error registry, and conformance vectors are supplied in RFC 2119 form.
  • Unextended hosts continue to operate exactly as today, so incremental adoption requires no protocol change.
  • Regulated deployments that must accredit external tool servers become feasible once the explicit trust model is in place.
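
The first two consequences can be illustrated with a short sketch. It assumes a hash chain as the tamper-evidence technique, which this summary does not confirm is the paper's choice; the `AuditLog` class, the `ALLOWLIST` table, and the server and tool names are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        self.entries.append(
            {"record": record, "prev": prev, "hash": self._digest(prev, record)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "0" * 64
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if self._digest(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


# Hypothetical host policy: server id -> tools the host permits.
ALLOWLIST = {"mail.example": {"search_messages"}}


def may_dispatch(server: str, tool: str, log: AuditLog) -> bool:
    """Deny by default: unknown servers and unlisted tools are both refused."""
    allowed = tool in ALLOWLIST.get(server, set())
    decision = "allow" if allowed else "deny"
    log.append({"server": server, "tool": tool, "decision": decision})
    return allowed
```

Because the lookup defaults to an empty set, a tool the server advertises but the host never listed is refused without any explicit deny rule, and the refusal itself is logged.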

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same clearance-plus-allowlist pattern could be reused in other agent-to-tool protocols that currently lack server admission controls.
  • Organizations could publish their own clearance roots to enforce internal policy across multiple LLM agents without custom wrappers.
  • The tamper-evident logs could feed directly into existing security monitoring systems for centralized oversight of tool usage.
  • Widespread use might reduce reliance on per-application security shims by moving the trust boundary into the protocol extension itself.

Load-bearing premise

The host must hold a correctly pinned trust root and the server must publish a valid clearance document at the well-known URI before any tool dispatch occurs.

What would settle it

A test in which a host configured with a pinned trust root receives a forged clearance document at the well-known URI yet still dispatches a tool call, or in which a server never publishes the document and the host proceeds anyway.
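
One way to phrase that settling test as a harness is sketched below. Signature checking is abstracted into a boolean field so the harness stays independent of any particular scheme; the scenario names, the `signature_valid` field, and both toy hosts are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Optional

# A toy document: a dict whose boolean stands in for "verifies against the pinned root".
Doc = Optional[dict]
Host = Callable[[Doc], bool]  # True means the host would dispatch a tool call

SCENARIOS = {
    "forged_document": {"signature_valid": False},
    "missing_document": None,
}


def falsifying_scenarios(host: Host) -> List[str]:
    """Names of scenarios in which the host dispatches although it must not."""
    return [name for name, doc in SCENARIOS.items() if host(doc)]


def strict_host(doc: Doc) -> bool:
    """Dispatches only when a document exists and its signature verifies."""
    return doc is not None and doc.get("signature_valid") is True


def lax_host(doc: Doc) -> bool:
    """Deliberately buggy: treats an absent document as nothing to check."""
    return doc is None or doc.get("signature_valid") is True
```

An empty result from `falsifying_scenarios` means the host survived both conditions named above; any returned name identifies the condition under which the claim fails for that host.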

Figures

Figures reproduced from arXiv: 2605.24248 by Alfredo Metere.

Figure 1: MCP today (a) versus the proposed attested tool-server admission (b). The extension …
Figure 2: Runtime configuration of the adversarial campaign. A GPU-resident local model (Ollama / …
Original abstract

The Model Context Protocol (MCP) standardizes how a large-language-model (LLM) agent and an external tool server exchange messages, but not trust: a host reads a server's self-declared tool list and dispatches calls, with no notion of which servers it may use, at what sensitivity, or which of a server's tools are in bounds. This work grew out of a concrete need -- letting the Enclawed agent use Google's externally-operated MCP servers (Gmail, Calendar, Drive) safely, admitting the server and bounding the tools it may drive, without changing MCP or Enclawed's own tool application-programming interface (API). The mechanism we built, mcp-attested (shipped in both the open enclawed-oss distribution and the enclaved flavor), generalizes: the gap that makes an unmediated third-party connection unsafe for one user makes a regulated deployment impossible to accredit. We close it with three additive mechanisms: (1) a small, offline-signed clearance assertion a server publishes at a well-known Uniform Resource Identifier (URI) and a host verifies against a pinned trust root before any tool dispatch; (2) a deny-by-default per-server tool allowlist, so admitting a server is not trusting its every tool; and (3) a flavor-gated enforcement mode that turns the checks from warnings into hard denials, with every decision written to a tamper-evident audit log. We give the wire format, the verification algorithm, a security analysis, and an LLM-driven adversarial evaluation; we then state the design in normative Request-for-Comments (RFC 2119) form -- schema, verification rules, error registry, well-known registration, and machine-checkable conformance vectors -- so it can be adopted as an MCP addendum rather than reinvented. An unextended host ignores the well-known document and behaves exactly as today.
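
The third mechanism in the abstract, turning the same checks from warnings into hard denials depending on flavor, can be sketched as a small gate. The abstract names an open distribution and an "enclaved" flavor; which of them enforces, as well as the `Flavor` enum, the `gate` function, and the log-level strings, are assumptions made for this illustration.

```python
from enum import Enum
from typing import Tuple


class Flavor(Enum):
    OPEN = "open"          # checks are advisory (assumed)
    ENCLAVED = "enclaved"  # checks are enforced (assumed)


def gate(check_passed: bool, flavor: Flavor) -> Tuple[bool, str]:
    """Map a check result to (dispatch allowed?, audit log level)."""
    if check_passed:
        return True, "info"
    if flavor is Flavor.ENCLAVED:
        return False, "deny"  # hard denial in the enforcing flavor
    return True, "warn"       # same failure is only a warning otherwise
```

Keeping the check and the gate separate means both flavors run identical verification code and differ only in how a failure is acted upon, which is what lets one implementation ship in both distributions.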

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to present mcp-attested, a security extension to the Model Context Protocol (MCP) consisting of three mechanisms: (1) offline-signed clearance assertions published at a well-known URI and verified by hosts against a pinned trust root, (2) deny-by-default per-server tool allowlists, and (3) flavor-gated enforcement with tamper-evident audit logs. It supplies the wire format, verification algorithm, security analysis, an LLM-driven adversarial evaluation, and a full normative RFC-style specification (schema, verification rules, error registry, well-known registration, machine-checkable conformance vectors) for adoption as an MCP addendum, while preserving backward compatibility for unextended hosts.

Significance. If the design is sound, this work offers a practical solution for securing MCP-based LLM agent deployments in sensitive or regulated settings, such as integrating with external services like Google Workspace tools. The provision of machine-checkable conformance vectors and normative language is a notable strength that supports standardization and correct implementation. The approach avoids changes to the core MCP or tool APIs, enhancing its deployability.

major comments (2)
  1. [Evaluation section] The LLM-driven adversarial evaluation is described in the abstract but provides no quantitative results, error analysis, or specific metrics on attack success rates; this is load-bearing for assessing whether the three mechanisms meaningfully reduce the trust gap.
  2. [§ Security analysis] The central claim that the mechanisms close the MCP trust gap rests on hosts correctly pinning trust roots and servers publishing valid clearance documents; while the paper notes that unextended hosts revert to the unauthenticated state, the security analysis should include a concrete test case or failure-mode walkthrough to confirm the claim holds under partial adoption.
minor comments (2)
  1. [Abstract] The term 'flavor-gated enforcement mode' appears without a short parenthetical expansion on first use in the abstract, reducing immediate clarity for readers.
  2. [Normative specification] Cross-references between the machine-checkable conformance vectors and the verification algorithm should be added in the normative section to make coverage explicit.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address each major comment below.

Point-by-point responses
  1. Referee: [Evaluation section] The LLM-driven adversarial evaluation is described in the abstract but provides no quantitative results, error analysis, or specific metrics on attack success rates; this is load-bearing for assessing whether the three mechanisms meaningfully reduce the trust gap.

    Authors: We agree that quantitative results are needed to substantiate the evaluation. The current manuscript describes the setup and qualitative outcomes of the LLM-driven adversarial evaluation but does not report numerical attack success rates or error analysis. In revision we will expand the Evaluation section with concrete metrics (pre- and post-mechanism attack success rates, false-positive rates on legitimate calls, and breakdown by attack class) drawn from the existing evaluation runs.

    Revision: yes

  2. Referee: [§ Security analysis] The central claim that the mechanisms close the MCP trust gap rests on hosts correctly pinning trust roots and servers publishing valid clearance documents; while the paper notes that unextended hosts revert to the unauthenticated state, the security analysis should include a concrete test case or failure-mode walkthrough to confirm the claim holds under partial adoption.

    Authors: We will add a dedicated failure-mode walkthrough subsection to § Security analysis. It will enumerate three concrete partial-adoption scenarios (all hosts extended, mixed population, and server publishes but no host pins) and show, step by step, that adopting hosts still obtain the attested guarantees while non-adopting hosts fall back exactly to the original unauthenticated MCP behavior, with no new attack surface introduced for either population.

    Revision: yes
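
The three partial-adoption scenarios named in the second response can be laid out as a small outcome table. This is a hedged sketch of the reasoning only: the `host_outcome` function, the outcome labels, and the assumption that an extended host without a pinned root behaves like a legacy host are all illustrative and not taken from the paper.

```python
from typing import List


def host_outcome(extended: bool, pins_root: bool, server_publishes: bool) -> str:
    """Classify what one host obtains from a connection under partial adoption."""
    if not (extended and pins_root):
        # The host ignores the well-known document: unchanged MCP behaviour.
        return "legacy"
    # An adopting host either verifies the document or has nothing to verify.
    return "attested" if server_publishes else "unverified"


# Each scenario lists hosts as (extended, pins_root, server_publishes).
SCENARIOS = {
    "all hosts extended": [(True, True, True), (True, True, True)],
    "mixed population": [(True, True, True), (False, False, True)],
    "server publishes, no host pins": [(False, False, True), (True, False, True)],
}


def outcomes(name: str) -> List[str]:
    """Outcome for every host in the named scenario."""
    return [host_outcome(*host) for host in SCENARIOS[name]]
```

In this model the "unverified" outcome is where the enforcement flavor matters: it would be a warning in an advisory mode and a denial in an enforcing one, while "legacy" hosts see no change at all.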

Circularity Check

0 steps flagged

No circularity; independent protocol design


The paper proposes an additive security extension to MCP consisting of a clearance assertion at a well-known URI, a per-server tool allowlist, and flavor-gated enforcement with audit logging. These are defined via explicit wire formats, verification algorithms, and RFC 2119 normative rules with no equations, fitted parameters, predictions, or self-citations that reduce the central claim to its own inputs. The design is self-contained against the stated assumptions about pinned trust roots and document publication; no load-bearing step collapses by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The design rests on standard cryptographic assumptions for signature verification and introduces new protocol artifacts without fitted parameters or data-driven constants.

axioms (1)
  • [standard math] Standard cryptographic signature verification against a pinned trust root is reliable
    Invoked for clearance assertion verification before tool dispatch.
invented entities (2)
  • clearance assertion document (no independent evidence)
    purpose: Attests server admission and bounds the allowable tools
    New document format and well-known URI registration defined by the paper.
  • flavor-gated enforcement mode (no independent evidence)
    purpose: Converts checks into hard denials with audit logging
    New enforcement behavior introduced for regulated deployments.

pith-pipeline@v0.9.1-grok · 5870 in / 1422 out tokens · 42653 ms · 2026-06-30T15:28:54.461432+00:00 · methodology

