pith. machine review for the scientific record.

arXiv: 2603.13417 · v1 · submitted 2026-03-12 · 💻 cs.SE · cs.AI · cs.MA

Recognition: no theorem link

Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol


Pith reviewed 2026-05-15 11:54 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.MA
keywords Model Context Protocol · AI agents · production deployment · tool invocation · identity propagation · error recovery · adaptive budgeting · protocol design

The pith

The Model Context Protocol requires three new primitives—identity propagation, adaptive tool budgeting, and structured error semantics—to support reliable production-scale AI agent tool use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that MCP already standardizes tool discovery and invocation across thousands of servers, yet it leaves agents without the infrastructure needed to operate those tools safely once deployed at scale. Field experience from integrating an AI agent platform with a major cloud provider's MCP servers revealed recurring failures tied to missing support for propagating user identity, allocating timeouts across variable tool latencies, and interpreting errors in a machine-readable way. The authors respond by defining three mechanisms to close those gaps: a broker protocol that routes requests while preserving identity, a budgeting method that treats timeout assignment as an allocation problem, and an error framework that supplies deterministic recovery signals. They also catalog failures along five dimensions and supply a readiness checklist. If these additions prove necessary, production deployments would need explicit infrastructure layers beyond the current MCP specification.

Core claim

MCP provides a solid foundation for agents to find and call external tools but omits three protocol-level primitives required for safe operation at production scale: identity propagation across requests, adaptive allocation of timeout budgets over heterogeneous tool latencies, and structured, machine-readable error semantics that enable deterministic agent self-correction. These gaps surfaced in an enterprise integration with a major cloud provider's MCP servers. The paper addresses them with the Context-Aware Broker Protocol, which extends JSON-RPC with identity-scoped routing; Adaptive Timeout Budget Allocation, which frames sequential tool calls as a budget allocation problem; and the Structured Error Recovery Framework.
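To make the identity-propagation gap concrete, here is a minimal sketch of what an identity-scoped JSON-RPC request and a rejecting broker stage might look like. The `_cabp` envelope key, the `subject` field, the error codes, and the function names are all illustrative assumptions; the paper's actual CABP wire format and six-stage pipeline are not reproduced here.

```python
import json

def make_scoped_request(method: str, params: dict, user_id: str, req_id: int) -> dict:
    """Wrap a JSON-RPC 2.0 tool call with a caller-identity scope.
    The `_cabp` envelope and its fields are hypothetical."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": {**params, "_cabp": {"subject": user_id}},
    }

def broker_route(request: dict, allowed_subjects: set) -> dict:
    """Stand-in for the broker's validation stages: reject any request
    whose identity scope is missing or not authorized (cf. Figure 2,
    where stages 2 and 3 can reject)."""
    scope = request.get("params", {}).get("_cabp")
    if scope is None:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32001, "message": "missing identity scope"}}
    if scope["subject"] not in allowed_subjects:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32002, "message": "subject not authorized"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"routed": True}}

req = make_scoped_request("tools/call", {"name": "search"}, user_id="alice", req_id=1)
print(json.dumps(broker_route(req, {"alice"})))
```

The point of the sketch is that identity travels inside the request envelope rather than living only in the agent's session state, so every downstream server sees the same subject.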

What carries the argument

The Context-Aware Broker Protocol (CABP) for identity-scoped JSON-RPC routing, Adaptive Timeout Budget Allocation (ATBA) for latency-aware budget distribution, and Structured Error Recovery Framework (SERF) for machine-readable failure semantics, organized around the five design dimensions of server contracts, user context, timeouts, errors, and observability.
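The budget-allocation framing behind ATBA can be sketched in a few lines: split a turn-level timeout budget across a sequential tool chain in proportion to each tool's observed latency, with a per-tool floor. Proportional weighting on p95 latencies and the floor are illustrative choices, not the paper's exact algorithm; the 100 s budget over four tools mirrors the scenario in Figure 5.

```python
def allocate_budget(turn_budget_s: float, p95_latencies_s: list, floor_s: float = 1.0) -> list:
    """Distribute a turn budget over a sequential tool chain in
    proportion to observed p95 latency, guaranteeing a minimum floor
    per tool. Hypothetical heuristic, not the paper's ATBA spec."""
    n = len(p95_latencies_s)
    remaining = turn_budget_s - n * floor_s
    if remaining < 0:
        raise ValueError("turn budget too small for the per-tool floor")
    total = sum(p95_latencies_s)
    return [floor_s + remaining * (p / total) for p in p95_latencies_s]

# A 100 s turn budget over four tools with very different latency profiles:
# the slow second tool receives most of the budget instead of a static 25 s each.
print(allocate_budget(100.0, [2.0, 30.0, 6.0, 2.0]))
```

A static split would give every tool 25 s and time out the slow tool while wasting budget on the fast ones; the proportional split is what "treating timeout assignment as an allocation problem" buys.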

If this is right

  • Agents can maintain user identity across tool invocations through a six-stage broker pipeline.
  • Timeout values can be allocated dynamically according to observed latency distributions to reduce premature failures.
  • Structured error responses enable agents to perform deterministic self-correction instead of ad-hoc retries.
  • A five-dimension checklist allows systematic auditing of MCP deployments for production readiness.
  • Each proposed mechanism is expressed as a testable hypothesis with a reproducible experimental setup.
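The deterministic self-correction claimed in the third bullet can be sketched as a lookup from a machine-readable error category to a recovery action, replacing ad-hoc pattern matching on free-text messages. The error taxonomy, action names, and retry limit below are assumptions for illustration; the paper's SERF categories and its two-tier flow (Figure 4) are not reproduced.

```python
from enum import Enum

class Action(Enum):
    RETRY = "retry"                # transient failure, try again
    REFORMULATE = "reformulate"    # let the agent adjust its arguments
    ABORT = "abort"                # surface the failure to the caller

# Hypothetical error taxonomy standing in for SERF's categories.
RECOVERY_TABLE = {
    "rate_limited":   Action.RETRY,
    "invalid_params": Action.REFORMULATE,
    "auth_expired":   Action.ABORT,
}

def decide(error: dict, attempt: int, max_retries: int = 2) -> Action:
    """Map a structured error to a deterministic recovery action.
    Unknown categories abort rather than guess."""
    action = RECOVERY_TABLE.get(error.get("category"), Action.ABORT)
    if action is Action.RETRY and attempt >= max_retries:
        return Action.ABORT
    return action

print(decide({"category": "rate_limited"}, attempt=0))
```

Because the decision is a pure function of the error category and attempt count, two agents seeing the same failure take the same recovery path, which is the "deterministic" part of the claim.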

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The mechanisms could be proposed for inclusion in a future revision of the MCP specification itself.
  • Comparable infrastructure gaps are likely present in other agent-tool protocols that focus only on discovery and invocation.
  • Replicating the experiments across additional cloud providers would test whether the three mechanisms generalize beyond the studied case.
  • Widespread adoption would shift responsibility for safe tool use from custom middleware to the protocol layer.

Load-bearing premise

The production problems seen in a single enterprise deployment with one major cloud provider's MCP servers are representative of the general challenges that affect all MCP-based agent systems.

What would settle it

A large-scale production MCP deployment that achieves low failure rates and reliable tool operation without any of the three proposed mechanisms would falsify the claim that those primitives are required.

Figures

Figures reproduced from arXiv: 2603.13417 by Vasundra Srinivasan.

Figure 1. End-to-end deployment architecture. Solid arrows indicate request flow; the dashed …
Figure 2. The CABP broker pipeline. Stages 2 and 3 can reject requests (dashed red arrows).
Figure 3. Turn budget consumption for a 4-tool sequential chain. Top: nominal case completes …
Figure 4. Two-tier error handling flow. Tier 1 (protocol) errors are handled automatically by the …
Figure 5. Static vs. ATBA budget allocation for a 100 s turn budget across 4 tools. ATBA assigns …
Original abstract

The Model Context Protocol (MCP) standardizes how AI agents discover and invoke external tools, with over 10,000 active servers and 97 million monthly SDK downloads as of early 2026. Yet MCP does not yet standardize how agents safely operate those tools at production scale. Three protocol-level primitives remain missing: identity propagation, adaptive tool budgeting, and structured error semantics. This paper identifies these gaps through field lessons from an enterprise deployment of an AI agent platform integrated with a major cloud provider's MCP servers (client name redacted). We propose three mechanisms to fill them: (1) the Context-Aware Broker Protocol (CABP), which extends JSON-RPC with identity-scoped request routing via a six-stage broker pipeline; (2) Adaptive Timeout Budget Allocation (ATBA), which frames sequential tool invocation as a budget allocation problem over heterogeneous latency distributions; and (3) the Structured Error Recovery Framework (SERF), which provides machine-readable failure semantics that enable deterministic agent self-correction. We organize production failure modes into five design dimensions (server contracts, user context, timeouts, errors, and observability), document concrete failure vignettes, and present a production readiness checklist. All three algorithms are formalized as testable hypotheses with reproducible experimental methodology. Field observations demonstrate that while MCP provides a solid protocol foundation, reliable agent tool integration requires infrastructure-level mechanisms that the specification does not yet address.
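The five design dimensions named in the abstract lend themselves to a mechanical audit. The sketch below encodes them as a checklist; the per-dimension questions are illustrative paraphrases, not the paper's actual readiness-checklist items.

```python
# The five dimensions come from the paper; each question is an
# illustrative stand-in for the paper's checklist items.
CHECKLIST = {
    "server contracts": "Are tool schemas versioned and validated on connect?",
    "user context":     "Is caller identity propagated to every downstream call?",
    "timeouts":         "Are per-tool timeouts derived from a turn-level budget?",
    "errors":           "Do failures carry machine-readable recovery categories?",
    "observability":    "Are tool calls traced end to end across the broker?",
}

def audit(answers: dict) -> list:
    """Return the dimensions that fail the readiness check; a dimension
    with no recorded answer counts as failing."""
    return [dim for dim in CHECKLIST if not answers.get(dim, False)]

# A deployment that has only nailed down contracts and timeouts:
print(audit({"server contracts": True, "timeouts": True}))
```

Running the audit on a partially hardened deployment returns the dimensions still needing infrastructure work, which is the systematic review the checklist is meant to enable.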

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that the Model Context Protocol (MCP) standardizes tool discovery and invocation for AI agents but lacks three protocol-level primitives for safe production-scale operation: identity propagation, adaptive tool budgeting, and structured error semantics. Drawing from field observations in a single redacted enterprise deployment with a major cloud provider's MCP servers, it organizes production failures into five design dimensions (server contracts, user context, timeouts, errors, observability), documents vignettes, and proposes three mechanisms: Context-Aware Broker Protocol (CABP) as a six-stage broker pipeline extending JSON-RPC for identity-scoped routing; Adaptive Timeout Budget Allocation (ATBA) framing tool invocation as budget allocation over latency distributions; and Structured Error Recovery Framework (SERF) for machine-readable error semantics enabling agent self-correction. All three are formalized as testable hypotheses with reproducible experimental methodology, accompanied by a production readiness checklist.

Significance. If the proposed mechanisms are validated beyond the single deployment, the work would provide concrete, testable design patterns for reliable AI agent tool integration in a rapidly growing ecosystem (10k+ servers, 97M monthly downloads). The formalization of CABP, ATBA, and SERF as hypotheses with reproducible methodology is a strength that could support community validation and adoption in software engineering for production AI systems.

major comments (2)
  1. [Abstract] The assertion that MCP 'does not yet standardize how agents safely operate those tools at production scale' and that the three primitives 'remain missing' rests entirely on field lessons from one redacted enterprise deployment; without comparative data from other MCP servers, open-source implementations, or multiple scales, the gaps cannot be established as protocol-level omissions rather than environment-specific issues.
  2. [Abstract, failure vignettes and mechanisms] No quantitative validation results, error bars, performance metrics, or cross-deployment comparisons are reported to support the efficacy of CABP, ATBA, or SERF, leaving the central claim that these mechanisms fill universal gaps without empirical grounding beyond the single case.
minor comments (1)
  1. [Abstract] The redaction of the client name in the deployment description limits reproducibility assessment; consider adding anonymized but more detailed context on scale, tool types, and failure frequencies.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting the need to clarify the scope of our claims. We address each major comment below and will make targeted revisions to improve precision without altering the core contribution of identifying protocol gaps and proposing testable design patterns.

Point-by-point responses
  1. Referee: [Abstract] The assertion that MCP 'does not yet standardize how agents safely operate those tools at production scale' and that the three primitives 'remain missing' rests entirely on field lessons from one redacted enterprise deployment; without comparative data from other MCP servers, open-source implementations, or multiple scales, the gaps cannot be established as protocol-level omissions rather than environment-specific issues.

    Authors: We agree that the observations derive from a single enterprise deployment and that broader comparative data would strengthen the case for protocol-level status. The identified gaps, however, map directly to elements absent from the publicly available MCP specification, which standardizes discovery and invocation but provides no primitives for identity propagation, adaptive budgeting, or structured error semantics. The manuscript already frames CABP, ATBA, and SERF as testable hypotheses with reproducible methodology rather than proven universal solutions. We will revise the abstract to explicitly qualify the claims as arising from production experience with the current specification and to underscore the value of community validation across deployments. revision: partial

  2. Referee: [Abstract, failure vignettes and mechanisms] No quantitative validation results, error bars, performance metrics, or cross-deployment comparisons are reported to support the efficacy of CABP, ATBA, or SERF, leaving the central claim that these mechanisms fill universal gaps without empirical grounding beyond the single case.

    Authors: The paper does not present quantitative performance results or cross-deployment metrics; its contribution centers on documenting observed failure modes, organizing them into design dimensions, and formalizing three mechanisms as hypotheses accompanied by reproducible experimental methodologies. Field observations are used to motivate the gaps, not to benchmark the proposed solutions. We will revise the abstract and relevant sections to state this scope more explicitly, add a limitations paragraph acknowledging the absence of quantitative validation, and outline directions for future empirical studies. revision: partial

Circularity Check

0 steps flagged

No significant circularity; proposals derived from field observations without reduction to fitted inputs or self-citations

full rationale

The paper identifies gaps (identity propagation, adaptive tool budgeting, structured error semantics) from field lessons in one enterprise deployment with a major cloud provider's MCP servers, then proposes CABP (six-stage broker pipeline), ATBA (budget allocation over latency distributions), and SERF (machine-readable error semantics) as new mechanisms. These are formalized as testable hypotheses with reproducible experimental methodology and organized around five design dimensions. No equations, fitted parameters renamed as predictions, self-citations, uniqueness theorems, or ansatzes appear in the derivation chain. The central claims rest on empirical vignettes rather than reducing to prior inputs by construction, rendering the analysis self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim depends on the domain assumption that the three identified primitives are missing and that the proposed mechanisms will close them, with no independent evidence supplied beyond one deployment case.

axioms (1)
  • domain assumption Field lessons from an enterprise deployment reveal three missing protocol primitives in MCP.
    Stated as the basis for identifying gaps in identity propagation, adaptive tool budgeting, and structured error semantics.
invented entities (3)
  • Context-Aware Broker Protocol (CABP) no independent evidence
    purpose: Extends JSON-RPC with identity-scoped request routing via a six-stage broker pipeline
    New proposed mechanism to handle identity propagation.
  • Adaptive Timeout Budget Allocation (ATBA) no independent evidence
    purpose: Frames sequential tool invocation as a budget allocation problem over heterogeneous latency distributions
    New proposed mechanism to handle adaptive tool budgeting.
  • Structured Error Recovery Framework (SERF) no independent evidence
    purpose: Provides machine-readable failure semantics that enable deterministic agent self-correction
    New proposed mechanism to handle structured error semantics.

pith-pipeline@v0.9.0 · 5550 in / 1418 out tokens · 63719 ms · 2026-05-15T11:54:20.076835+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 2 internal anchors

  1. Model Context Protocol Specification, Version 2025-11-25. https://spec.modelcontextprotocol.io/
  2. Anthropic, “Donating the Model Context Protocol and Establishing the Agentic AI Foundation,” December 2025. https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
  3. JSON-RPC 2.0 Specification. https://www.jsonrpc.org/specification
  4. Kong Inc., “AI Gateway and MCP Gateway Technical Breakdown,” 2025. https://konghq.com/blog/engineering/ai-gateway-mcp-gateway-mcp-server-breakdown
  5. Envoy Proxy, “Envoy AI Gateway,” 2025. https://www.envoyproxy.io/
  6. Zuplo, “The State of MCP—Adoption, Security & Production Readiness,” 2025. https://zuplo.com/mcp-report
  7. Amazon Web Services, “Build Long-Running MCP Servers on Amazon Bedrock AgentCore,” 2025. https://aws.amazon.com/blogs/machine-learning/build-long-running-mcp-servers-on-amazon-bedrock-agentcore-with-strands-agents-integration/
  8. SigNoz, “MCP Observability with OpenTelemetry,” 2025. https://signoz.io/blog/mcp-observability-with-otel/
  9. Grafana Labs, “MCP Observability Documentation,” 2025. https://grafana.com/docs/grafana-cloud/monitor-applications/ai-observability/mcp-observability/
  10. X. Hou, Y. Zhao, and H. Wang, “Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions,” arXiv preprint arXiv:2503.23278, 2025.
  11. T. Hasan, S. Bhatt, et al., “Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers,” arXiv preprint arXiv:2506.13538, 2025.
  12. W. Li et al., “CA-MCP: Context-Aware MCP with Shared Context Store for Multi-Agent Systems,” arXiv preprint arXiv:2601.11595, 2026.
  13. MCPTox Authors, “MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers,” arXiv preprint arXiv:2508.14925, 2025.
  14. A. Krishnan, “Multi-Agent MCP Architecture for Enterprise Integration,” arXiv preprint arXiv:2504.21030, 2025.
  15. S. Dhar et al., “A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows,” arXiv preprint arXiv:2512.08769, 2025.
  16. Astrix Security, “State of MCP Server Security 2025: Research Report,” 2025. https://astrix.security/learn/blog/state-of-mcp-server-security-2025/