Separating Intelligence from Execution: A Workflow Engine for the Model Context Protocol
Pith reviewed 2026-05-15 12:04 UTC · model grok-4.3
The pith
The MCP Workflow Engine lets an agent produce a declarative blueprint once so that complex tasks run with a single tool call and over 99% lower token cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present the MCP Workflow Engine, a novel MCP-native orchestration layer that decouples intelligence (deciding what to do) from execution (carrying it out). An agent reasons once to produce a declarative workflow blueprint—a JSON document specifying a directed sequence of MCP tool calls with parameterized templates, loops, parallel branches, and data piping. Subsequent executions are triggered by a single run_workflow tool call, consuming one invocation's worth of tokens regardless of the blueprint's internal complexity. We formalize the MCP Mediator architectural pattern and implement it in TypeScript against the MCP SDK. The engine reduces per-execution token cost by over 99%, completes the full cluster graph of 1,200+ nodes and 2,800+ relationships in under 45 seconds, and achieves deterministic, idempotent execution with zero agent involvement at run time.
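The >99% figure can be illustrated with a back-of-envelope cost model. The per-step token count below is an assumed figure for illustration, not a measurement from the paper:

```typescript
// Back-of-envelope cost model; TOKENS_PER_AGENT_STEP is an assumed figure.
const TOKENS_PER_AGENT_STEP = 400; // reasoning + tool call + result, per step
const TOKENS_RUN_WORKFLOW = 400;   // one run_workflow invocation, total

// Direct agent tool-calling: cost grows linearly with the step count.
function directAgentTokens(steps: number): number {
  return steps * TOKENS_PER_AGENT_STEP;
}

// Workflow engine: cost is constant regardless of blueprint size.
function engineTokens(_steps: number): number {
  return TOKENS_RUN_WORKFLOW;
}

// Fractional saving. For 67 steps this is 1 - 1/67 ≈ 98.5% under these
// assumptions, so the paper's >99% figure implies the agent baseline spends
// more than about 100 invocations' worth of tokens per run.
function savings(steps: number): number {
  return 1 - engineTokens(steps) / directAgentTokens(steps);
}
```

Under this toy model the saving depends only on the step count, not on the assumed per-step cost, which is why the claim is framed as a percentage.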
What carries the argument
The declarative workflow blueprint (a JSON document encoding sequences, parameterized templates, loops, parallel branches, and data piping of MCP tool calls) executed via a single run_workflow invocation by the MCP Workflow Engine.
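A sketch of the shape such a blueprint might take. The field names and template syntax here are illustrative guesses, not the paper's schema; the actual production blueprint reportedly runs about 700 lines:

```json
{
  "name": "cluster-cmdb-sync-sketch",
  "steps": [
    { "id": "ns", "server": "k8s", "tool": "list_namespaces", "args": {} },
    {
      "id": "pods",
      "server": "k8s",
      "tool": "list_pods",
      "forEach": "{{ns.names}}",
      "parallel": true,
      "args": { "namespace": "{{item}}" }
    },
    {
      "id": "upsert",
      "server": "cmdb",
      "tool": "create_nodes",
      "args": { "nodes": "{{pods.results}}" }
    }
  ]
}
```

The sketch shows the four capabilities the abstract names: a directed sequence (`ns` → `pods` → `upsert`), a loop (`forEach`), a parallel branch (`parallel`), and data piping via `{{step.field}}` references.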
If this is right
- Per-execution token cost becomes fixed at the price of one tool call no matter how many internal steps the blueprint contains.
- Execution is deterministic and idempotent with zero agent involvement once the blueprint exists.
- Large multi-server tasks such as building 1,200-node cluster graphs can be orchestrated reliably across arbitrary downstream MCP servers.
- The MCP Mediator pattern cleanly composes multiple MCP servers into a single orchestrated workflow.
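The second bullet is the mechanically simplest to picture. A hypothetical minimal interpreter is sketched below; the step shape, the `{{step.field}}` template syntax, and the mock tool names are assumptions, not the paper's implementation, and loops and parallel branches are omitted for brevity:

```typescript
// Hypothetical blueprint interpreter: executes steps in order with no
// model call anywhere in the loop.
type Step = {
  id: string;
  tool: string;
  args: Record<string, string>; // values may be "{{stepId.field}}" templates
};

type Tool = (args: Record<string, unknown>) => Record<string, unknown>;

// Resolve a "{{stepId.field}}" template against prior step results (data piping).
function render(value: string, results: Record<string, any>): unknown {
  const m = value.match(/^\{\{(\w+)\.(\w+)\}\}$/);
  return m ? results[m[1]][m[2]] : value;
}

function runWorkflow(
  steps: Step[],
  tools: Record<string, Tool>,
): Record<string, any> {
  const results: Record<string, any> = {};
  for (const step of steps) {
    const args: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(step.args)) args[k] = render(v, results);
    results[step.id] = tools[step.tool](args);
  }
  return results;
}

// Demo with two mock "MCP tools": same blueprint, same output, every run.
const tools: Record<string, Tool> = {
  list_namespaces: () => ({ names: ["default", "kube-system"] }),
  count: (args) => ({ n: (args.items as string[]).length }),
};

const blueprint: Step[] = [
  { id: "ns", tool: "list_namespaces", args: {} },
  { id: "tally", tool: "count", args: { items: "{{ns.names}}" } },
];
```

Determinism here is a property of the interpreter, not the agent: once the blueprint exists, repeated runs traverse the same steps with the same data flow.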
Where Pith is reading between the lines
- Reusable blueprints could be stored and shared so different agents or sessions invoke the same complex task without re-planning.
- Production systems might pre-generate blueprints for recurring operations to cut ongoing LLM spend and response latency.
- The mediator-server approach could be adapted to other tool-calling protocols beyond MCP.
Load-bearing premise
An agent can reliably produce a correct and complete declarative blueprint in one reasoning pass without needing runtime feedback or iteration.
What would settle it
The claim would fail if running the same 67-step Kubernetes task repeatedly still required the agent to reason through every individual step, consuming tokens proportional to the number of actions.
read the original abstract
Large Language Model (LLM) agents increasingly interact with external systems through tool-calling protocols such as the Model Context Protocol (MCP). In prevailing architectures, the agent must reason about every tool invocation in every session, consuming tokens proportional to the number of actions performed--even when the task has been solved before. We present the MCP Workflow Engine, a novel MCP-native orchestration layer that decouples intelligence (deciding what to do) from execution (carrying it out). An agent reasons once to produce a declarative workflow blueprint--a JSON document specifying a directed sequence of MCP tool calls with parameterized templates, loops, parallel branches, and data piping. Subsequent executions are triggered by a single run_workflow tool call, consuming one invocation's worth of tokens regardless of the blueprint's internal complexity. We formalize the MCP Mediator architectural pattern--an MCP server that simultaneously acts as a client to downstream MCP servers--and implement it in TypeScript against the MCP SDK. We evaluate the engine on a production-scale Kubernetes CMDB synchronization task spanning 67 orchestrated steps across 2 MCP servers, 38 namespaces, 13 worker nodes, and 22 distinct resource types. The engine reduces per-execution token cost by over 99%, completes the full cluster graph--comprising 1,200+ nodes and 2,800+ relationships across 20 relationship types--in under 45 seconds, and achieves deterministic, idempotent execution with zero agent involvement at run time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the MCP Workflow Engine, a novel orchestration layer for the Model Context Protocol that decouples LLM agent intelligence (single-pass generation of a declarative JSON workflow blueprint specifying tool calls, loops, branches, and data flows) from execution (triggered by one run_workflow call). It formalizes the MCP Mediator pattern (an MCP server acting as client to downstream servers), implements it in TypeScript, and evaluates it on a 67-step Kubernetes CMDB synchronization task across 2 servers, 38 namespaces, and 22 resource types, claiming >99% per-execution token reduction, <45s completion for a 1,200+ node graph, and deterministic idempotent execution with zero runtime agent involvement.
Significance. If the blueprint generation is reliable and the Mediator integrates without hidden state issues, the approach could meaningfully reduce token consumption and latency for repeated complex workflows in MCP-based agent systems, offering a practical separation of concerns in distributed tool-calling environments. The production-scale empirical results on a real Kubernetes task provide concrete, measurable evidence of potential efficiency gains, though generalization depends on unaddressed aspects of blueprint reliability.
major comments (3)
- [Abstract/Evaluation] Abstract and Evaluation section: The 99% token reduction and zero runtime agent involvement claims rest on the unvalidated assumption that a correct, complete declarative blueprint for the 67-step task can be produced by the agent in a single reasoning pass. No success rates, iteration counts, failure modes, or blueprint examples are reported, making the per-execution savings non-generalizable and the separation benefit overstated.
- [Evaluation] Evaluation section: Performance metrics (45s completion, 1,200+ nodes) are presented for a single task without baselines (e.g., direct agent tool-calling), ablations (e.g., varying blueprint size), or comparisons to alternative orchestration approaches, weakening the ability to attribute gains specifically to the workflow engine.
- [Architecture] Architecture section (MCP Mediator description): The claim of clean integration with arbitrary downstream MCP servers and deterministic execution lacks discussion of error handling, failure propagation, hidden state management, or compatibility verification, which are load-bearing for the idempotent execution guarantee.
minor comments (1)
- [Abstract] The abstract could more explicitly note limitations such as the single-task evaluation scope to better contextualize the results.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each of the major comments point by point below, indicating where revisions will be made to strengthen the paper.
read point-by-point responses
-
Referee: [Abstract/Evaluation] Abstract and Evaluation section: The 99% token reduction and zero runtime agent involvement claims rest on the unvalidated assumption that a correct, complete declarative blueprint for the 67-step task can be produced by the agent in a single reasoning pass. No success rates, iteration counts, failure modes, or blueprint examples are reported, making the per-execution savings non-generalizable and the separation benefit overstated.
Authors: The core contribution of the work is the decoupling of intelligence from execution, with the token reduction claim specifically applying to the execution phase once a valid blueprint has been generated. The manuscript does not claim or evaluate the reliability of single-pass blueprint generation, which is indeed an assumption for realizing the full benefit in practice. To address the referee's concern, we will add an example of the generated blueprint for the Kubernetes task to the revised manuscript (likely in an appendix) and clarify in the Evaluation section that the reported savings assume a correct blueprint is available. We agree that success rates and failure modes for blueprint generation are important for generalizability but fall outside the primary scope of this paper, which focuses on the workflow execution engine. revision: partial
-
Referee: [Evaluation] Evaluation section: Performance metrics (45s completion, 1,200+ nodes) are presented for a single task without baselines (e.g., direct agent tool-calling), ablations (e.g., varying blueprint size), or comparisons to alternative orchestration approaches, weakening the ability to attribute gains specifically to the workflow engine.
Authors: We acknowledge that the evaluation presents results for a single complex task without quantitative baselines or ablations. Direct comparison to a baseline of repeated agent tool-calling is challenging because a 67-step workflow would exceed practical token limits and context windows in a single session, making it infeasible to run as a direct baseline. We will revise the Evaluation section to include a discussion of this limitation and provide a qualitative comparison to alternative orchestration methods, such as traditional workflow engines or multi-turn agent loops, to better attribute the observed gains to the MCP Workflow Engine's design. revision: partial
-
Referee: [Architecture] Architecture section (MCP Mediator description): The claim of clean integration with arbitrary downstream MCP servers and deterministic execution lacks discussion of error handling, failure propagation, hidden state management, or compatibility verification, which are load-bearing for the idempotent execution guarantee.
Authors: We agree that additional detail on these aspects would strengthen the architecture description. In the revised manuscript, we will expand the Architecture section with a new subsection addressing error handling, failure propagation mechanisms, state management to ensure idempotency, and compatibility verification for downstream MCP servers. This will provide the necessary details to support the claims of deterministic and idempotent execution. revision: yes
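One plausible shape for the promised idempotency machinery is content-keyed upsert: key each step's effect by (workflow, step, arguments) so that replaying a blueprint never duplicates side effects. The sketch below is an illustrative assumption, not the paper's design:

```typescript
// Hypothetical idempotent step executor: the side effect runs at most once
// per (workflowId, stepId, args) key; replays return the stored result.
import { createHash } from "node:crypto";

type Store = Map<string, unknown>;

function stepKey(workflowId: string, stepId: string, args: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify([workflowId, stepId, args]))
    .digest("hex");
}

function executeIdempotent(
  store: Store,
  workflowId: string,
  stepId: string,
  args: unknown,
  effect: () => unknown,
): unknown {
  const key = stepKey(workflowId, stepId, args);
  if (!store.has(key)) store.set(key, effect()); // effect fires once per key
  return store.get(key);
}
```

Whether the real engine keys on arguments, on downstream resource identity, or delegates idempotency to the downstream MCP servers is exactly the detail the promised subsection would need to pin down.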
Circularity Check
No circularity: empirical system evaluation without derivations or load-bearing self-citations
full rationale
The paper presents a systems description of the MCP Workflow Engine and reports measured performance results (99% token reduction, 45s completion on 1200+ node graph) from a single production-scale Kubernetes task. No equations, fitted parameters, predictions, or derivation chains appear in the provided text. Claims rest on direct execution metrics rather than any reduction to self-defined inputs, ansatzes, or self-citation chains. The central separation of intelligence from execution is demonstrated empirically and does not reduce to its own assumptions by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption MCP tool calls can be represented declaratively in JSON with templates, loops, parallel branches, and data piping without loss of expressiveness.
- domain assumption Downstream MCP servers accept calls from an intermediary MCP server without requiring direct agent involvement or session state.
invented entities (2)
- MCP Workflow Engine: no independent evidence
- MCP Mediator: no independent evidence