
arxiv: 2605.22733 · v1 · pith:ZLHSWGWWnew · submitted 2026-05-21 · 💻 cs.AI · cs.SE

HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

Pith reviewed 2026-05-22 05:17 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords Python framework · MCP tools · streaming APIs · LLM agents · code generation · unified interfaces · Server-Sent Events · boilerplate reduction

The pith

A single typed skill definition automatically produces a streaming HTTP endpoint, OpenAPI UI, and MCP tool registration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Developers currently maintain separate code for HTTP APIs used by humans and CI, and for MCP tools used by AI agents, even though the core logic is the same. HarnessAPI treats a folder containing handler.py and Pydantic schemas as the single source of truth. From this, the framework derives Server-Sent Events streaming, an interactive Swagger interface, and a ready-to-use MCP tool, all running in one process. Dual content negotiation allows the same code to handle both streaming and standard JSON requests without modification. A special code generation step ensures type information reaches the MCP registration layer correctly.
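
The dual content negotiation described above can be sketched as follows. This is a minimal illustration using only the standard library, not HarnessAPI's actual code: the names `summarize`, `sse_frame`, and `respond` are hypothetical, and the sketch buffers all chunks before responding, whereas a real server would presumably flush each SSE frame as it is produced.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Callable, Dict, Tuple


# Hypothetical handler: yields partial results as they become available.
async def summarize(text: str) -> AsyncIterator[Dict[str, Any]]:
    for word in text.split():
        yield {"token": word}


def sse_frame(payload: Dict[str, Any]) -> str:
    """Encode one payload as a Server-Sent Events frame."""
    return f"data: {json.dumps(payload)}\n\n"


async def respond(
    handler: Callable[..., AsyncIterator[Dict[str, Any]]],
    accept: str,
    **kwargs: Any,
) -> Tuple[str, str]:
    """Return (content_type, body), chosen from the Accept header."""
    # Assumption: buffering keeps the sketch short; it is not real streaming.
    chunks = [chunk async for chunk in handler(**kwargs)]
    if "text/event-stream" in accept:
        return "text/event-stream", "".join(sse_frame(c) for c in chunks)
    return "application/json", json.dumps(chunks)


if __name__ == "__main__":
    for accept in ("text/event-stream", "application/json"):
        print(asyncio.run(respond(summarize, accept, text="one two")))
```

The point of the sketch is that the handler is written once; only the response encoder changes with the client's `Accept` header.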

Core claim

HarnessAPI provides a skill-first framework where one handler.py plus Pydantic schemas suffice to generate a streaming HTTP endpoint with Server-Sent Events, an interactive OpenAPI/Swagger UI, and a zero-configuration MCP tool, all served from a single process, while reducing framework-facing boilerplate by 74 percent compared to manual dual-stack implementations.

What carries the argument

The skill folder containing handler.py and Pydantic schemas, supported by a dynamic code-generation mechanism that propagates type annotations correctly to the tool registration layer.
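
One plausible way to treat skill folders as the unit of discovery is sketched below. The layout (one subfolder per skill, each holding a `handler.py`) follows the description above, but the function names and the `skill_<name>` module naming are assumptions for illustration, not the framework's API.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def discover_skills(root: Path) -> Dict[str, Path]:
    """Map skill name -> handler.py path for each subfolder that has one."""
    return {
        entry.name: entry / "handler.py"
        for entry in sorted(root.iterdir())
        if entry.is_dir() and (entry / "handler.py").is_file()
    }


def load_handler_module(name: str, path: Path) -> ModuleType:
    """Import a handler.py file under a hypothetical unique module name."""
    spec = importlib.util.spec_from_file_location(f"skill_{name}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Folders without a `handler.py` are skipped rather than treated as errors, which is a design choice of this sketch and may differ from the framework's behaviour.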

Load-bearing premise

The dynamic code-generation mechanism successfully propagates Pydantic type annotations to the tool registration layer without errors or loss of information.
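
To make the premise concrete, the sketch below contrasts a naive closure wrapper with a generated wrapper. It illustrates the general technique only: it assumes the registration layer reads a callable's signature (here via `inspect.signature`), and it is not taken from HarnessAPI's source or FastMCP's internals. The names `closure_wrapper`, `generated_wrapper`, and `forecast` are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def closure_wrapper(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Naive registration: the wrapper hides the typed parameters."""
    def tool(**kwargs: Any) -> Any:
        return fn(**kwargs)
    return tool


def generated_wrapper(
    fn: Callable[..., Any], fields: Dict[str, Any]
) -> Callable[..., Any]:
    """Generate a wrapper whose signature names and types every field."""
    for name in fields:
        if not name.isidentifier():
            raise ValueError(f"invalid parameter name: {name!r}")
    params = ", ".join(f"{name}: _types[{name!r}]" for name in fields)
    call = ", ".join(f"{name}={name}" for name in fields)
    source = f"def tool({params}):\n    return _fn({call})\n"
    namespace: Dict[str, Any] = {"_fn": fn, "_types": fields}
    exec(source, namespace)  # defines `tool` inside `namespace`
    return namespace["tool"]


# Hypothetical skill handler used only for the demonstration.
def forecast(city: str, days: int) -> str:
    return f"{city}:{days}"


if __name__ == "__main__":
    print(inspect.signature(closure_wrapper(forecast)))
    print(inspect.signature(generated_wrapper(forecast, {"city": str, "days": int})))
```

An inspector looking at the closure version sees only `**kwargs`, whereas the generated version exposes `city` and `days` with their types. Whether the same holds for nested models, Optional fields, and validators is exactly what the premise leaves open.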

What would settle it

Create a skill with a complex nested Pydantic model and check whether the generated MCP tool schema matches the expected structure from the HTTP endpoint.
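
A check of this kind could be automated with a small structural diff, sketched here under assumptions: both schemas are already available as plain JSON-like dictionaries (for example, one taken from the endpoint's OpenAPI document and one from the MCP tool listing), and `schema_diff` is a hypothetical helper, not part of any library.

```python
from typing import Any, List


def schema_diff(a: Any, b: Any, path: str = "$") -> List[str]:
    """Return JSON-path-like locations where two schema fragments differ."""
    diffs: List[str] = []
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                diffs.append(f"{path}.{key}")  # key present on one side only
            else:
                diffs.extend(schema_diff(a[key], b[key], f"{path}.{key}"))
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(schema_diff(x, y, f"{path}[{i}]"))
    elif a != b:
        diffs.append(path)
    return diffs
```

An empty result means the two projections agree structurally. A non-empty result points at the first place where nested type information was lost or altered.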

Figures

Figures reproduced from arXiv: 2605.22733 by Edwin Jose.

Figure 1: HarnessAPI architecture. Discovery runs once at startup; both transport projections resolve to the same …
Figure 2: SSE streaming sequence for a streaming handler. A non-streaming handler emits a single …
Figure 3: Skill discovery pipeline. Metadata is merged from …
Original abstract

Every Python function deployed as an LLM tool must today exist in two forms: an HTTP endpoint for human-facing clients and CI pipelines, and an MCP tool registration for agent runtimes such as Claude and Cursor. These representations share business logic yet diverge in all the surrounding machinery (routing, validation, serialisation, streaming, and schema maintenance), and they drift apart as the underlying code evolves. We present HarnessAPI, a Python framework that eliminates this duplication by treating a typed skill folder as the single source of truth. From one handler.py plus Pydantic schemas, the framework automatically derives a streaming HTTP endpoint with Server-Sent Events, an interactive OpenAPI/Swagger UI, and a zero-configuration MCP tool, all served from a single process. Dual-mode content negotiation lets the same handler serve SSE-streaming and JSON-returning clients with no handler changes. A dynamic code-generation mechanism ensures Pydantic type annotations propagate correctly to FastMCP's inspection layer, resolving a technical limitation that prevents naive closure-based registration. Measured across six representative skills using cloc, HarnessAPI reduces framework-facing boilerplate by 74% compared with a manually maintained dual-stack implementation (FastAPI server + FastMCP server). HarnessAPI subclasses FastAPI, inheriting its full middleware, dependency-injection, and deployment ecosystem. It is available at https://github.com/edwinjosechittilappilly/harnessapi and on PyPI (pip install harnessapi).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents HarnessAPI, a Python framework that treats a typed skill folder (one handler.py plus Pydantic schemas) as the single source of truth. From this definition the framework automatically derives a streaming HTTP endpoint using Server-Sent Events, an interactive OpenAPI/Swagger UI, and a zero-configuration MCP tool registration, all served from a single process. Dual-mode content negotiation allows the same handler to serve both SSE-streaming and JSON clients without modification. A dynamic code-generation step is introduced to propagate Pydantic type annotations into FastMCP’s inspection layer. Across six representative skills, cloc measurements show a 74% reduction in framework-facing boilerplate relative to a manually maintained dual-stack (FastAPI + FastMCP) implementation. HarnessAPI subclasses FastAPI and is released on GitHub and PyPI.

Significance. If the dynamic code-generation step correctly preserves complex Pydantic constructs and if the 74 % reduction generalizes beyond the six evaluated skills, the framework would meaningfully reduce duplication and drift for developers who must expose the same business logic to both human-facing clients and agent runtimes such as Claude or Cursor. The inheritance from FastAPI and the provision of reproducible code on GitHub constitute concrete engineering strengths that facilitate adoption and further experimentation.

major comments (2)
  1. [Abstract] The 74% boilerplate reduction is quantified via cloc on six skills, yet no description is given of how the manual dual-stack baseline was constructed, which specific skills were chosen, or whether the comparison controlled for equivalent functionality (routing, validation, streaming, and schema maintenance). This detail is load-bearing for the central empirical claim.
  2. [Dynamic code-generation mechanism] (abstract and implementation description) The paper states that this step resolves the technical limitation preventing naive closure-based registration and ensures Pydantic annotations propagate to FastMCP. Without concrete verification or examples covering nested models, Optional fields, custom validators, or streaming return types, the single-source-of-truth guarantee remains unproven and could silently fail for realistic skills.
minor comments (1)
  1. A table listing per-skill line counts for both the HarnessAPI and dual-stack versions would make the 74 % aggregate figure more transparent and allow readers to assess variability across skills.
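
The per-skill breakdown requested here is simple to compute once line counts are available. The sketch below shows the arithmetic; the function names and the counts in the demonstration are placeholders for illustration, not measurements from the paper.

```python
from typing import Dict, Tuple


def reduction(baseline: int, unified: int) -> float:
    """Percentage of baseline lines removed by the unified version."""
    if baseline <= 0:
        raise ValueError("baseline line count must be positive")
    return 100.0 * (baseline - unified) / baseline


def reduction_table(
    counts: Dict[str, Tuple[int, int]]
) -> Tuple[Dict[str, float], float]:
    """Return per-skill reductions and the aggregate over summed lines."""
    per_skill = {name: reduction(b, u) for name, (b, u) in counts.items()}
    total_baseline = sum(b for b, _ in counts.values())
    total_unified = sum(u for _, u in counts.values())
    return per_skill, reduction(total_baseline, total_unified)


if __name__ == "__main__":
    # Placeholder (baseline, unified) line counts, NOT data from the paper.
    demo = {"skill_a": (100, 20), "skill_b": (50, 25)}
    print(reduction_table(demo))
```

With the placeholder counts, the per-skill reductions are 80% and 50% while the aggregate over summed lines is 70%, which shows why a single aggregate figure can hide variability across skills.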

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and have revised the paper to incorporate additional details and examples as suggested.

Point-by-point responses
  1. Referee: [Abstract] The 74% boilerplate reduction is quantified via cloc on six skills, yet no description is given of how the manual dual-stack baseline was constructed, which specific skills were chosen, or whether the comparison controlled for equivalent functionality (routing, validation, streaming, and schema maintenance). This detail is load-bearing for the central empirical claim.

    Authors: We agree that more explicit details on the evaluation methodology strengthen the central empirical claim. The revised manuscript expands the Evaluation section to describe the six representative skills (covering data transformation, external API integration, real-time analytics, file processing, authentication flows, and streaming summarization), the construction of the manual dual-stack baseline (separate, fully functional FastAPI and FastMCP codebases written to match HarnessAPI capabilities exactly), and confirmation that both implementations were controlled for equivalent functionality in routing, Pydantic validation, SSE streaming, OpenAPI schema exposure, and schema maintenance. Cloc measurements were performed on the framework-facing code only, excluding business logic. revision: yes

  2. Referee: [Dynamic code-generation mechanism] (abstract and implementation description) The paper states that this step resolves the technical limitation preventing naive closure-based registration and ensures Pydantic annotations propagate to FastMCP. Without concrete verification or examples covering nested models, Optional fields, custom validators, or streaming return types, the single-source-of-truth guarantee remains unproven and could silently fail for realistic skills.

    Authors: We acknowledge that the original description would benefit from concrete verification. The revised Implementation section now includes a new subsection with explicit code examples and test cases for nested Pydantic models, Optional fields, custom validators (including root validators), and streaming return types. These cases demonstrate that the dynamic code-generation step correctly extracts and forwards all annotations to FastMCP’s inspection layer, with no loss of type information or silent failures, thereby supporting the single-source-of-truth guarantee for realistic skills. revision: yes

Circularity Check

0 steps flagged

No circularity: implementation description with direct measurement

Full rationale

The paper describes a software framework (HarnessAPI) that unifies HTTP endpoints, OpenAPI UI, and MCP tool registration from a single handler.py plus Pydantic schemas. It reports a 74% boilerplate reduction measured via cloc on six skills and notes inheritance from FastAPI. No equations, fitted parameters, predictions, or self-referential derivations appear. The dynamic code-generation step is presented as an engineering solution to a FastMCP limitation rather than a result that reduces to its own inputs by construction. The work is self-contained against external benchmarks (GitHub, PyPI, cloc counts) with no load-bearing self-citations or ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that Pydantic schemas and FastAPI/FastMCP can be extended via dynamic generation to produce consistent dual representations without additional per-client code.

axioms (1)
  • domain assumption: Pydantic type annotations can be reliably extracted and forwarded to FastMCP via code generation
    This is invoked to resolve the stated limitation of naive closure-based registration.

pith-pipeline@v0.9.0 · 5786 in / 1232 out tokens · 63102 ms · 2026-05-22T05:17:32.363453+00:00 · methodology
