Mcp-agentbench: Evaluating real-world language agent performance with mcp-mediated tools

Zikang Guo, Benfeng Xu, Chiwei Zhu, et al · 2025 · arXiv 2509.09734

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

read on arXiv browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

cs.CR · 2026-04-08 · conditional · novelty 7.0

MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

cs.AI · 2026-04-05 · unverdicted · novelty 6.0

The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.

Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible

cs.CR · 2026-02-08 · conditional · novelty 6.0

An anonymization framework replaces sensitive UI content with deterministic placeholders to protect privacy in mobile GUI agents while preserving task performance.

Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents cs.AI · 2026-04-05 · unverdicted · none · ref 18
The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System cs.SE · 2026-05-14 · unverdicted · none · ref 10
Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.

Mcp-agentbench: Evaluating real-world language agent performance with mcp-mediated tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer