Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.
LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.
citing papers explorer
-
Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation
Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.
-
Making OpenAPI Documentation Agent-Ready: Detecting Documentation and REST Smells with a Multi-Agent LLM System
Hermes uses multi-agent LLMs to detect 2450 documentation and REST smells across 600 OpenAPI endpoints, demonstrating that structurally valid microservice APIs are often not semantically ready for agent consumption.
-
LLM Agents Are the Antidote to Walled Gardens
LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.