Title resolution pending

Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Rezazadeh, Ankit Shah, Yujia Bao, Eugene Siow

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools

cs.AI · 2026-04-02 · unverdicted · novelty 7.0

PHMForge benchmark shows LLM agents achieve 80.8% pass@1 on prognostic tasks with native MCP tools but performance collapses from 100% to 20% when using text RAG instead.

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation

cs.SE · 2026-02-11 · unverdicted · novelty 7.0

Agent-Diff benchmarks LLM agents on enterprise API tasks using code execution and state-diff contracts to define success, evaluated on nine models across 224 tasks with code released.

citing papers explorer

Showing 2 of 2 citing papers.

PHMForge: Evaluating LLM Agents on Industrial Prognostics through MCP-Native, Algorithm-Grounded Tools · cs.AI · 2026-04-02 · unverdicted · none · ref 25
Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation · cs.SE · 2026-02-11 · unverdicted · none · ref 24
