Phoenix-bench shows agentic AI systems lose 37-58% resolved rate when moving from SWE-bench Verified to hardware tasks because bugs spread across parallel modules via signal flow, with testbench feedback lifting performance by 42-45% while file-level oracles add only 1.4%.
arXiv preprint arXiv:2503.21710 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
baseline 1polarities
baseline 1representative citing papers
ARISE adds a data-flow-augmented repository graph and three-tier tool API to LLM agents, raising Function Recall@1 by 17 points, Line Recall@1 by 15 points, and Pass@1 repair rate to 22% on SWE-bench Lite.
DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.
AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
citing papers explorer
-
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench
Phoenix-bench shows agentic AI systems lose 37-58% resolved rate when moving from SWE-bench Verified to hardware tasks because bugs spread across parallel modules via signal flow, with testbench feedback lifting performance by 42-45% while file-level oracles add only 1.4%.
-
ARISE: A Repository-level Graph Representation and Toolset for Agentic Fault Localization and Program Repair
ARISE adds a data-flow-augmented repository graph and three-tier tool API to LLM agents, raising Function Recall@1 by 17 points, Line Recall@1 by 15 points, and Pass@1 repair rate to 22% on SWE-bench Lite.
-
Dynamic analysis enhances issue resolution
DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.
-
AgenticSZZ: Temporal Knowledge Graph-Guided Agentic Bug-Inducing Commit Identification
AgenticSZZ reframes bug-inducing commit identification as temporal knowledge graph search navigated by an LLM agent, reporting F1 scores of 0.47-0.79 and up to 34% improvement over prior SZZ methods on three datasets.
-
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.