A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
Exploring autonomous agents: A closer look at why they fail when completing tasks
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
New RPS and AGS metrics show within-family distilled LLM agents have 5.9 pp higher tool-use graph similarity than cross-family pairs, with some models exceeding their teachers.
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
An empirical study of real-world issues yields a taxonomy of 34 fault types, symptoms, and root causes in agentic AI systems, validated by 145 practitioners.
citing papers explorer
-
Inference-Time Budget Control for LLM Search Agents
A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
-
When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors
New RPS and AGS metrics show within-family distilled LLM agents have 5.9 pp higher tool-use graph similarity than cross-family pairs, with some models exceeding their teachers.
-
Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
PTR framework profiles a workflow upfront then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations.
-
Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes
An empirical study of real-world issues yields a taxonomy of 34 fault types, symptoms, and root causes in agentic AI systems, validated by 145 practitioners.