The confidence dichotomy: Analyzing and mitigating miscalibration in tool-use agents

Weihao Xuan, Qingcheng Zeng, Heli Qi, Yunze Xiao, Junjue Wang, Naoto Yokoya · 2026 · arXiv 2601.07264

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Misleading tool feedback produces value inversion in LLM agents, with performance dropping below matched no-feedback baselines on HotpotQA and similar tasks.

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Action-conditioned estimation of intervention advantage via prefix branching reduces control regret over calibrated scalar risk scores in LLM agent oversight across benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents

cs.AI · 2026-06-19 · unverdicted · none · ref 55

Misleading tool feedback produces value inversion in LLM agents, with performance dropping below matched no-feedback baselines on HotpotQA and similar tasks.

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

cs.AI · 2026-06-19 · unverdicted · none · ref 15

Action-conditioned estimation of intervention advantage via prefix branching reduces control regret over calibrated scalar risk scores in LLM agent oversight across benchmarks.
