Proceedings of the AAAI Conference on Artificial Intelligence
2 Pith papers cite this work, both from 2026. Neither has a verdict yet.
Citing papers:
- Don't Blindly Trust It: How Unreliable Feedback Breaks Tool-Using LLM Agents
  Misleading tool feedback produces value inversion in LLM agents. Their performance falls below matched no-feedback baselines on HotpotQA and similar tasks.
- A Systems-Level Analysis of Sensitivity, Robustness, and Stability in Retrieval-Augmented Generation
  Experiments covered 56 settings on a fixed set of 500 questions. They show non-monotonic downstream scores and losses introduced during preprocessing, and the authors call for multi-stage RAG evaluation.