Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
years
2026 (2)
verdicts
UNVERDICTED (2)
representative citing papers
CA-BED uses Bayesian experimental design and simulated conversation trees with LLM likelihoods to optimize multi-turn question selection, reporting 21.8% higher success rates than direct prompting on entity-deduction benchmarks.
citing papers explorer
- ECHO: Learning Epistemically Adaptive Language Agents with Turn-Level Credit
  ECHO is a clipped policy-gradient method that uses posterior-sensitive rewards to give turn-level epistemic credit in multi-turn information-seeking tasks, outperforming trajectory-level GRPO on a new Clue Selector Game benchmark.
- CA-BED: Conversation-Aware Bayesian Experimental Design
  CA-BED uses Bayesian experimental design and simulated conversation trees with LLM likelihoods to optimize multi-turn question selection, reporting 21.8% higher success rates than direct prompting on entity-deduction benchmarks.