ECHO is a clipped policy-gradient method that uses posterior-sensitive rewards to give turn-level epistemic credit in multi-turn information-seeking tasks, outperforming trajectory-level GRPO on a new Clue Selector Game benchmark.
Bayesian rein- forcement learning: A survey.Foundations and Trends® in Machine Learning, 8(5-6): 359–483, November 2015
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.MA 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ECHO: Learning Epistemically Adaptive Language Agents with Turn-Level Credit
ECHO is a clipped policy-gradient method that uses posterior-sensitive rewards to give turn-level epistemic credit in multi-turn information-seeking tasks, outperforming trajectory-level GRPO on a new Clue Selector Game benchmark.