pith. sign in

Playing 20 question game with policy-based reinforcement learning, 2026

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.MA 1

years

2026 1

verdicts

UNVERDICTED 1

clear filters

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • ECHO: Learning Epistemically Adaptive Language Agents with Turn-Level Credit cs.MA · 2026-06-29 · unverdicted · none · ref 25 · internal anchor

    ECHO is a clipped policy-gradient method that uses posterior-sensitive rewards to give turn-level epistemic credit in multi-turn information-seeking tasks, outperforming trajectory-level GRPO on a new Clue Selector Game benchmark.