Training large language models for retrieval-augmented question answering through backtracking correction
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards
ReSeek adds self-correction via a JUDGE action and a dense instructive reward (correctness plus utility) to RL training of search agents, yielding higher success and faithfulness on a new contamination-resistant benchmark.