Search Self-play (SSP) trains search agents without supervision by co-evolving a task proposer and a solver through self-play. RAG verification ensures ground-truth accuracy, and the method yields uniform gains on benchmarks in both from-scratch and continued RL settings.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Search Self-play: Pushing the Frontier of Agent Capability without Supervision