Lights of New York (1916 film)
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers:
CoSearch: Joint Training of Reasoning and Document Ranking via Reinforcement Learning for Agentic Search
Joint RL training of a reasoning agent and a document ranker via GRPO, with semantic grouping and composite rewards, yields consistent gains over fixed-retrieval baselines on seven QA benchmarks.