CW-GRPO weights GRPO advantages with per-round contribution scores from an LLM judge, improving search agent performance by 5.0% on Qwen3-8B and 6.3% on Qwen3-1.7B over standard GRPO.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization