End-to-end RL in authentic web environments produces LLM research agents that outperform prompt-engineering and RAG-based baselines by up to 28.9 and 7.2 points respectively while exhibiting emergent planning, cross-validation, and self-reflection.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2025 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
End-to-end RL in authentic web environments produces LLM research agents that outperform prompt-engineering and RAG-based baselines by up to 28.9 and 7.2 points respectively while exhibiting emergent planning, cross-validation, and self-reflection.