Pith Number

pith:KQ3KH2EU

pith:2025:KQ3KH2EUT2DCCAHZ6FTOZNVZI2
not attested · not anchored · not stored · refs resolved

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Dayuan Fu, Lyumanshan Ye, Pengfei Liu, Pengrui Lu, Xiangkun Hu, Xiaojie Cai, Yuxiang Zheng

End-to-end RL training on the open web lets LLM agents outperform prompt and RAG baselines by up to 28.9 points while developing planning and self-reflection.

arxiv:2504.03160 v4 · 2025-04-04 · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KQ3KH2EUT2DCCAHZ6FTOZNVZI2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents, with emergent cognitive behaviors including planning, cross-validation, self-reflection, and honesty.

C2 weakest assumption

That the multi-agent browsing architecture can reliably extract information from arbitrary real-world webpage structures at scale without introducing systematic biases or instability that would undermine the reported performance gains.

C3 one line summary

End-to-end RL in authentic web environments produces LLM research agents that outperform prompt-engineering and RAG-based baselines by up to 28.9 and 7.2 points respectively while exhibiting emergent planning, cross-validation, and self-reflection.

References

18 extracted · 18 resolved · 2 Pith anchors

[1] Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi 2025
[2] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselv 2023
[3] R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning 2025 · arXiv:2503.05592
[4] Kimi k1.5: Scaling Reinforcement Learning with LLMs 2025 · arXiv:2501.12599
[5] Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2022. MuSiQue: Multihop questions via single-hop question composition. Transactions of the Association for Computational Lin 2022

Cited by

31 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.762485Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3

Aliases

arxiv: 2504.03160 · arxiv_version: 2504.03160v4 · doi: 10.48550/arxiv.2504.03160 · pith_short_12: KQ3KH2EUT2DC · pith_short_16: KQ3KH2EUT2DCCAHZ · pith_short_8: KQ3KH2EU
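
The short aliases above are plain prefixes of the full 26-character Pith Number. The number itself also appears to match the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That relationship is an observation from this record's values, not a rule stated by the Pith schema, and the helper names below are hypothetical. A minimal Python sketch under that assumption:

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the raw digest and keep a prefix.

    Assumes (not confirmed by the schema) that the Pith Number is the
    leading `length` characters of the standard base32 encoding.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Hypothetical helper: a short alias is the first n characters."""
    return pith_number[:n]


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {n: short_alias(full, n) for n in (8, 12, 16)}

print(full)         # KQ3KH2EUT2DCCAHZ6FTOZNVZI2
print(aliases[8])   # KQ3KH2EU
print(aliases[12])  # KQ3KH2EUT2DC
print(aliases[16])  # KQ3KH2EUT2DCCAHZ
```

If the assumption holds for other records, a client could check that a Pith Number and its canonical hash belong together without a network call. Treat a mismatch as a reason to fetch the record, not as proof of tampering.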
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KQ3KH2EUT2DCCAHZ6FTOZNVZI2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5a6ac865085b2664dc78e29bc353e509efef8d2704b7e734cf0b98455ba9cff6",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-04-04T04:41:28Z",
    "title_canon_sha256": "7996dd2f6a35c010b6913abfcca9e480138c0124154c22fbbe4a9e4b99e57ace"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.03160",
    "kind": "arxiv",
    "version": 4
  }
}
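
The verification one-liner above can be unpacked into a small offline script. The sketch below applies the same canonicalization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record as printed on this page. Whether the printed record is exactly what the service hashes is an assumption, so compare the printed digest against the canonical hash yourself. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Record copied from this page (assumed to be the full canonical record).
record = {
    "metadata": {
        "abstract_canon_sha256": "5a6ac865085b2664dc78e29bc353e509efef8d2704b7e734cf0b98455ba9cff6",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-04-04T04:41:28Z",
        "title_canon_sha256": "7996dd2f6a35c010b6913abfcca9e480138c0124154c22fbbe4a9e4b99e57ace",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.03160", "kind": "arxiv", "version": 4},
}

# Same data with keys in a different order: sort_keys makes the digest identical.
reordered = {
    "source": {"version": 4, "kind": "arxiv", "id": "2504.03160"},
    "schema_version": "1.0",
    "metadata": dict(reversed(list(record["metadata"].items()))),
}

digest = canonical_sha256(record)
print(digest)
print(digest == canonical_sha256(reordered))  # key order does not affect the digest
```

Sorting keys and fixing the separators is what makes the hash reproducible across clients: two parties that hold the same data but serialize it differently still arrive at the same bytes. List order is preserved, which is presumably why the record stores `cross_cats_sorted` already sorted.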