Pith Number

pith:KQ3KH2EU

pith:2025:KQ3KH2EUT2DCCAHZ6FTOZNVZI2

not attested · not anchored · not stored · refs resolved

DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

Dayuan Fu, Lyumanshan Ye, Pengfei Liu, Pengrui Lu, Xiangkun Hu, Xiaojie Cai, Yuxiang Zheng

End-to-end RL training on the open web lets LLM agents outperform prompt and RAG baselines by up to 28.9 points while developing planning and self-reflection.

arxiv:2504.03160 v4 · 2025-04-04 · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{KQ3KH2EUT2DCCAHZ6FTOZNVZI2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
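
The property described above, that any mirror holding the same record and events arrives at the same state, can be illustrated with a small sketch. The actual Pith merge algorithm and event schema are not given on this page, so the event fields (`event_id`, `timestamp`, `field`, `value`) and the last-writer-wins rule below are assumptions chosen purely for illustration.

```python
from typing import Any, Iterable


def merge_state(record: dict[str, Any], events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the record in a fixed total order.

    Hypothetical event shape: {"event_id", "timestamp", "field", "value"}.
    Events sharing an event_id are assumed to be identical replays.
    """
    state = dict(record)  # never mutate the caller's record
    # Drop replayed events so duplicates cannot change the result.
    unique = {event["event_id"]: event for event in events}
    # Sorting by (timestamp, event_id) gives every mirror the same order,
    # regardless of the order in which the events arrived.
    ordered = sorted(unique.values(), key=lambda e: (e["timestamp"], e["event_id"]))
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins (assumed rule)
    return state
```

Because the fold order depends only on the events themselves, two mirrors that received the same events in different orders, or received some of them twice, would compute the same merged state under these assumptions.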

Claims

C1 · strongest claim

DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents, with emergent cognitive behaviors including planning, cross-validation, self-reflection, and honesty.

C2 · weakest assumption

That the multi-agent browsing architecture can reliably extract information from arbitrary real-world webpage structures at scale without introducing systematic biases or instability that would undermine the reported performance gains.

C3 · one-line summary

End-to-end RL in authentic web environments produces LLM research agents that outperform prompt-engineering and RAG-based baselines by up to 28.9 and 7.2 points respectively while exhibiting emergent planning, cross-validation, and self-reflection.

References

18 extracted · 18 resolved · 2 Pith anchors

[1] Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi 2025

[2] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselv 2023

[3] R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning 2025 · arXiv:2503.05592

[4] Kimi k1.5: Scaling Reinforcement Learning with LLMs 2025 · arXiv:2501.12599

[5] Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2022. MuSiQue: Multihop questions via single-hop question composition. Transactions of the Association for Computational Lin 2022

Cited by

31 papers in Pith

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

LLM-Oriented Information Retrieval: A Denoising-First Perspective

Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models

MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models

Receipt and verification

First computed	2026-05-17T23:38:46.762485Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3

Aliases

arxiv: 2504.03160 · arxiv_version: 2504.03160v4 · doi: 10.48550/arxiv.2504.03160 · pith_short_12: KQ3KH2EUT2DC · pith_short_16: KQ3KH2EUT2DCCAHZ · pith_short_8: KQ3KH2EU
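
The short aliases listed above are prefixes of the full Pith Number. In this record the full number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That second relationship is an observation from the values on this page, not a documented rule, and the helper names below are hypothetical.

```python
import base64

CANONICAL_HASH = "5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3"
PITH_NUMBER = "KQ3KH2EUT2DCCAHZ6FTOZNVZI2"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Return the first `length` characters of the Base32 form of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the 8-, 12-, and 16-character aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


# Observed for this record only; other records may follow a different rule.
print(base32_prefix(CANONICAL_HASH, 26) == PITH_NUMBER)
print(short_aliases(PITH_NUMBER))
```

If the relationship holds in general, a client could check that an identifier and a hash belong together without a network request, but that should be confirmed against the schema before relying on it.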

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/KQ3KH2EUT2DCCAHZ6FTOZNVZI2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5436a3e8949e862100f9f166ecb6b94699bef612eebd8a025c2452e9a6a41bd3
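
The same check can be written as a standalone Python script, which may be easier to read than the one-liner. This sketch mirrors the serialization settings of the command above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The function name `canonical_sha256` is hypothetical, and the record is copied from the "Canonical record JSON" section of this page. Whether the printed digest matches the expected value depends on the resolver returning exactly this record.

```python
import hashlib
import json
from typing import Any


def canonical_sha256(record: Any) -> str:
    """Hash a JSON value after serializing it in a key-order-independent form."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order in the input no longer matters
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "5a6ac865085b2664dc78e29bc353e509efef8d2704b7e734cf0b98455ba9cff6",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-04-04T04:41:28Z",
        "title_canon_sha256": "7996dd2f6a35c010b6913abfcca9e480138c0124154c22fbbe4a9e4b99e57ace",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.03160", "kind": "arxiv", "version": 4},
}

# Compare this against the canonical hash listed on the page.
print(canonical_sha256(record))
```

Sorting keys and fixing the separators is what makes the digest reproducible: two parties that parse the same JSON into different in-memory key orders still serialize to identical bytes.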

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "5a6ac865085b2664dc78e29bc353e509efef8d2704b7e734cf0b98455ba9cff6",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-04-04T04:41:28Z",
    "title_canon_sha256": "7996dd2f6a35c010b6913abfcca9e480138c0124154c22fbbe4a9e4b99e57ace"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.03160",
    "kind": "arxiv",
    "version": 4
  }
}