Pith Number

pith:IS2Y2FVN

pith:2025:IS2Y2FVNQ5CACKYP7PUWAEPTDP
not attested · not anchored · not stored · refs resolved

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Chenzheng Zhu, Fan Yang, Haofen Wang, Haoze Sun, Huajun Chen, Jeff Z. Pan, Linzhuang Sun, Mingyang Chen, Tianpeng Li, Weipeng Chen, Wen Zhang, Yijie Zhou, Zenan Zhou

ReSearch trains LLMs to interleave search operations with text reasoning using only outcome-based reinforcement learning rewards.

arxiv:2503.19470 v3 · 2025-03-25 · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IS2Y2FVNQ5CACKYP7PUWAEPTDP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks.

C2 weakest assumption

That outcome-based reinforcement learning rewards alone are sufficient to train effective search timing and integration without any supervised reasoning traces or explicit search supervision.

C3 one-line summary

ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.

References

44 extracted · 44 resolved · 14 Pith anchors

[1] Claude 3.7 Sonnet and Claude Code 2025
[2] Self-rag: Learning to retrieve, generate, and critique through self-reflection 2024
[3] Rq-rag: Learning to refine queries for retrieval augmented generation 2024
[4] Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, and Tianyu Pang 2024
[5] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948

Cited by

43 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:47.395691Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0

Aliases

arxiv: 2503.19470 · arxiv_version: 2503.19470v3 · doi: 10.48550/arxiv.2503.19470 · pith_short_12: IS2Y2FVNQ5CA · pith_short_16: IS2Y2FVNQ5CACKYP · pith_short_8: IS2Y2FVN
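The identifiers on this page appear to be related by simple truncation. For this record, the full Pith Number matches the first 26 characters of the base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the Pith Number. The sketch below checks that relationship. The rule is inferred from the values shown here, not from a published specification, and the helper names `pith_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0"
PITH_NUMBER = "IS2Y2FVNQ5CACKYP7PUWAEPTDP"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is only
    confirmed here for the single record on this page.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_8/12/16 aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


full = pith_from_hash(CANONICAL_HASH)
aliases = short_aliases(full)

print(full == PITH_NUMBER)
for name, value in sorted(aliases.items()):
    print(name, value)
```

Because the short forms are prefixes, a shorter alias is more likely to collide with another record than a longer one, so the full 26-character form is the safer key when storing references.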
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IS2Y2FVNQ5CACKYP7PUWAEPTDP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-25T09:00:58Z",
    "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.19470",
    "kind": "arxiv",
    "version": 3
  }
}
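The hashing step of the verification command above can also be run offline against the record as displayed. The following is a minimal sketch that applies the same serialization settings as the one-liner (sorted keys, compact separators, no ASCII escaping) before taking SHA-256. It assumes the JSON shown on this page is the complete canonical record; if the served record differs in any field, the digest will differ too, so compare the printed value with the canonical hash rather than taking a match for granted. The function names are illustrative.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
    "cross_cats_sorted": ["cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-25T09:00:58Z",
    "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e"
  },
  "schema_version": "1.0",
  "source": {"id": "2503.19470", "kind": "arxiv", "version": 3}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then UTF-8 encode."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered in transit.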