pith:IS2Y2FVN
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
ReSearch trains LLMs to interleave search operations with text reasoning using only outcome-based reinforcement learning rewards.
arxiv:2503.19470 v3 · 2025-03-25 · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IS2Y2FVNQ5CACKYP7PUWAEPTDP}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our approach treats search operations as integral components of the reasoning chain: text-based thinking guides when and how to search, and search results in turn influence further reasoning. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks.
Outcome-based reinforcement learning rewards alone are sufficient to train effective search timing and integration, without any supervised reasoning traces or explicit search supervision.
ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.
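As a rough illustration of the claims above, the following toy sketch shows reasoning steps interleaved with search calls, with a reward computed only from the final answer. Everything in it is hypothetical: the `<search>`, `<result>`, and `<answer>` tags, the `toy_policy` and `toy_search` stand-ins, and the exact-match reward are assumptions chosen for the example, not details stated in this record.

```python
import re
from typing import Callable

# Hypothetical tag format, assumed for this sketch only.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.S)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.S)


def rollout(policy: Callable[[str], str],
            search: Callable[[str], str],
            question: str,
            max_steps: int = 4) -> str:
    """Alternate between policy text and search results until an answer appears."""
    context = question
    for _ in range(max_steps):
        step = policy(context)          # the model "thinks" in text
        context += "\n" + step
        if ANSWER_RE.search(step):      # a final answer ends the rollout
            break
        match = SEARCH_RE.search(step)
        if match:                       # the text itself decided to search
            result = search(match.group(1).strip())
            context += "\n<result>" + result + "</result>"
    return context


def outcome_reward(trajectory: str, gold: str) -> float:
    """Reward depends only on the final answer, not on intermediate steps."""
    match = ANSWER_RE.search(trajectory)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def toy_policy(context: str) -> str:
    # Stand-in for an LLM: search first, then answer once a result is visible.
    if "<result>" not in context:
        return "I need to look this up. <search>capital of France</search>"
    return "The result says Paris. <answer>Paris</answer>"


def toy_search(query: str) -> str:
    # Stand-in for a retrieval backend.
    return "Paris is the capital of France."


trajectory = rollout(toy_policy, toy_search, "What is the capital of France?")
reward = outcome_reward(trajectory, "Paris")
print(trajectory)
print(reward)
```

In an actual reinforcement learning setup the scalar returned by `outcome_reward` would feed a policy-gradient update; that part is omitted here because the record does not describe it.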
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:47.395691Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IS2Y2FVNQ5CACKYP7PUWAEPTDP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-25T09:00:58Z",
    "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.19470",
    "kind": "arxiv",
    "version": 3
  }
}
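The verification one-liner above serializes the canonical record with sorted keys and compact separators, then takes a SHA-256 digest. A minimal offline sketch of that same step follows; the helper name `canonical_hash` is made up for the example, and the record literal is copied from this page rather than fetched. If the service canonicalizes exactly as the one-liner does, the printed digest should match the canonical hash listed above, but that equality is not asserted here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    payload = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the digest
        separators=(",", ":"),     # no spaces after commas or colons
        ensure_ascii=False,        # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-03-25T09:00:58Z",
        "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e",
    },
    "schema_version": "1.0",
    "source": {"id": "2503.19470", "kind": "arxiv", "version": 3},
}

digest = canonical_hash(record)
print(digest)
```

Sorting keys and fixing the separators is what makes the digest reproducible: two parties holding the same record in different key order or with different whitespace still arrive at the same bytes before hashing.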