Pith Number

pith:IS2Y2FVN

pith:2025:IS2Y2FVNQ5CACKYP7PUWAEPTDP

not attested · not anchored · not stored · refs resolved

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Chenzheng Zhu, Fan Yang, Haofen Wang, Haoze Sun, Huajun Chen, Jeff Z. Pan, Linzhuang Sun, Mingyang Chen, Tianpeng Li, Weipeng Chen, Wen Zhang, Yijie Zhou, Zenan Zhou

ReSearch trains LLMs to interleave search operations with text reasoning using only outcome-based reinforcement learning rewards.

arxiv:2503.19470 v3 · 2025-03-25 · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{IS2Y2FVNQ5CACKYP7PUWAEPTDP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
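
The page does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic merge could look like. It assumes each event carries a timestamp and a field update, and that replay order is fixed by sorting on (timestamp, content hash). The event fields `at`, `field`, and `value` are hypothetical, and signature verification is omitted.

```python
import hashlib
import json
from typing import Any, Dict, List


def event_id(event: Dict[str, Any]) -> str:
    """Content hash of an event, used as a stable tie-breaker (assumed scheme)."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def merge(record: Dict[str, Any], events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Replay events over the record in an order independent of arrival order."""
    state = dict(record)
    # Deduplicate by content hash so a mirror that saw an event twice agrees.
    unique = {event_id(e): e for e in events}
    # Sort by (timestamp, hash): every mirror derives the same replay order.
    for eid in sorted(unique, key=lambda i: (unique[i]["at"], i)):
        event = unique[eid]
        state[event["field"]] = event["value"]
    return state


record = {"source": "arxiv:2503.19470"}
events = [
    {"at": "2026-05-17T00:00:00Z", "field": "citations", "value": 1},
    {"at": "2026-05-18T00:00:00Z", "field": "citations", "value": 2},
]
state_a = merge(record, events)
state_b = merge(record, list(reversed(events)) + events[:1])
```

Under these assumptions, two mirrors that received the same events in a different order, or with duplicates, end up with the same state.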

Claims

C1 strongest claim

Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks.

C2 weakest assumption

That outcome-based reinforcement learning rewards alone are sufficient to train effective search timing and integration without any supervised reasoning traces or explicit search supervision.

C3 one-line summary

ReSearch trains LLMs via RL to integrate search operations into reasoning steps, achieving strong generalization across benchmarks and eliciting reflection and self-correction without supervised reasoning data.
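
The claims above describe search calls interleaved with text reasoning, and a reward computed from the final outcome only. A toy sketch of that loop follows. The tag names, the scripted policy, the lookup-table search, and exact-match scoring are all illustrative assumptions, not the paper's implementation.

```python
import re
from typing import Callable

# Hypothetical tag format for search calls and final answers.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def rollout(generate: Callable[[str], str],
            search: Callable[[str], str],
            question: str,
            max_steps: int = 4) -> str:
    """Alternate between model text and search results until no search is requested."""
    trace = question
    for _ in range(max_steps):
        step = generate(trace)
        trace += step
        match = SEARCH_RE.search(step)
        if match is None:
            break  # the model stopped searching, so the rollout ends
        # Retrieved text is appended so it can influence later reasoning.
        trace += f"<result>{search(match.group(1).strip())}</result>"
    return trace


def outcome_reward(trace: str, gold: str) -> float:
    """Score only the final answer; intermediate steps get no direct supervision."""
    match = ANSWER_RE.search(trace)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def scripted_policy(trace: str) -> str:
    """Stand-in for an LLM: search once, then answer from the result."""
    if "<result>" not in trace:
        return "<think>I need the capital.</think><search>capital of France</search>"
    return "<think>The result says Paris.</think><answer>Paris</answer>"


def toy_search(query: str) -> str:
    """Stand-in for a retriever backed by a one-entry lookup table."""
    return {"capital of France": "Paris is the capital of France."}.get(query, "")


trace = rollout(scripted_policy, toy_search, "Q: What is the capital of France? ")
reward = outcome_reward(trace, "Paris")
```

In an RL setup, `reward` would be the only training signal for the whole trace, which is the assumption C2 flags as weakest.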

References

44 extracted · 44 resolved · 14 Pith anchors

[1] Claude 3.7 Sonnet and Claude Code 2025

[2] Self-RAG: Learning to retrieve, generate, and critique through self-reflection 2024

[3] RQ-RAG: Learning to refine queries for retrieval augmented generation 2024

[4] Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, and Tianyu Pang 2024

[5] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948

Cited by

43 papers in Pith

Supervising the search process produces reliable and generalizable information-seeking agents

Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Receipt and verification

First computed	2026-05-17T23:38:47.395691Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0

Aliases

arxiv: 2503.19470 · arxiv_version: 2503.19470v3 · doi: 10.48550/arxiv.2503.19470 · pith_short_12: IS2Y2FVNQ5CA · pith_short_16: IS2Y2FVNQ5CACKYP · pith_short_8: IS2Y2FVN
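
The short aliases are prefixes of the full 26-character Pith Number. For this record, the full identifier also matches the first 26 characters of the Base32 encoding of the canonical hash. The sketch below reproduces both relationships. Treat the Base32 derivation as an observation about this one record rather than a documented rule of the schema; the function names are illustrative.

```python
import base64
from typing import Dict

CANONICAL_HASH = "44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters (assumed rule)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str) -> Dict[str, str]:
    """Build the 8-, 12- and 16-character aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_id)
print(pith_id)
print(aliases)
```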

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/IS2Y2FVNQ5CACKYP7PUWAEPTDP \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-25T09:00:58Z",
    "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.19470",
    "kind": "arxiv",
    "version": 3
  }
}
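
The shell check under "Verify this Pith Number yourself" can also be run offline against the record printed above. The sketch below applies the same canonicalization (sorted keys, compact separators, UTF-8) and compares the digest with the canonical hash listed on this page. It assumes the JSON shown here is identical to what the resolver serves, so a mismatch would point to a transcription difference rather than necessarily to a bad record.

```python
import hashlib
import json
from typing import Any

EXPECTED = "44b58d16ad8744012b0ffbe96011f31bf53e5dbcc3f713af70316ef5b8f3a5f0"

RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "e59522c92d3b0f71aafdef1fc393fd60031cc735838a4baa4cae25ef063974e1",
    "cross_cats_sorted": ["cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-25T09:00:58Z",
    "title_canon_sha256": "bf05ce1fc3a58133438a96c34ba9f399e45a1ef5ac857af372a738e3eca2b82e"
  },
  "schema_version": "1.0",
  "source": {"id": "2503.19470", "kind": "arxiv", "version": 3}
}
"""


def canonical_bytes(record: Any) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as in the shell snippet."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: Any) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print("match" if digest == EXPECTED else "mismatch", digest)
```

Because keys are sorted before hashing, the key order and whitespace of the input JSON do not affect the digest.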