Pith Number

pith:PBT2I4KO

pith:2025:PBT2I4KORUUAHC2OC2BVYEPYJJ

not attested · not anchored · not stored · refs resolved

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Chen Gao, Chenyang Shao, Fanjin Meng, Fengli Xu, Jiahui Gong, Jie Feng, Jingwei Wang, Jingyi Wang, Qianyue Hao, Qinglong Yang, Sijian Ren, Tianjian Ouyang, Xiaochong Lan, Xinyuan Hu, Yiwen Song, Yong Li, Yu Li, Yunke Zhang, Yuwei Yan, Zefang Zong

Reinforcement learning on reasoning trajectories combined with test-time token scaling points toward Large Reasoning Models.

arxiv:2501.09686 v3 · 2025-01-16 · cs.AI · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{PBT2I4KORUUAHC2OC2BVYEPYJJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Train-time and test-time scaling together open a new research frontier: a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction.

C2 · weakest assumption

That reinforcement learning applied to reasoning trajectories will reliably expand LLMs' reasoning capacity without introducing systematic biases or hallucinations that are harder to detect than in standard generation.

C3 · one-line summary

The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

References

202 extracted · 202 resolved · 48 Pith anchors

[1] Phi-4 Technical Report 2024 · arXiv:2412.08905

[2] GPT-4 Technical Report 2023 · arXiv:2303.08774

[3] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances 2022 · arXiv:2204.01691

[4] (title not extracted) 2024 · arXiv:2402.10571

[5] MathQA: Towards interpretable math word problem solving with operation-based formalisms 2019

Cited by

36 papers in Pith

Large Language Models for Multi-Robot Systems: A Survey

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles

Bayesian Social Deduction with Graph-Informed Language Models

Receipt and verification

First computed	2026-05-17T23:38:50.132380Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf

Aliases

arxiv: 2501.09686 · arxiv_version: 2501.09686v3 · doi: 10.48550/arxiv.2501.09686 · pith_short_12: PBT2I4KORUUA · pith_short_16: PBT2I4KORUUAHC2O · pith_short_8: PBT2I4KO
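
On this record, the identifier and its short aliases can be reproduced from the canonical hash listed above: the first 26 characters of the RFC 4648 Base32 encoding of the SHA-256 digest spell the full Pith Number, and the `pith_short_*` aliases are its 8-, 12-, and 16-character prefixes. This is an observation about this one record, not a documented derivation rule; the helper name `pith_from_hash` and its `length` parameter are hypothetical.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix.

    Assumption: this rule is inferred from the values shown on this page,
    not taken from a Pith specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_from_hash(CANONICAL_HASH)

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
print(aliases)
```

A check like this ties the human-readable identifier to the hash, so a mismatch between the two would point to a record that was altered or mislabeled.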

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/PBT2I4KORUUAHC2OC2BVYEPYJJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf
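
The one-liner above can be unpacked into a small Python function. This is a minimal sketch of the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) followed by SHA-256. The helper name `canonical_sha256` is hypothetical, and the sample record is copied from the canonical record shown on this page. Whether the digest equals the expected hash depends on the service returning exactly this record, so the sketch only reports the comparison.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page (assumed to match the served JSON).
record = {
    "metadata": {
        "abstract_canon_sha256": "282c5a48b28b73fee08160a2e957058b7f8c773d182bcfbe789042d75bb24b76",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-01-16T17:37:58Z",
        "title_canon_sha256": "27a29be91192a11f36ffa1b46e5ee199fa483d41b5aac49cfac0e14c1b975c54",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.09686", "kind": "arxiv", "version": 3},
}

EXPECTED = "7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf"

digest = canonical_sha256(record)
print(digest)
print("matches expected hash" if digest == EXPECTED else "differs from expected hash")
```

Sorting keys and fixing the separators is what makes the hash independent of how a mirror happens to order or pretty-print the JSON.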

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "282c5a48b28b73fee08160a2e957058b7f8c773d182bcfbe789042d75bb24b76",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-01-16T17:37:58Z",
    "title_canon_sha256": "27a29be91192a11f36ffa1b46e5ee199fa483d41b5aac49cfac0e14c1b975c54"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.09686",
    "kind": "arxiv",
    "version": 3
  }
}