pith.
Pith Number

pith:PBT2I4KO

pith:2025:PBT2I4KORUUAHC2OC2BVYEPYJJ
not attested · not anchored · not stored · refs resolved

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Chen Gao, Chenyang Shao, Fanjin Meng, Fengli Xu, Jiahui Gong, Jie Feng, Jingwei Wang, Jingyi Wang, Qianyue Hao, Qinglong Yang, Sijian Ren, Tianjian Ouyang, Xiaochong Lan, Xinyuan Hu, Yiwen Song, Yong Li, Yu Li, Yunke Zhang, Yuwei Yan, Zefang Zong

Reinforcement learning on reasoning trajectories combined with test-time token scaling points toward Large Reasoning Models.

arxiv:2501.09686 v3 · 2025-01-16 · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{PBT2I4KORUUAHC2OC2BVYEPYJJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle · live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
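The page does not spell out the merge algorithm or the event schema. As a hypothetical illustration of the property it describes (every mirror recomputes the same current state from the same events), the sketch below replays events in one fixed total order, by timestamp with the event id as tie-breaker, and lets later events overwrite earlier ones. The `Event` fields, the last-writer-wins rule, and the omission of signature checks are all assumptions, not Pith's actual design.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical signed-event shape; the real bundle schema is not shown on this page."""
    timestamp: str  # ISO 8601 UTC string, so lexicographic order matches time order
    event_id: str   # unique identifier, used only to break timestamp ties
    field: str      # name of the state field the event sets (hypothetical)
    value: str


def merge(events: Iterable[Event]) -> Dict[str, str]:
    """Replay events in a fixed total order; a later event overwrites an earlier one.

    Signature verification is omitted here; a real mirror would presumably
    discard events whose signatures fail before merging.
    """
    state: Dict[str, str] = {}
    for event in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[event.field] = event.value
    return state


# Hypothetical events, deliberately listed out of order.
events = [
    Event("2026-05-18T09:00:00Z", "ev-b", "author_claim", "claimed"),
    Event("2026-05-18T09:00:00Z", "ev-a", "author_claim", "open"),
    Event("2026-05-17T23:38:50Z", "ev-c", "citations", "open"),
]

# Any arrival order yields the same merged state.
assert merge(events) == merge(reversed(events))
print(merge(events))
```

Sorting on a total order before folding is what removes the dependence on arrival order; any deterministic tie-breaker would do, as long as every mirror uses the same one.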

Claims

C1 · strongest claim

Train-time and test-time scaling, combined, open a new research frontier: a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction.

C2 · weakest assumption

That reinforcement learning applied to reasoning trajectories will reliably expand LLMs' reasoning capacity without introducing systematic biases or hallucinations that are harder to detect than in standard generation.

C3 · one-line summary

The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.

References

202 extracted · 202 resolved · 48 Pith anchors

[1] Phi-4 Technical Report 2024 · arXiv:2412.08905
[2] GPT-4 Technical Report 2023 · arXiv:2303.08774
[3] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances 2022 · arXiv:2204.01691
[4] arXiv preprint 2024 · arXiv:2402.10571
[5] MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms 2019

Cited by

36 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.132380Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf
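The 26-character Pith Number appears to be a truncated Base32 rendering of this hash: Base32-encoding the 32-byte SHA-256 digest (RFC 4648 alphabet, A-Z and 2-7) and keeping the first 26 characters (130 bits) reproduces PBT2I4KORUUAHC2OC2BVYEPYJJ exactly. The page does not state this derivation, so treat the sketch as an observation consistent with the published values rather than the builder's specified algorithm; the function name is hypothetical.

```python
import base64

CANONICAL_HASH = "7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf"


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode the raw SHA-256 digest and keep the first `length` characters.

    Assumption: inferred from the values on this page, not from a published spec.
    26 Base32 characters carry 26 * 5 = 130 bits of the 256-bit digest.
    """
    digest = bytes.fromhex(canonical_hash_hex)  # 32 raw bytes
    return base64.b32encode(digest).decode("ascii")[:length]


print(pith_number_from_hash(CANONICAL_HASH))  # PBT2I4KORUUAHC2OC2BVYEPYJJ
```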

Aliases

arxiv: 2501.09686 · arxiv_version: 2501.09686v3 · doi: 10.48550/arxiv.2501.09686 · pith_short_12: PBT2I4KORUUA · pith_short_16: PBT2I4KORUUAHC2O · pith_short_8: PBT2I4KO
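Each `pith_short_N` alias listed above matches the first N characters of the full Pith Number, and the `pith:PBT2I4KO` form at the top of the page uses the 8-character prefix. A minimal sketch, assuming the prefix relationship holds in general (the page shows it only for this record); the helper name `short_aliases` is hypothetical.

```python
from typing import Dict, Iterable

PITH_NUMBER = "PBT2I4KORUUAHC2OC2BVYEPYJJ"


def short_aliases(pith_number: str, lengths: Iterable[int] = (8, 12, 16)) -> Dict[str, str]:
    """Map each pith_short_N alias name to the first N characters of the Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


for name, value in short_aliases(PITH_NUMBER).items():
    print(name, value)
```

Shorter prefixes trade collision resistance for readability: each dropped Base32 character removes 5 bits of the identifier.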
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PBT2I4KORUUAHC2OC2BVYEPYJJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7867a4714e8d28038b4e16835c11f84a74534e244ec7b575293df3293f5be1cf
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "282c5a48b28b73fee08160a2e957058b7f8c773d182bcfbe789042d75bb24b76",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-01-16T17:37:58Z",
    "title_canon_sha256": "27a29be91192a11f36ffa1b46e5ee199fa483d41b5aac49cfac0e14c1b975c54"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.09686",
    "kind": "arxiv",
    "version": 3
  }
}
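The Python stage of the verification one-liner canonicalizes the record before hashing: keys sorted, no insignificant whitespace (separators `,` and `:`), and non-ASCII characters kept as UTF-8 instead of `\u` escapes. The sketch below applies the same steps offline to a record already in memory, here a copy of the canonical record JSON above. The function name is hypothetical, and whether this copy is byte-for-byte what the API returns is an assumption; compare the printed digest with the `# expect:` value in the one-liner before relying on it.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record in the same canonical form the one-liner produces."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no spaces after separators
        ensure_ascii=False,      # keep non-ASCII as UTF-8, not \u escapes
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Copy of the canonical record JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "282c5a48b28b73fee08160a2e957058b7f8c773d182bcfbe789042d75bb24b76",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2025-01-16T17:37:58Z",
        "title_canon_sha256": "27a29be91192a11f36ffa1b46e5ee199fa483d41b5aac49cfac0e14c1b975c54",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.09686", "kind": "arxiv", "version": 3},
}

print(canonical_hash(RECORD))
```

Canonicalizing first is what makes the check reproducible across mirrors: the same record serialized with a different key order or spacing would otherwise produce a different digest.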