Pith Number

pith:Q7OXT2D7

pith:2025:Q7OXT2D7TV2U5BL2A5ITHAAZGE

not attested · not anchored · not stored · refs resolved

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Bryan Dai, Chong Luo, Haoming Luo, Joey Zhou, Kai Qiu, Qingnan Ren, Tian Xie, Yuqian Hong, Zhirong Wu, Zitian Gao

Rule-based RL on 5K logic puzzles induces reflection and verification in a 7B model that transfers to AIME and AMC.

arxiv:2502.14768 v1 · 2025-02-20 · cs.CL · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{Q7OXT2D7TV2U5BL2A5ITHAAZGE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

After training on just 5K logic problems, the 7B model generalizes to the challenging math benchmarks AIME and AMC.

C2 weakest assumption

That the advanced reasoning behaviors (reflection, verification, summarization) are induced by the RL process rather than already latent in the base 7B model or triggered by the system prompt alone.

C3 one-line summary

Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model that transfers to AIME and AMC.

References

27 extracted · 27 resolved · 0 Pith anchors

[1] Le, Sergey Levine, and Yi Ma 2025

[2] Training verifiers to solve math word problems 2021

[3] DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Sha 2025

[4] Alphazero-like tree-search can guide large language model decoding and training, 2024

[5] Omni-math: A universal olympiad level mathematic benchmark for large language models, 2024

Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

A Survey of Scaling in Large Language Model Reasoning

LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling

PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Receipt and verification

First computed	2026-05-17T23:38:46.595890Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2

Aliases

arxiv: 2502.14768 · arxiv_version: 2502.14768v1 · doi: 10.48550/arxiv.2502.14768 · pith_short_12: Q7OXT2D7TV2U · pith_short_16: Q7OXT2D7TV2U5BL2 · pith_short_8: Q7OXT2D7
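The short aliases listed above are plain prefixes of the full identifier. In this record the full identifier also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both observations; the helper names `short_aliases` and `base32_prefix` are hypothetical, and treating the Base32 prefix as the general derivation rule is an assumption drawn from this one record, not from a published specification.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2"
PITH_NUMBER = "Q7OXT2D7TV2U5BL2A5ITHAAZGE"


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Return the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest, length=26):
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix.

    Assumption: the 26-character length is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(PITH_NUMBER)
print(aliases)
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```

For this record the final line prints `True`, and the three aliases match the values in the Aliases line.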

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/Q7OXT2D7TV2U5BL2A5ITHAAZGE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "99e9eb6e20d4e54ae62f41a827a7d314d315ad5ae79695f60a26a1bedff501b7",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-20T17:49:26Z",
    "title_canon_sha256": "58ca94316949335c21db4792a681216bb1a96c1f2781187232e95607a0904f69"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.14768",
    "kind": "arxiv",
    "version": 1
  }
}
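The shell one-liner under "Verify this Pith Number yourself" can be unpacked into a few lines of Python that operate on the record printed above. This is a minimal sketch that mirrors the serialization settings used in that one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8). The function name `canonical_sha256` is hypothetical, and whether the record as rendered on this page is byte-for-byte what the service hashes is an assumption, so compare the printed digest against the canonical hash yourself.

```python
import hashlib
import json

# Canonical record copied from the JSON shown above.
record = {
    "metadata": {
        "abstract_canon_sha256": "99e9eb6e20d4e54ae62f41a827a7d314d315ad5ae79695f60a26a1bedff501b7",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-20T17:49:26Z",
        "title_canon_sha256": "58ca94316949335c21db4792a681216bb1a96c1f2781187232e95607a0904f69",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.14768", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj):
    """Serialize with sorted keys and compact separators, then SHA-256 the bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)  # compare with the value under "Canonical hash"
```

Sorting keys before hashing is what makes the digest independent of the order in which a mirror happens to store the fields.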