Pith Number
pith:Q7OXT2D7
pith:2025:Q7OXT2D7TV2U5BL2A5ITHAAZGE
not attested
not anchored
not stored
refs resolved
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Rule-based RL on 5K logic puzzles induces reflection and verification in a 7B model that transfers to AIME and AMC.
arxiv:2502.14768 v1 · 2025-02-20 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Q7OXT2D7TV2U5BL2A5ITHAAZGE}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Record completeness
1. Bitcoin timestamp
2. Internet Archive
3. Author claim
4. Citations
5. Replications
✓ Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same
current state with the deterministic merge algorithm.
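The bundle format and the merge rules are not spelled out on this page, so the following is only a minimal sketch of what a deterministic merge can look like. It assumes a hypothetical event shape (`id`, `ts`, `key`, `value`) and a last-writer-wins rule; none of these names come from Pith. The point it illustrates is that folding events in a fixed total order gives every mirror the same state, regardless of the order in which events arrived.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict in a fixed total order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"id": str, "ts": str, "key": str, "value": object}
    Events sharing an id are assumed to be identical redeliveries.
    """
    unique = {e["id"]: e for e in events}  # drop duplicate deliveries by id
    # Sort by timestamp, then id, so ties are broken the same way everywhere.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict = {}
    for event in ordered:
        state[event["key"]] = event["value"]  # last writer in the order wins
    return state


# Illustrative events only; the keys and values are made up for this sketch.
events = [
    {"id": "a", "ts": "2026-05-17T23:38:46Z", "key": "archive", "value": "pending"},
    {"id": "b", "ts": "2026-05-18T00:00:00Z", "key": "archive", "value": "stored"},
    {"id": "c", "ts": "2026-05-18T00:00:00Z", "key": "citations", "value": 1},
]

state_forward = merge_events(events)
state_reversed = merge_events(reversed(events))
```

Both calls produce the same dictionary because the sort key, not the arrival order, decides which event is applied last.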
Claims
C1 (strongest claim)
After training on just 5K logic problems, the 7B model demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
C2 (weakest assumption)
The advanced reasoning behaviors (reflection, verification, summarization) are induced by the RL process rather than being already latent in the base 7B model or triggered by the system prompt alone.
C3 (one-line summary)
Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model that transfers to AIME and AMC.
Receipt and verification
| First computed | 2026-05-17T23:38:46.595890Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Q7OXT2D7TV2U5BL2A5ITHAAZGE \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2
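The hashing step of the command above can also be run offline. The sketch below repeats the same canonicalization (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes) on a copy of the canonical record shown on this page; the function name `canonical_hash` is our own. Whether the digest matches the published canonical hash depends on the copied record being byte-for-byte faithful, so the comparison is printed rather than assumed.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized the same way as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "99e9eb6e20d4e54ae62f41a827a7d314d315ad5ae79695f60a26a1bedff501b7",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-20T17:49:26Z",
        "title_canon_sha256": "58ca94316949335c21db4792a681216bb1a96c1f2781187232e95607a0904f69",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.14768", "kind": "arxiv", "version": 1},
}

# Hash published on this page.
published = "87dd79e87f9d754e857a0751338019311f6aa80cf62ca2dad3f15188522b86b2"

digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == published)
```

Because keys are sorted before hashing, a mirror that stores the same record with a different key order still computes the same digest.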
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "99e9eb6e20d4e54ae62f41a827a7d314d315ad5ae79695f60a26a1bedff501b7",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-20T17:49:26Z",
    "title_canon_sha256": "58ca94316949335c21db4792a681216bb1a96c1f2781187232e95607a0904f69"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.14768",
    "kind": "arxiv",
    "version": 1
  }
}