pith:DHNPYQ5G
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
One training example via reinforcement learning lifts an LLM's math reasoning score from 36% to 74% on MATH500.
arxiv:2504.20571 v3 · 2025-04-29 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DHNPYQ5G2DWUV3T2BAPT6HOBNY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the math reasoning capabilities of large language models (LLMs). Applying RLVR to the base model Qwen2.5-Math-1.5B, we identify a single example that elevates model performance on MATH500 from 36.0% to 73.6%.
Assumption: the single chosen example was not selected in a way that inflates generalization, and the observed gains arise from the RL policy gradient rather than from incidental effects of the training setup or prompt format.
With one training example, RLVR raises the model's average math reasoning score across six benchmarks from 17.6% to 35.7%.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:50.357211Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
19dafc43a6d0ed4aee7a081f3f1dc16e2690bce9e729cb67e4cf0fedc999d5d7
Verify this Pith Number yourself
# Fetch the JSON-LD record, keep only its canonical_record, re-serialize that object
# canonically (sorted keys, compact separators, raw UTF-8), and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DHNPYQ5G2DWUV3T2BAPT6HOBNY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 19dafc43a6d0ed4aee7a081f3f1dc16e2690bce9e729cb67e4cf0fedc999d5d7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6438bc3f9b0ad57edc0c56bf97ba63f2ed3cd3b4b78e640459a294c532011eea",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-04-29T09:24:30Z",
    "title_canon_sha256": "a3ee8a00abb8653ecd09efee5e6e3dff17022749f7ceb3f9a19d8760fc0ac677"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.20571",
    "kind": "arxiv",
    "version": 3
  }
}
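For readers without curl or jq, the hashing step of the verification can be sketched offline in Python alone. This is a minimal sketch under two assumptions: that the JSON shown above is the complete `canonical_record`, and that the canonical form is exactly the sorted-key, compact, UTF-8 serialization used by the shell one-liner. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical. If both assumptions hold, the printed digest should equal the canonical hash; a mismatch would suggest the page shows an abbreviated record or a different serialization.

```python
import hashlib
import json

# Canonical record as displayed on this page. Assumption: this is the full
# value of the "canonical_record" field returned by the API.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "6438bc3f9b0ad57edc0c56bf97ba63f2ed3cd3b4b78e640459a294c532011eea",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-04-29T09:24:30Z",
        "title_canon_sha256": "a3ee8a00abb8653ecd09efee5e6e3dff17022749f7ceb3f9a19d8760fc0ac677",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.20571", "kind": "arxiv", "version": 3},
}

# Canonical hash published on this page.
EXPECTED = "19dafc43a6d0ed4aee7a081f3f1dc16e2690bce9e729cb67e4cf0fedc999d5d7"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with keys sorted at every nesting level, no whitespace,
    and non-ASCII characters kept as raw UTF-8 (same json.dumps options
    as the shell one-liner)."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the lowercase hex SHA-256 digest of the canonical bytes."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because `sort_keys=True` applies recursively, the digest does not depend on the order in which keys appear in the source JSON, which is what makes the hash reproducible across clients.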