pith:XP5QRJWO
A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning
Language models can improve their reasoning by training on responses they generate themselves.
arxiv:2510.18814 v3 · 2025-10-21 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XP5QRJWORXVGYL5LSNZ3LPVQYR}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on self-generated responses. Across six math reasoning benchmarks, SePT improves a strong no-training baseline... and in some settings can even approach the performance of Reinforcement Learning with Verifiable Rewards (RLVR).
The method assumes that self-generated responses supply a net positive training signal rather than reinforcing the model's existing errors or hallucinations. This must hold for the iterative self-training loop to produce sustained gains without external verification or filtering.
SePT enables LLMs to improve math reasoning on multiple benchmarks by iteratively training on their own low-temperature generated responses using an online data refresh mechanism.
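The alternation described in the claims above (generate responses, train on them, then regenerate from the updated model) can be sketched as a short loop. This is a minimal illustration and not the authors' implementation: the function name `self_training_loop`, the `generate` and `train` callables, and the temperature value of 0.2 are all hypothetical placeholders, and the toy "model" is just an integer version counter.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (prompt, self-generated response)


def self_training_loop(
    model,
    prompts: List[str],
    generate: Callable[[object, str, float], str],  # hypothetical sampler
    train: Callable[[object, List[Pair]], object],  # hypothetical update step
    rounds: int = 3,
    temperature: float = 0.2,  # assumed value; the source only says "low-temperature"
):
    """Alternate between self-generation and training on the generated data."""
    for _ in range(rounds):
        # Refresh the data from the *current* model on every round,
        # with no reward signal, verifier, or filtering applied.
        pairs = [(p, generate(model, p, temperature)) for p in prompts]
        model = train(model, pairs)
    return model


# Toy stand-ins so the sketch runs without any ML library.
seen: List[List[Pair]] = []


def toy_generate(model, prompt, temperature):
    return f"answer-from-v{model}-to-{prompt}"


def toy_train(model, pairs):
    seen.append(pairs)  # record what each round trained on
    return model + 1    # pretend the model was updated


final_model = self_training_loop(0, ["q1", "q2"], toy_generate, toy_train, rounds=3)
```

In the toy run, the second round trains on responses produced by the once-updated model, which is the sense in which the data is refreshed online rather than generated once up front.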
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:26.365399Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XP5QRJWORXVGYL5LSNZ3LPVQYR \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a2a6d74918e4bf28b16bc80f1d5e27a016ac55827482ebac9d171647432ca76e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-21T17:15:56Z",
    "title_canon_sha256": "3da41121e66a06f632b807a7bacb8a67fc88c090f8e70453dc0c1151ff1a4d99"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.18814",
    "kind": "arxiv",
    "version": 3
  }
}
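The verification one-liner can also be run offline against the record shown above. The sketch below applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) and hashes the result. The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON printed on this page is identical to the `canonical_record` field served by the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 it."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a2a6d74918e4bf28b16bc80f1d5e27a016ac55827482ebac9d171647432ca76e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-21T17:15:56Z",
        "title_canon_sha256": "3da41121e66a06f632b807a7bacb8a67fc88c090f8e70453dc0c1151ff1a4d99",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.18814", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the value under "Canonical hash"
```

Because the keys are sorted before hashing, whitespace and key order in the displayed JSON do not affect the digest; only the keys and values themselves do.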