Pith Number

pith:XP5QRJWO

pith:2025:XP5QRJWORXVGYL5LSNZ3LPVQYR
not attested · not anchored · not stored · refs pending

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Anthony Man-Cho So, Lei Zhao, Mengqi Li, Ruoyu Sun, Xiao Li

Language models can improve their reasoning by training on responses they generate themselves.

arxiv:2510.18814 v3 · 2025-10-21 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XP5QRJWORXVGYL5LSNZ3LPVQYR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on self-generated responses. Across six math reasoning benchmarks, SePT improves a strong no-training baseline... and in some settings can even approach the performance of Reinforcement Learning with Verifiable Rewards (RLVR).

C2 · weakest assumption

That self-generated responses supply a net positive training signal rather than reinforcing the model's existing errors or hallucinations, which must hold for the iterative self-training loop to produce sustained gains without external verification or filtering.

C3 · one-line summary

SePT enables LLMs to improve math reasoning on multiple benchmarks by iteratively training on their own low-temperature generated responses using an online data refresh mechanism.

Formal links

2 machine-checked theorem links

Cited by

2 papers in Pith

Receipt and verification
First computed: 2026-05-20T00:00:26.365399Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3

Aliases

arxiv: 2510.18814 · arxiv_version: 2510.18814v3 · doi: 10.48550/arxiv.2510.18814 · pith_short_12: XP5QRJWORXVG · pith_short_16: XP5QRJWORXVGYL5L · pith_short_8: XP5QRJWO
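
The aliases listed here are related by truncation: each `pith_short_N` value is the first N characters of the full identifier. The values on this page are also consistent with the full identifier being the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That second relationship is an observation from this single record, not something the page documents, so the sketch below treats it as an assumption. The function names `short_aliases` and `number_from_hash` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3"
PITH_NUMBER = "XP5QRJWORXVGYL5LSNZ3LPVQYR"


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12-, and 16-character prefixes of a Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


def number_from_hash(hex_digest: str, length: int = 26) -> str:
    """ASSUMPTION (not documented on the page): base32-encode the digest
    bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(PITH_NUMBER)
derived = number_from_hash(CANONICAL_HASH)

print(aliases)
print("derived matches listed number:", derived == PITH_NUMBER)
```

If the assumed derivation holds for other records too, a client could recompute the identifier from the hash rather than trusting the displayed value; that would need checking against the service's own documentation.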
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XP5QRJWORXVGYL5LSNZ3LPVQYR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a2a6d74918e4bf28b16bc80f1d5e27a016ac55827482ebac9d171647432ca76e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-10-21T17:15:56Z",
    "title_canon_sha256": "3da41121e66a06f632b807a7bacb8a67fc88c090f8e70453dc0c1151ff1a4d99"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.18814",
    "kind": "arxiv",
    "version": 3
  }
}
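
The shell one-liner under "Verify this Pith Number yourself" can also be written as a short script, which may be easier to read and reuse. This is a minimal sketch: `canonical_sha256` is a hypothetical helper name, and the serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) are copied from that one-liner. The record is pasted from this page rather than fetched, so a match with the published hash depends on the displayed JSON being faithful to what the API returns.

```python
import hashlib
import json

# Hash published on this page under "Canonical hash".
EXPECTED = "bbfb08a6ce8dea6c2fab9373b5beb0c471ac5638b82199949a8de7ac157dbfa3"

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a2a6d74918e4bf28b16bc80f1d5e27a016ac55827482ebac9d171647432ca76e",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-10-21T17:15:56Z",
        "title_canon_sha256": "3da41121e66a06f632b807a7bacb8a67fc88c090f8e70453dc0c1151ff1a4d99",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.18814", "kind": "arxiv", "version": 3},
}


def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and no whitespace, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the byte string independent of key order and whitespace in the source JSON, which is what lets two parties hash the same record and compare digests.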