pith:HOUUIAHL
Correct Answers from Sound Reasoning: Verifiable Process Supervision for Language Models
Verifiable process supervision lets language models keep sound reasoning while achieving accurate answers, unlike accuracy-only reinforcement learning, which trades reasoning quality for performance.
arxiv:2605.12519 v1 · 2026-04-03 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HOUUIAHLWSP7HYOLD3RRG2DJ5M}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
While accuracy-only RL improves move accuracy, it sharply degrades reasoning quality, increasing win-rate error by up to 112% and reducing internal consistency by up to 69%. In contrast, VPS preserves accuracy while significantly improving reasoning quality, reducing win-rate error by up to 30% and restoring consistency to near saturation.
The approach assumes that syntactic extraction of intermediate claims from the structured reasoning format reliably produces evaluable steps that can be verified against ground-truth signals, without introducing extraction errors or missing context.
Verifiable process supervision trains language models to produce accurate answers with sound, verifiable reasoning steps, outperforming accuracy-only reinforcement learning on chess by preserving accuracy while reducing reasoning errors.
Receipt and verification
| First computed | 2026-05-18T03:10:02.883015Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
3ba94400ebb49ff3e1cb1ee3136869eb237344f27f26b776e9424136433039e8
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HOUUIAHLWSP7HYOLD3RRG2DJ5M \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3ba94400ebb49ff3e1cb1ee3136869eb237344f27f26b776e9424136433039e8
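The hashing step of the one-liner above can also be written as a small standalone script. This is a minimal sketch that mirrors the same `json.dumps` settings (sorted keys, compact separators, non-ASCII kept as-is) and hashes the UTF-8 bytes with SHA-256. The function names `canonical_sha256` and `matches` are illustrative and not part of any Pith tooling, and the sample dictionaries are only there to show that key order does not affect the digest.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record with the same settings as the one-liner, then hash it."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not matter
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest with an expected hex string (hypothetical helper)."""
    return canonical_sha256(record) == expected_hex.strip().lower()


# Two dictionaries with the same content but different key order.
a = {"source": {"id": "2605.12519", "kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv", "id": "2605.12519"}}

print(canonical_sha256(a) == canonical_sha256(b))  # prints True
```

To check a real record, load the JSON returned under `.canonical_record` with `json.loads` and pass the resulting dictionary together with the published canonical hash to `matches`.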
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "68b16c46458b620ed07d9079ad8e6553e60cbe250a2c2fdf8aefda6c1428b8eb",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-03T15:19:46Z",
    "title_canon_sha256": "e6de7b1ddb6c657d2ef4bca1b9644a204dcef7e60e77f74c1e2d2534c24f645b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12519",
    "kind": "arxiv",
    "version": 1
  }
}