pith:K23IIK52
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards
CIPO turns failed LLM trajectories into correction signals, improving reasoning relative to standard RLVR.
arxiv:2605.14539 v1 · 2026-05-14 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{K23IIK52XR4XTZBXFHFTYLIWDJ}
This prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.
Claims
CIPO consistently and significantly outperforms strong baselines in both reasoning and correction performance. Moreover, CIPO yields stronger pass@K gains, indicating that it improves the model's intrinsic reasoning capacity rather than merely redistributing probability mass over existing correct answers.
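The exact pass@K evaluation protocol is not given on this page. As an illustration, the sketch below implements the commonly used unbiased estimator pass@k = 1 - C(n - c, k) / C(n, k), computed from n sampled solutions of which c pass the verifier. The function names are illustrative only. The reasoning behind the claim goes as follows. If RL only sharpened probability mass onto answers the base model could already reach, pass@1 would rise while pass@K at large K would stay flat or fall. Gains at large K are therefore read as evidence of broader capability.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Returns 1 - C(n - c, k) / C(n, k): the probability that a uniformly
    random subset of k of the n samples contains at least one correct one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Too few incorrect samples to fill a k-subset: success is certain.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n_samples, n_correct) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimator reduces to c / n, the plain accuracy of the sampled solutions.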
The approach assumes that correction samples derived from on-policy failed trajectories supply net-positive supervision, without introducing harmful noise or a distribution shift that would degrade overall policy performance.
CIPO jointly optimizes standard RLVR rewards with correction samples derived from the model's own failed attempts, yielding better reasoning and self-correction on math and code benchmarks.
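This page describes the joint objective only at a high level. The sketch below is a hypothetical illustration and does not reproduce CIPO's actual procedure. It shows one way failed on-policy rollouts could become extra rewarded training samples alongside ordinary RLVR rollouts. The sketch makes several assumptions that do not come from the source:

- rewards are binary verifier outcomes;
- the correction prompt template (`CORRECTION_TEMPLATE`) is made up;
- the helper names (`rollout_group`, `build_batch`) are hypothetical;
- advantages follow a GRPO-style group standardization, computed separately for each group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable

Policy = Callable[[str, int], list[str]]  # (prompt, n) -> n sampled responses
Verifier = Callable[[str], bool]          # verifiable reward: is the answer correct?


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float
    is_correction: bool


# Hypothetical prompt template; not taken from the paper.
CORRECTION_TEMPLATE = (
    "{question}\n\nA previous attempt was judged incorrect:\n{failed}\n\n"
    "Find the error and give a corrected solution."
)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward standardized within its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rollout_group(prompt: str, policy: Policy, verify: Verifier, n: int,
                  is_correction: bool) -> list[tuple[Sample, float]]:
    """Sample n responses, score them with the verifier, attach advantages."""
    responses = policy(prompt, n)
    rewards = [1.0 if verify(r) else 0.0 for r in responses]
    return [(Sample(prompt, resp, rew, is_correction), adv)
            for resp, rew, adv in zip(responses, rewards, group_advantages(rewards))]


def build_batch(question: str, policy: Policy, verify: Verifier,
                n: int) -> list[tuple[Sample, float]]:
    """Standard RLVR group plus one correction group per failed rollout."""
    batch = rollout_group(question, policy, verify, n, is_correction=False)
    failed = [s.response for s, _ in batch if s.reward == 0.0]
    for attempt in failed:
        prompt = CORRECTION_TEMPLATE.format(question=question, failed=attempt)
        batch += rollout_group(prompt, policy, verify, n, is_correction=True)
    return batch


# Toy usage: a stub policy and an exact-match verifier (both hypothetical).
def toy_policy(prompt: str, n: int) -> list[str]:
    outs = ["4", "3"] if "previous attempt" in prompt else ["4", "5"]
    return outs[:n]


demo_batch = build_batch("What is 2 + 2?", toy_policy, lambda r: r.strip() == "4", n=2)
```

Each group is standardized on its own, so a correction group in which every fix succeeds (or every fix fails) contributes zero advantage, as in GRPO. This choice, and the choice of one correction group per failure, are assumptions of the sketch.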
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:05.847782Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
56b6842bbabc7979e43729cb3c2d161a48f64b1d54720bbcd63b14c749cae2a5
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/K23IIK52XR4XTZBXFHFTYLIWDJ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 56b6842bbabc7979e43729cb3c2d161a48f64b1d54720bbcd63b14c749cae2a5
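The same check can also be run offline on a saved record. This is a minimal sketch that assumes the record has already been parsed into a Python dict, for example the `canonical_record` field of the JSON-LD response. The names `canonical_sha256` and `verify_hash` are illustrative and are not part of any Pith API. The function mirrors the one-liner's canonicalization: sorted keys, no whitespace around separators, and non-ASCII characters kept and encoded as UTF-8.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a JSON record in the canonical form used above."""
    canonical = json.dumps(
        record,
        sort_keys=True,         # key order must not affect the hash
        separators=(",", ":"),  # compact form: no spaces after , or :
        ensure_ascii=False,     # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_hash(record: dict, expected: str) -> bool:
    """True if the record's canonical hash matches the expected hex digest."""
    return canonical_sha256(record) == expected.strip().lower()
```

To repeat the shell pipeline's check, pass the parsed `canonical_record` and the published canonical hash to `verify_hash`.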
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "40aa3402e4bfa600afd95ae59ac2cae7e25c8c6d54bfbfd70ea2869630467578",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.CL",
"submitted_at": "2026-05-14T08:22:21Z",
"title_canon_sha256": "968d10feccf4a4b3c822fcf703350664781297d87189e9257cc76965a348f1e2"
},
"schema_version": "1.0",
"source": {
"id": "2605.14539",
"kind": "arxiv",
"version": 1
}
}