Pith Number

pith:K23IIK52

pith:2026:K23IIK52XR4XTZBXFHFTYLIWDJ
not attested · not anchored · not stored · refs resolved

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

Boxi Cao, Hongyu Lin, Jie Lou, Le Sun, Mengjie Ren, Xianpei Han, Xing Yu, Xueru Wen, Yaojie Lu

CIPO turns failed LLM trajectories into correction signals to boost reasoning over standard RLVR.

arxiv:2605.14539 v1 · 2026-05-14 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{K23IIK52XR4XTZBXFHFTYLIWDJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

CIPO consistently and significantly outperforms strong baselines in both reasoning and correction performance. Moreover, CIPO yields stronger pass@K gains, indicating that it improves the model's intrinsic reasoning capacity rather than merely redistributing probability mass over existing correct answers.
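
The claim above refers to pass@K without defining it on this page. As a hedged illustration, pass@K is commonly estimated with the unbiased estimator 1 - C(n-c, K)/C(n, K), where n samples are drawn per problem and c of them are correct. Whether the paper uses exactly this estimator is an assumption here, and the function name `pass_at_k` and the numbers are illustrative only, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k samples, drawn without replacement
    from n samples of which c are correct, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets that contain no correct sample;
    # it is 0 whenever k > n - c, which makes the estimate exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Made-up illustration: 10 samples for one problem, 3 of them correct.
example = pass_at_k(n=10, c=3, k=5)
print(round(example, 4))
```

Under this definition, shifting probability mass toward an answer the model already finds mostly raises pass@1, while gains at larger K suggest that previously unsolved problems are now being solved, which is the distinction the claim draws.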

C2 · weakest assumption

That correction samples derived from on-policy failed trajectories supply net-positive supervision without introducing harmful noise or distribution shift that would degrade overall policy performance.

C3 · one line summary

CIPO jointly optimizes standard RLVR rewards with correction samples derived from the model's own failed attempts, yielding better reasoning and self-correction on math and code benchmarks.

References

47 extracted · 47 resolved · 9 Pith anchors

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:05.847782Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

56b6842bbabc7979e43729cb3c2d161a48f64b1d54720bbcd63b14c749cae2a5

Aliases

arxiv: 2605.14539 · arxiv_version: 2605.14539v1 · doi: 10.48550/arxiv.2605.14539 · pith_short_12: K23IIK52XR4X · pith_short_16: K23IIK52XR4XTZBX · pith_short_8: K23IIK52
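
The values on this page suggest that the identifier is the RFC 4648 Base32 encoding of the canonical hash, truncated to 26 characters, and that the short aliases are prefixes of it. This is an observation from this one record, not a documented rule of the pith-number/v1.0 schema; the variable names below are illustrative. A minimal sketch:

```python
import base64

# Canonical hash as listed on this page (hex-encoded SHA-256).
canonical_hash = "56b6842bbabc7979e43729cb3c2d161a48f64b1d54720bbcd63b14c749cae2a5"

# Assumption: the identifier is the first 26 Base32 characters of the hash bytes.
full_id = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")[:26]

# Assumption: pith_short_8/12/16 are simple prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id)
for name, value in aliases.items():
    print(name, value)
```

For this record the derived string matches K23IIK52XR4XTZBXFHFTYLIWDJ and the three short aliases listed above.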
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/K23IIK52XR4XTZBXFHFTYLIWDJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 56b6842bbabc7979e43729cb3c2d161a48f64b1d54720bbcd63b14c749cae2a5
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "40aa3402e4bfa600afd95ae59ac2cae7e25c8c6d54bfbfd70ea2869630467578",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T08:22:21Z",
    "title_canon_sha256": "968d10feccf4a4b3c822fcf703350664781297d87189e9257cc76965a348f1e2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14539",
    "kind": "arxiv",
    "version": 1
  }
}
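
The verification command above can also be run offline against the JSON shown here. The sketch below repeats the same serialization (sorted keys, compact separators, no ASCII escaping, UTF-8) in a standalone script. It assumes the record printed above is the complete `canonical_record`; the helper name `canonical_sha256` is hypothetical and not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the verify command:
    sorted keys, compact separators, no ASCII escaping, UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "40aa3402e4bfa600afd95ae59ac2cae7e25c8c6d54bfbfd70ea2869630467578",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-14T08:22:21Z",
        "title_canon_sha256": "968d10feccf4a4b3c822fcf703350664781297d87189e9257cc76965a348f1e2",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14539", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
# Compare this value by eye against the canonical hash listed on the page.
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields, so two independent copies of the same record should produce the same value.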