pith:B232T5UT
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards
A gradient conflict between accuracy and calibration in RLVR is resolved by decoupling the objectives in DCPO.
arxiv:2603.09117 v3 · 2026-03-10 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B232T5UT6PPUFSQAU2JAKTAO6N}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our theoretical analysis demonstrates that there exists a fundamental gradient conflict between the optimization for maximizing policy accuracy and minimizing calibration error. DCPO not only preserves accuracy on par with GRPO but also achieves the best calibration performance and substantially mitigates the over-confidence issue.
The approach rests on the assumption that the proposed decoupling in DCPO can be implemented without introducing new optimization instabilities or unintended side effects on other model behaviors.
DCPO decouples reasoning optimization from calibration in RLVR to fix overconfidence in LLMs without losing accuracy.
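To make "calibration error" and "over-confidence" concrete, here is a minimal sketch of expected calibration error (ECE), one common calibration metric. The record does not say which metric the paper reports, so the choice of ECE, the function name `expected_calibration_error`, the equal-width binning, and the toy numbers are all assumptions for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean gap between confidence and accuracy.

    confidences: stated probabilities in [0, 1], one per answer.
    correct:     1 if the answer was right, 0 otherwise.
    Hypothetical helper; not taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one prediction")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(hit for _, hit in members) / len(members)
        # Each bin contributes its |confidence - accuracy| gap, weighted by size.
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# An over-confident model: it claims 90% confidence but is right half the time.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

In this toy case the gap between stated confidence (0.9) and observed accuracy (0.5) is what a calibration objective would try to shrink, while an accuracy objective only cares about the second list.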
Receipt and verification
| First computed | 2026-05-28T01:04:38.117064Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0eb7a9f693f3df42ca00a692054c0ef36363aa2efa9de35dcea50a62f4ad2e40
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B232T5UT6PPUFSQAU2JAKTAO6N \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0eb7a9f693f3df42ca00a692054c0ef36363aa2efa9de35dcea50a62f4ad2e40
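The one-liner above re-serializes the canonical record with sorted keys and compact separators, then hashes the UTF-8 bytes with SHA-256. The same steps can be sketched as a small Python function. The name `canonical_hash` and the toy record are illustrative and not part of any Pith API.

```python
import hashlib
import json


def canonical_hash(record):
    """Hash a JSON-compatible object the way the shell pipeline above does.

    Hypothetical helper: sorted keys, no whitespace between tokens,
    non-ASCII characters kept as-is, UTF-8 encoding, SHA-256 hex digest.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


# Key order in the input does not matter, because keys are sorted first.
toy_a = {"source": {"kind": "arxiv", "version": 3}, "schema_version": "1.0"}
toy_b = {"schema_version": "1.0", "source": {"version": 3, "kind": "arxiv"}}
same_digest = canonical_hash(toy_a) == canonical_hash(toy_b)
```

To check a real record, load the `canonical_record` object from the API response with `json.loads`, pass it to the function, and compare the result with the canonical hash listed on this page.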
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "8c44414232b4e60e063d3c0d487e7cb4665ee0263df0305a43a74d82b447ca24",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-10T02:47:59Z",
    "title_canon_sha256": "c37876e30466eadc3601baea3bd899d61a4245cc748545d8b2e24b06d8c3cc91"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.09117",
    "kind": "arxiv",
    "version": 3
  }
}