Pith Number

pith:B232T5UT

pith:2026:B232T5UT6PPUFSQAU2JAKTAO6N
not attested · not anchored · not stored · refs pending

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Boxi Cao, Hongyu Lin, Jinglin Yang, Le Sun, Min He, Xianpei Han, Xueru Wen, Yaojie Lu, Zhengzhao Ma

A gradient conflict between accuracy and calibration in RLVR is resolved by decoupling the objectives in DCPO.

arxiv:2603.09117 v3 · 2026-03-10 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{B232T5UT6PPUFSQAU2JAKTAO6N}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our theoretical analysis demonstrates that there exists a fundamental gradient conflict between the optimization for maximizing policy accuracy and minimizing calibration error. DCPO not only preserves accuracy on par with GRPO but also achieves the best calibration performance and substantially mitigates the over-confidence issue.

C2 · weakest assumption

That the proposed decoupling in DCPO can be implemented without introducing new optimization instabilities or unintended side effects on other model behaviors.

C3 · one-line summary

DCPO decouples reasoning optimization from calibration in RLVR to fix overconfidence in LLMs without losing accuracy.

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-28T01:04:38.117064Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

0eb7a9f693f3df42ca00a692054c0ef36363aa2efa9de35dcea50a62f4ad2e40

Aliases

arxiv: 2603.09117 · arxiv_version: 2603.09117v3 · doi: 10.48550/arxiv.2603.09117 · pith_short_12: B232T5UT6PPU · pith_short_16: B232T5UT6PPUFSQA · pith_short_8: B232T5UT
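
The identifiers above appear to be related in a simple way, which the short sketch below checks. This page does not document how a Pith Number is derived, so treat the following as an observation rather than a specification: the 26-character number on this record matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 digest, and the `pith_short_*` aliases are prefixes of that number. The helper names `base32_prefix` and `short_aliases` are hypothetical and exist only for this illustration.

```python
import base64

# Values copied verbatim from this record.
CANONICAL_HASH = "0eb7a9f693f3df42ca00a692054c0ef36363aa2efa9de35dcea50a62f4ad2e40"
PITH_NUMBER = "B232T5UT6PPUFSQAU2JAKTAO6N"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes (RFC 4648 alphabet) and keep a prefix.

    Assumption: the 26-character length is inferred from this record,
    not taken from a published schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the 8-, 12-, and 16-character aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = base32_prefix(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)

print(derived == PITH_NUMBER)  # True for this record
print(aliases)
```

If the relationship holds in general, a client could recover any short alias from the canonical hash alone, but that should be confirmed against the schema before relying on it.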
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/B232T5UT6PPUFSQAU2JAKTAO6N \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0eb7a9f693f3df42ca00a692054c0ef36363aa2efa9de35dcea50a62f4ad2e40
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "8c44414232b4e60e063d3c0d487e7cb4665ee0263df0305a43a74d82b447ca24",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-10T02:47:59Z",
    "title_canon_sha256": "c37876e30466eadc3601baea3bd899d61a4245cc748545d8b2e24b06d8c3cc91"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.09117",
    "kind": "arxiv",
    "version": 3
  }
}
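
The shell pipeline in the verification step can also be run offline. The sketch below is a minimal Python equivalent, assuming the record printed above is exactly what the API returns under `canonical_record`. It serializes the record with sorted keys, compact separators, and no ASCII escaping, then hashes the UTF-8 bytes with SHA-256. The function name `canonical_bytes` is hypothetical. The resulting digest is meant to be compared by the reader against the canonical hash listed on this page; the sketch does not assert that they match.

```python
import hashlib
import json

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "8c44414232b4e60e063d3c0d487e7cb4665ee0263df0305a43a74d82b447ca24",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-03-10T02:47:59Z",
        "title_canon_sha256": "c37876e30466eadc3601baea3bd899d61a4245cc748545d8b2e24b06d8c3cc91",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.09117", "kind": "arxiv", "version": 3},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with the same json.dumps options as the shell one-liner."""
    text = json.dumps(
        obj,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return text.encode("utf-8")


canonical = canonical_bytes(record)
digest = hashlib.sha256(canonical).hexdigest()
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order should compute the same digest.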