C ^2 gspg: Confidence-calibrated group sequence policy gradient towards self-aware reasoning, 2025 a

Liu, H · 2025 · arXiv 2509.23129

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

cs.LG · 2026-04-25 · unverdicted · novelty 6.0

RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.

Calibration-Aware Policy Optimization for Reasoning LLMs

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

CAPO improves LLM calibration by up to 15% while matching or exceeding GRPO accuracy through logistic AUC loss and noise masking, enabling better abstention and scaling performance.

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

cs.LG · 2026-03-10

citing papers explorer

Showing 3 of 3 citing papers.

Process Supervision of Confidence Margin for Calibrated LLM Reasoning cs.LG · 2026-04-25 · unverdicted · none · ref 44
RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.
Calibration-Aware Policy Optimization for Reasoning LLMs cs.LG · 2026-04-14 · unverdicted · none · ref 19
CAPO improves LLM calibration by up to 15% while matching or exceeding GRPO accuracy through logistic AUC loss and noise masking, enabling better abstention and scaling performance.
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards cs.LG · 2026-03-10 · unreviewed · ref 16

C ^2 gspg: Confidence-calibrated group sequence policy gradient towards self-aware reasoning, 2025 a

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer