C²GSPG: Confidence-calibrated group sequence policy gradient towards self-aware reasoning, 2025a

3 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG 3

years: 2026 3

roles: background 1

polarities: background 1

representative citing papers

Calibration-Aware Policy Optimization for Reasoning LLMs

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Using a logistic AUC loss and noise masking, CAPO improves LLM calibration by up to 15% while matching or exceeding GRPO accuracy, enabling better abstention and scaling performance.
