Calibration of pre-trained transformers
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Process Supervision of Confidence Margin for Calibrated LLM Reasoning
  RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.
- Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-Tuning
  Uncertainty-aware fine-tuning with a decision-theory-based loss produces better-calibrated uncertainty estimates than standard training on free-form QA tasks.