Task calibration aligns LLM distributions in latent task spaces to make MBR decoding provably optimal and improve generation quality.
Conftuner: Training large language models to express their confidence verbally
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2roles
background 1polarities
background 1representative citing papers
Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after a modal-filtered pre-registration failed.
citing papers explorer
-
Task-Aware Calibration: Provably Optimal Decoding in LLMs
Task calibration aligns LLM distributions in latent task spaces to make MBR decoding provably optimal and improve generation quality.
-
Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B
Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after a modal-filtered pre-registration failed.