ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Yibo Li, Miao Xiong, Jiaying Wu, Bryan Hooi · 2025 · arXiv 2508.18847

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Task-Aware Calibration: Provably Optimal Decoding in LLMs

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Task calibration aligns LLM distributions in latent task spaces to make MBR decoding provably optimal and improve generation quality.

Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

A composite loss with Brier calibration, anchor regularization, contrastive alignment from 2x2 perturbations, and KL stabilization reduces calibration error by over 60% in medical VQA while preserving accuracy.

CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation

cs.LG · 2026-06-30 · unverdicted · novelty 5.0

CoMet decomposes MLLM uncertainty into context-specific and multiplicity-specific terms estimated by a trained post-hoc module, improving performance on open-ended multimodal benchmarks and hallucination detection.

Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B

cs.CL · 2026-04-27 · conditional · novelty 5.0

Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after a modal-filtered pre-registration failed.

citing papers explorer

Showing 4 of 4 citing papers.

Task-Aware Calibration: Provably Optimal Decoding in LLMs · cs.LG · 2026-05-11 · unverdicted · none · ref 30
Just how sure are you? Improving Verbalized Uncertainty Calibration in Medical VQA · cs.LG · 2026-06-25 · unverdicted · none · ref 5
CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation · cs.LG · 2026-06-30 · unverdicted · none · ref 53
Distilling Self-Consistency into Verbal Confidence: A Pre-Registered Negative Result and Post-Hoc Rescue on Gemma 3 4B · cs.CL · 2026-04-27 · conditional · none · ref 10
