Task calibration aligns LLM distributions in latent task spaces to make MBR decoding provably optimal and improve generation quality.
ConfTuner: Training large language models to express their confidence verbally
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
roles: background (1)
polarities: background (1)
representative citing papers
A composite loss with Brier calibration, anchor regularization, contrastive alignment from 2x2 perturbations, and KL stabilization reduces calibration error by over 60% in medical VQA while preserving accuracy.
A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.
CoMet decomposes MLLM uncertainty into context-specific and multiplicity-specific terms estimated by a trained post-hoc module, improving performance on open-ended multimodal benchmarks and hallucination detection.
Fine-tuning Gemma 3 4B on unfiltered self-consistency targets produces a binary verbal correctness discriminator with AUROC 0.774 on TriviaQA, outperforming logit entropy after a modal-filtered pre-registration failed.