Generalization of Fine-Tuned Uncertainty Communication and Metacognition in Large Language Models

Catarina Belem; Mark Steyvers; Padhraic Smyth

arxiv: 2510.05126 · v3 · pith:PDMZDH2Fnew · submitted 2025-09-30 · 💻 cs.CL · cs.AI

keywords: confidence, models, answer, communication, domains, fine-tuning, tasks, training

Background. Large language models are increasingly used in settings where confident but incorrect answers can mislead users. Reliable uncertainty communication requires a form of metacognition: monitoring when one's own answers are likely to be correct. Yet models' stated confidence is often poorly aligned with answer correctness. We test whether supervised fine-tuning improves uncertainty communication and whether gains transfer across domains and task formats.

Methods. We fine-tuned two models on general knowledge, mathematics, and open-ended trivia questions. We evaluated single-question confidence estimation, in which the model reports numeric confidence for one answer, and pairwise confidence comparison, in which it chooses which of two questions it is more likely to answer correctly. We tested held-out questions from training domains and new medical, legal, and truthfulness benchmarks. We assessed calibration, discrimination, and answer accuracy before and after fine-tuning.

Results. Here we show that fine-tuning improves alignment between stated confidence and observed accuracy and increases the model's ability to assign higher confidence to correct than to incorrect answers. Gains occur within training domains and, to a lesser extent, in new domains. However, single-task training does not reliably transfer between single-question confidence estimation and pairwise confidence comparison. Multitask fine-tuning produces broader gains in the models and tasks studied here.

Conclusions. Uncertainty communication in large language models is trainable, but transfer across metacognitive tasks is limited. Joint training on multiple confidence tasks may support broader generalization, although further tests across model families and metacognitive tasks are needed.
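
As an illustration of the two properties the abstract evaluates, calibration and discrimination, the sketch below scores a set of stated confidences against answer correctness. The abstract does not name its metrics, so the choice of expected calibration error (ECE) and AUROC is an assumption here, as are the function names, the ten equal-width bins, and the toy data; this is not the authors' evaluation code.

```python
from typing import List


def expected_calibration_error(conf: List[float], correct: List[int],
                               n_bins: int = 10) -> float:
    """Calibration (assumed metric): bin-size-weighted mean gap between
    accuracy and mean stated confidence, over equal-width bins on [0, 1]."""
    n = len(conf)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece


def auroc(conf: List[float], correct: List[int]) -> float:
    """Discrimination (assumed metric): probability that a randomly chosen
    correct answer has higher confidence than a randomly chosen incorrect
    one, counting ties as one half."""
    pos = [c for c, y in zip(conf, correct) if y == 1]
    neg = [c for c, y in zip(conf, correct) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): stated confidence and whether the answer was correct.
stated = [0.9, 0.8, 0.6, 0.3]
is_correct = [1, 1, 0, 0]
print(expected_calibration_error(stated, is_correct))
print(auroc(stated, is_correct))
```

A lower ECE indicates closer agreement between stated confidence and observed accuracy; an AUROC of 0.5 corresponds to confidence that does not separate correct from incorrect answers, and 1.0 to perfect separation.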

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

How do LLMs Compute Verbal Confidence
cs.CL 2026-03 unverdicted novelty 6.0

Mechanistic experiments on Gemma 3 27B, Qwen 2.5 7B and Magistral Small 24B show verbal confidence is cached at post-answer positions from answer tokens and captures richer answer-quality information beyond token log-...

Measuring the metacognition of AI
cs.AI 2026-03 unverdicted novelty 5.0

Meta-d' and signal detection theory provide quantitative tools to assess metacognitive sensitivity and risk-based regulation in large language models.