Do not design, learn: A trainable scoring function for uncertainty estimation in generative LLMs

Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr · 2024 · arXiv 2406.11278

2 Pith papers cite this work.

representative citing papers

Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning

cs.CL · 2026-04-10 · unverdicted · novelty 5.0 · ref 43

Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.

Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure

cs.LG · 2024-12-19 · unverdicted · novelty 5.0 · ref 22

Negative log-likelihood of the greedy-decoded most likely sequence (G-NLL) is a principled single-sequence uncertainty measure for LLMs that achieves state-of-the-art results.
