arXiv preprint arXiv:2311.15451
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs
  Global calibration metrics like ECE are confounded by accuracy; the proposed ACE framework with three accuracy-controlled views shows many prior calibration advantages weaken or reverse.
- Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination
  Empirical study across multiple benchmarks finds the link between uncertainty estimators and LLM hallucinations is highly variable and often weak.
- Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering
  Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.