Uncertainty-aware Language Modeling for Selective Question Answering

2023 · arXiv preprint arXiv:2311.15451

3 Pith papers cite this work. Polarity classification of these citations is still being indexed.


representative citing papers

When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

Global calibration metrics like ECE are confounded by accuracy; the proposed ACE framework, with three accuracy-controlled views, shows that many prior calibration advantages weaken or reverse.

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

cs.CL · 2026-05-26 · unverdicted · novelty 5.0

Empirical study across multiple benchmarks finds the link between uncertainty estimators and LLM hallucinations is highly variable and often weak.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.

citing papers explorer


When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs cs.CL · 2026-06-29 · unverdicted · none · ref 68
Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination cs.CL · 2026-05-26 · unverdicted · none · ref 55
Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering cs.CL · 2026-05-19 · unverdicted · none · ref 110
