Quantifying uncertainty in answers from any language model and enhancing their trustworthiness
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
representative citing papers
A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.
Clustered Self-Assessment groups sampled LLM responses into semantic clusters, presents clusters as multiple-choice options, and uses the LLM's assigned probabilities to those options as direct uncertainty estimates, outperforming entropy baselines with as few as two extra samples.
DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.
Performing modality-specific reasoning separately before fusion reduces hallucinations and improves accuracy in audio-visual LLMs by enforcing isolated traces and then integrating the evidence.
citing papers explorer
- Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
  LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.
- Quantifying Faithful Confidence Expression in Large Reasoning Models
  A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.
- Clustered Self-Assessment: A Simple yet Effective Method for Uncertainty Quantification in Large Language Models
  Clustered Self-Assessment groups sampled LLM responses into semantic clusters, presents clusters as multiple-choice options, and uses the LLM's assigned probabilities to those options as direct uncertainty estimates, outperforming entropy baselines with as few as two extra samples.
- Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis
  DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.