Quantifying uncertainty in answers from any language model and enhancing their trustworthiness
6 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.
- Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
RLMF uses the quality of the model's self-judgments to refine RL rankings and select training data, achieving state-of-the-art faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.
- Quantifying Faithful Confidence Expression in Large Reasoning Models
A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.
- Clustered Self-Assessment: A Simple yet Effective Method for Uncertainty Quantification in Large Language Models
Clustered Self-Assessment groups sampled LLM responses into semantic clusters, presents clusters as multiple-choice options, and uses the LLM's assigned probabilities to those options as direct uncertainty estimates, outperforming entropy baselines with as few as two extra samples.
- Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis
DPUA is a two-phase framework that aligns LLM uncertainty expressions with human disagreement distributions in subjectivity analysis while preserving task performance.
- Separate First, Fuse Later: Mitigating Cross-Modal Interference in Audio-Visual LLMs Reasoning with Modality-Specific Chain-of-Thought
Separating modality-specific reasoning before fusion reduces hallucinations and improves accuracy in audio-visual LLMs: each modality is reasoned over in an isolated trace, and the evidence is integrated only afterward.