LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.
On the probability–quality paradox in language generation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.
citing papers explorer
-
Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?
LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.
-
Can Reasoning Models Detect Changes to their Chains of Thought?
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.