Language models deploy multidimensional internal confidence representations and threshold-based policies to control abstention behavior, with causal support from activation steering experiments.
T:χ 2(1) = 391.82, p < .001 Threshold + Difficulty 9824.5 0.355 – Full (T + Confidence + Difficulty) 9544.4 0.374 vs
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Causal Evidence that Language Models use Confidence to Drive Behavior
Language models deploy multidimensional internal confidence representations and threshold-based policies to control abstention behavior, with causal support from activation steering experiments.