Distinguishing the Knowable from the Unknowable with Language Models

Benjamin L. Edelman; Boaz Barak; Gustaf Ahdritz; Nikhil Vyas; Tian Qin

arxiv: 2402.03563 · v2 · pith:W2D6BW3Unew · submitted 2024-02-05 · 💻 cs.LG · cs.AI · cs.CL

Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman

classification 💻 cs.LG cs.AI cs.CL

keywords models, uncertainty, language, larger, llms, model, probes, reflecting

We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings.
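The probing setup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it uses random vectors in place of real model embeddings, a hand-written logistic regression as the linear probe, and a made-up entropy threshold of 0.5 nats to decide when the larger model counts as "confident". All names (`make_synthetic_tokens`, `train_linear_probe`, `probe_predict`) are hypothetical and exist only for this example.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_probe(features, labels, lr=0.5, epochs=300):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(dot(w, x) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_predict(w, b, x):
    """Return 1 if the probe predicts the larger model is confident."""
    return 1 if dot(w, x) + b > 0.0 else 0


def make_synthetic_tokens(n, dim, rng, threshold=0.5):
    """Synthetic stand-in for (small-model embedding, large-model entropy) pairs.

    Assumption for illustration: one hidden direction in the embedding space
    determines whether the larger model's distribution is peaked or flat.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    features, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        knowable = dot(direction, x) + rng.gauss(0.0, 0.1) > 0.0
        large_probs = [0.97, 0.01, 0.01, 0.01] if knowable else [0.25] * 4
        features.append(x)
        # Label 1: the larger model's entropy falls below the threshold.
        labels.append(1 if entropy(large_probs) < threshold else 0)
    return features, labels


rng = random.Random(0)
features, labels = make_synthetic_tokens(500, 8, rng)
train_x, train_y = features[:400], labels[:400]
test_x, test_y = features[400:], labels[400:]

w, b = train_linear_probe(train_x, train_y)
accuracy = sum(probe_predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_x)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In a real experiment the feature vectors would presumably be hidden states of the frozen smaller model at each token position, and the labels would come from the entropy of the larger model's next-token distribution on the same text. The synthetic data here only shows the shape of the pipeline.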

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier
cs.LG 2026-06 unverdicted novelty 6.0
PROPEL amortizes solver evaluation with a trained activation probe to optimize task generators toward a target solve rate, raising the share of learnable tasks from ~10% to ~20% in coding and SWE experiments.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
cs.CL 2026-05 unverdicted novelty 6.0
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via inte...

Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study
cs.CL 2026-02 unverdicted novelty 6.0
CLUES decomposes semantic uncertainty into separate ambiguity and instability scores for clinical Text-to-SQL, with instability via Schur complement, outperforming Kernel Language Entropy on failure prediction while e...

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal
cs.LG 2026-06 unverdicted novelty 5.0
A new pipeline uses interpretability to characterize concepts in preference data and shape rewards via feature or data interventions during LM post-training.