

Language Models (Mostly) Know What They Know

Canonical reference. 74% of citing Pith papers cite this work as background.

308 Pith papers citing it
abstract

We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.
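The P(True) self-evaluation step and the calibration claim in the abstract can be made concrete with a small sketch. It assumes you already have the model's log-probabilities for the two options of a True/False self-evaluation prompt (how to obtain them is model-specific and not shown). It renormalizes those two values into P(True) and then scores calibration with a standard equal-width-bin expected calibration error (ECE). The function names `p_true` and `expected_calibration_error`, the two-option renormalization, and the binning scheme are illustrative assumptions, not the paper's code.

```python
import math
from typing import Sequence


def p_true(logprob_true: float, logprob_false: float) -> float:
    """Turn the log-probs of the 'True' and 'False' options into P(True).

    Assumption (illustrative): the self-evaluation prompt offers exactly two
    options, and we renormalize over them with a two-way softmax.
    """
    m = max(logprob_true, logprob_false)  # subtract the max for numerical stability
    e_true = math.exp(logprob_true - m)
    e_false = math.exp(logprob_false - m)
    return e_true / (e_true + e_false)


def expected_calibration_error(
    probs: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: the size-weighted mean of |accuracy - confidence| per bin."""
    if len(probs) != len(correct):
        raise ValueError("probs and correct must have the same length")
    if not probs:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        # A probability of exactly 1.0 goes into the last bin instead of overflowing.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, c))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        confidence = sum(p for p, _ in b) / len(b)
        accuracy = sum(1 for _, c in b if c) / len(b)
        ece += (len(b) / total) * abs(accuracy - confidence)
    return ece
```

For example, you could compute `p_true` for each sampled answer, record whether each answer was actually correct, and pass both lists to `expected_calibration_error`. Lower values mean the stated P(True) tracks real accuracy more closely. ECE is only one common calibration summary; calibration charts carry more detail.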


citation-role summary

background: 34 · method: 3 · baseline: 2



representative citing papers

Zero-Shot Active Feature Acquisition via LLM-Elicitation

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

A framework elicits discriminative MRF statistics from an LLM and closes the model via maximum entropy to enable zero-shot active feature acquisition, outperforming baselines on IBD patient data, especially for the hardest cases.

Forecasting Future Behavior as a Learning Task

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.

Can LLM Rerankers Predict Their Own Ranking Performance?

cs.IR · 2026-06-02 · unverdicted · novelty 7.0

LLM rerankers can internally predict their own ranking quality from the self-consistency of sampled outputs, matching state-of-the-art external query performance prediction (QPP) methods, while directly elicited confidence is overconfident; supervised, token-efficient methods improve calibration.

DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

The DECK taxonomy partitions LLM hallucinations into four detectability regimes using consistency and confidence axes, mapping each to scorer families and identifying a universal blind spot for output-level uncertainty quantification on knowledge-gap inputs.

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

cs.AI · 2026-06-01 · unverdicted · novelty 7.0

Dynamic reputation updates per objective-expert pair plus a three-arm counterfactual gate improve robustness over fixed LLM priors on synthetic tests and molecule benchmarks, but raw LLM confidence is not reliably helpful.
