pith:7L2HV45W
Confidence Estimation for LLMs in Multi-turn Interactions
A new logit probe called P(Sufficient) tracks how LLMs accumulate evidence across conversation turns while staying calibrated.
arxiv:2601.02179 v2 · 2026-01-05 · cs.CL
Claims
A novel logit-based probe we introduce, P(Sufficient), proves comparatively more effective, robustly tracking evidence accumulation and distinguishing it from conversational filler.
The framework rests on the assumption that the two key desiderata (per-turn calibration and monotonicity of confidence as information accumulates) are sufficient to evaluate and improve confidence estimation in multi-turn settings, and that the Hinter-Guesser paradigm produces datasets representative of real ambiguity resolution.
The work establishes a framework for multi-turn LLM confidence estimation using per-turn calibration and monotonicity, with a new P(Sufficient) probe outperforming standard methods on controlled datasets.
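The two desiderata named above can be made concrete with a small sketch. This is not the paper's metric code: the function names, the equal-width binning, and the toy inputs are all assumptions for illustration. Per-turn calibration is approximated here with a standard expected calibration error (ECE) computed separately at each turn, and monotonicity is approximated by counting turns where confidence drops.

```python
from typing import List, Sequence


def monotonicity_violations(confidences: Sequence[float], tol: float = 0.0) -> int:
    """Count turns where confidence falls below the previous turn's value.

    Hypothetical helper: `tol` allows small dips to be ignored.
    """
    return sum(
        1 for prev, cur in zip(confidences, confidences[1:]) if cur < prev - tol
    )


def expected_calibration_error(
    confs: Sequence[float], correct: Sequence[int], n_bins: int = 10
) -> float:
    """Standard ECE with equal-width bins over [0, 1]."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, y in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into the last bin
        bins[idx].append((c, y))
    total = len(confs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


def per_turn_ece(
    dialogue_confs: Sequence[Sequence[float]],
    dialogue_correct: Sequence[Sequence[int]],
    n_bins: int = 10,
) -> List[float]:
    """Compute ECE at each turn index across dialogues of equal length (assumed)."""
    n_turns = len(dialogue_confs[0])
    return [
        expected_calibration_error(
            [d[t] for d in dialogue_confs],
            [d[t] for d in dialogue_correct],
            n_bins,
        )
        for t in range(n_turns)
    ]
```

In this sketch, a confidence trace such as `[0.2, 0.5, 0.4, 0.9]` has one monotonicity violation (the drop from 0.5 to 0.4). A method that satisfies both desiderata would show low ECE at every turn and few or no violations as the conversation supplies more evidence.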
Formal links
Receipt and verification
| First computed | 2026-05-17T23:39:16.731029Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
faf47af3b67613152a147b01c22633e81da38582754f5c81d8878d874146e835
Aliases
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7L2HV45WOYJRKKQUPMA4EJRT5A \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: faf47af3b67613152a147b01c22633e81da38582754f5c81d8878d874146e835
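The same check can be run offline as a short script. The sketch below mirrors the canonicalization used in the one-liner above (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) and applies it to a copy of the canonical record printed on this page. The function name `canonical_sha256` is an assumption for illustration, not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Copy of the canonical record shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "a5595beac2306f40c8e2e041591e93a866d427e323b9755d5f91967658ed55b6",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-01-05T14:58:04Z",
        "title_canon_sha256": "7a81e3c4961daf7a2b103063262a54ca25677e116d5a4dd2d0a0ed356b7001cb",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.02179", "kind": "arxiv", "version": 2},
}

# Compare the printed digest with the canonical hash listed on this page.
print(canonical_sha256(record))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written, which is what makes the hash reproducible across clients.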
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "a5595beac2306f40c8e2e041591e93a866d427e323b9755d5f91967658ed55b6",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.CL",
"submitted_at": "2026-01-05T14:58:04Z",
"title_canon_sha256": "7a81e3c4961daf7a2b103063262a54ca25677e116d5a4dd2d0a0ed356b7001cb"
},
"schema_version": "1.0",
"source": {
"id": "2601.02179",
"kind": "arxiv",
"version": 2
}
}