{"paper":{"title":"LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Adaptive Conformal Semantic Entropy quantifies LLM prompt uncertainty by clustering responses according to semantic similarity and applies conformal calibration to bound error rates on accepted outputs.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Hamed Karimi, Reza Samavi, Vaishali Meyappan","submitted_at":"2026-05-05T20:56:11Z","abstract_excerpt":"LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for the deployment of the models in safety-critical settings and makes a reliable estimation of uncertainty necessary. Existing approaches for uncertainty quantification typically prioritize lexical or probabilistic measures; however, these techniques often ignore the semantic variance of different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring semantic dispersion in LLMs outputs. Our"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. ... 
providing a finite-sample, distribution-free guarantee such that the error rate among the accepted responses remains bounded by a user-specified tolerance.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That clustering responses by semantic similarity reliably captures meaningful dispersion in model knowledge and that the adaptive adjustment based on cluster features produces a valid uncertainty score without introducing bias or requiring post-hoc tuning that violates the conformal guarantees.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Adaptive Conformal Semantic Entropy quantifies LLM prompt uncertainty by clustering responses according to semantic similarity and applies conformal calibration to bound error rates on accepted outputs.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"2ea9eb24b147b30bac14a425fef83e7b2bdee56acff2f8c70beb1bf7b5ff1a58"},"source":{"id":"2605.04295","kind":"arxiv","version":2},"verdict":{"id":"26603bbd-2f62-4cf8-9aad-f542529a5d30","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T17:48:05.790303Z","strongest_claim":"Our uncertainty scoring function is based on clustering semantic entropy of multiple diverse responses to the same prompt. The function adaptively adjusts the uncertainty score based on semantic features of each cluster. ... 
providing a finite-sample, distribution-free guarantee such that the error rate among the accepted responses remains bounded by a user-specified tolerance.","one_line_summary":"ACSE estimates LLM prompt uncertainty via adaptive clustering of semantic entropy across multiple responses and uses conformal prediction to bound error rates on accepted answers with distribution-free guarantees.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That clustering responses by semantic similarity reliably captures meaningful dispersion in model knowledge and that the adaptive adjustment based on cluster features produces a valid uncertainty score without introducing bias or requiring post-hoc tuning that violates the conformal guarantees.","pith_extraction_headline":"Adaptive Conformal Semantic Entropy quantifies LLM prompt uncertainty by clustering responses according to semantic similarity and applies conformal calibration to bound error rates on accepted outputs."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.04295/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-20T12:35:19.667616Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T23:31:20.672350Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T14:32:35.719848Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"cd5123646d9c510e7d951cfaabbbf08ad26945870b69107dc341932306f5463e"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":3,"snapshot_sha256":"e263f2cdc4a551577e3289d2c6be738e8450aacc8f64b5ba382cc1862f6b63e2"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}