Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection

Baharan Mirzasoleiman; Kristjan Greenewald; Yihao Xue; Youssef Mroueh

arxiv: 2502.15845 · v2 · pith:MZFUZAAVnew · submitted 2025-02-20 · 💻 cs.CL · cs.AI

Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman

classification: cs.CL, cs.AI

keywords: detection, hallucination, methods, self-consistency, black-box, model, oracle, performance

Large Language Models (LLMs) often hallucinate, limiting their reliability in sensitive applications. In black-box settings, several self-consistency-based techniques have been proposed for hallucination detection. We empirically show that these methods perform nearly as well as a supervised (black-box) oracle, leaving limited room for further gains within this paradigm. To address this limitation, we explore cross-model consistency checking between the target model and an additional verifier LLM. With this extra information, we observe improved oracle performance compared to purely self-consistency-based methods. We then propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases. It dynamically switches between self-consistency and cross-consistency based on an uncertainty interval of the self-consistency classifier. We provide a geometric interpretation of consistency-based hallucination detection methods through the lens of kernel mean embeddings, offering deeper theoretical insights. Extensive experiments on QA-style hallucination detection benchmarks show that this approach maintains high detection performance while significantly reducing computational cost.
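
The two-stage switching rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `disagreement_rate`, `two_stage_detect`, `lower`, `upper`, and `cross_threshold` are hypothetical, scores are assumed to lie in [0, 1] with higher meaning "more likely hallucinated", and exact string matching stands in for whatever semantic-consistency check the paper actually uses.

```python
from typing import Callable, List, Tuple


def disagreement_rate(answer: str, samples: List[str]) -> float:
    """Toy self-consistency score: fraction of sampled answers that differ
    from the main answer. Exact match after normalization is an assumption
    made for illustration; it is not the paper's consistency measure."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    norm = answer.strip().lower()
    mismatches = sum(1 for s in samples if s.strip().lower() != norm)
    return mismatches / len(samples)


def two_stage_detect(
    self_score: float,
    lower: float,
    upper: float,
    cross_score_fn: Callable[[], float],
    cross_threshold: float = 0.5,
) -> Tuple[bool, bool]:
    """Return (is_hallucination, verifier_called).

    Stage 1: if the self-consistency score falls outside the uncertainty
    interval [lower, upper], decide from it alone.
    Stage 2: otherwise call the (hypothetical) verifier-based scorer and
    decide from the cross-consistency score.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if self_score < lower:
        return False, False  # confidently consistent: no verifier call
    if self_score > upper:
        return True, False   # confidently inconsistent: no verifier call
    # Uncertain region: pay for the verifier only here.
    return cross_score_fn() >= cross_threshold, True
```

Widening the interval `[lower, upper]` sends more cases to the verifier, while narrowing it lowers cost and relies more on self-consistency alone; how the interval is chosen in the paper is not specified on this page.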

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
cs.CL 2026-05 unverdicted novelty 7.0

Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for hallucination detection and novelty via standardized deviation.

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
cs.CL 2026-05 unverdicted novelty 6.0

Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for groundedness and novelty without model internals.

Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation
cs.CL 2026-04 unverdicted novelty 6.0

SHADE adaptively combines coverage and spectral signals to estimate semantic alphabet size from few LLM samples, yielding better performance than baselines in low-sample regimes for alphabet estimation and QA error detection.

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
cs.AI 2026-04 unverdicted novelty 6.0

Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors
cs.CV 2026-04 unverdicted novelty 5.0

EnsemHalDet improves VLM hallucination detection by ensembling independent detectors trained on diverse internal states, yielding higher AUC than single-detector baselines across VQA datasets.

EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors
cs.CV 2026-04 unverdicted novelty 4.0

EnsemHalDet improves hallucination detection in VLMs by ensembling independent detectors on diverse internal states, yielding higher AUC than single-detector baselines on VQA datasets.