pith.

arxiv: 2502.15845 · v2 · pith:MZFUZAAV · submitted 2025-02-20 · 💻 cs.CL · cs.AI

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection

classification 💻 cs.CL cs.AI
keywords detection, hallucination, methods, self-consistency, black-box, model, oracle, performance

Large Language Models (LLMs) often hallucinate, limiting their reliability in sensitive applications. In black-box settings, several self-consistency-based techniques have been proposed for hallucination detection. We empirically show that these methods perform nearly as well as a supervised (black-box) oracle, leaving limited room for further gains within this paradigm. To address this limitation, we explore cross-model consistency checking between the target model and an additional verifier LLM. With this extra information, we observe improved oracle performance compared to purely self-consistency-based methods. We then propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases. It dynamically switches between self-consistency and cross-consistency based on an uncertainty interval of the self-consistency classifier. We provide a geometric interpretation of consistency-based hallucination detection methods through the lens of kernel mean embeddings, offering deeper theoretical insights. Extensive experiments on QA-style hallucination detection benchmarks show that this approach maintains high detection performance while significantly reducing computational cost.
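The abstract describes the two-stage detector and its kernel-mean-embedding interpretation only at a high level. The block below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each sampled answer has already been embedded as a vector (the embedding model is unspecified here) and scores consistency with an RBF kernel. In that view, self-consistency is the squared RKHS norm of the empirical mean embedding of the target model's samples, and cross-consistency is the inner product between the target's and the verifier's mean embeddings. The raw self-consistency score stands in for the paper's self-consistency classifier. The names `two_stage_detect`, `sample_verifier`, `low`, `high`, `threshold`, and `gamma` are hypothetical.

```python
import numpy as np


def rbf_gram(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point error.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def self_consistency(target: np.ndarray, gamma: float = 1.0) -> float:
    # ||mu_target||^2 = (1/n^2) * sum_ij k(x_i, x_j): squared RKHS norm of the
    # empirical kernel mean embedding of the target model's sampled answers.
    return float(rbf_gram(target, target, gamma).mean())


def cross_consistency(target: np.ndarray, verifier: np.ndarray, gamma: float = 1.0) -> float:
    # <mu_target, mu_verifier> = (1/(n*m)) * sum_ij k(x_i, y_j).
    return float(rbf_gram(target, verifier, gamma).mean())


def two_stage_detect(target, sample_verifier, low, high, threshold, gamma=1.0):
    """Hypothetical two-stage rule. Returns (is_hallucination, verifier_called).

    Stage 1 decides from self-consistency alone when the score lies outside
    the uncertainty interval [low, high]; stage 2 samples the verifier model
    only inside that interval and decides from cross-consistency instead.
    """
    s = self_consistency(target, gamma)
    if s < low:
        return True, False   # confidently inconsistent: flag as hallucination
    if s > high:
        return False, False  # confidently consistent: accept without verifier
    verifier = sample_verifier()  # extra cost is paid only in this branch
    c = cross_consistency(target, verifier, gamma)
    return c < threshold, True
```

The verifier is sampled only when the self-consistency score falls inside `[low, high]`, which is where a budget saving of this kind would come from. In practice the interval and the decision threshold would have to be tuned on labeled data.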

This paper has not been read by Pith yet.


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

    cs.CL 2026-05 unverdicted novelty 7.0

    Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for hallucination detection and novelty via standardized deviation.

  2. Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

    cs.CL 2026-05 unverdicted novelty 6.0

    Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for groundedness and novelty without model internals.

  3. Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation

    cs.CL 2026-04 unverdicted novelty 6.0

    SHADE adaptively combines coverage and spectral signals to estimate semantic alphabet size from few LLM samples, yielding better performance than baselines in low-sample regimes for alphabet estimation and QA error detection.

  4. Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification

    cs.AI 2026-04 unverdicted novelty 6.0

    Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.

  5. EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

    cs.CV 2026-04 unverdicted novelty 5.0

    EnsemHalDet improves VLM hallucination detection by ensembling independent detectors trained on diverse internal states, yielding higher AUC than single-detector baselines across VQA datasets.

  6. EnsemHalDet: Robust VLM Hallucination Detection via Ensemble of Internal State Detectors

    cs.CV 2026-04 unverdicted novelty 4.0

    EnsemHalDet improves hallucination detection in VLMs by ensembling independent detectors on diverse internal states, yielding higher AUC than single-detector baselines on VQA datasets.