Enhancing Hallucination Detection through Noise Injection

Apratim Bhattacharyya; Litian Liu; Reza Pourreza; Roland Memisevic; Sunny Panchal; Yao Qin; Yubing Jian

arxiv: 2502.03799 · v4 · pith:ZUUW7TRJnew · submitted 2025-02-06 · cs.CL · cs.SY · eess.SY

keywords: model, hallucinations, detection, uncertainty, approach, detecting, hallucination, llms

Abstract

Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from multiple samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is suboptimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple, training-free approach based on perturbing an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate that our approach significantly improves inference-time hallucination detection over standard sampling across diverse datasets, model architectures, and uncertainty metrics.
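
The idea in the abstract (perturb hidden activations while sampling, then measure dispersion over the sampled answers) can be illustrated with a toy sketch. This is not the authors' implementation: the linear "model", the uniform noise distribution, the `noise_magnitude` parameter, and the use of answer entropy as the dispersion metric are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def answer_logits(hidden, weights):
    """Toy output layer: one logit per candidate answer (assumed linear map)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def sample_answers(hidden, weights, n_samples, noise_magnitude, rng, temperature=1.0):
    """Draw answers, adding fresh uniform noise to the hidden activations per sample.

    noise_magnitude = 0.0 reduces to standard sampling from the model's
    own distribution over answers.
    """
    answers = []
    for _ in range(n_samples):
        # Assumption: noise is uniform in [-noise_magnitude, noise_magnitude].
        noisy = [h + rng.uniform(-noise_magnitude, noise_magnitude) for h in hidden]
        probs = softmax(answer_logits(noisy, weights), temperature)
        answers.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return answers


def answer_entropy(answers):
    """Empirical entropy (in nats) of the sampled answers; higher means more dispersion."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0, -0.5, 0.25]  # hypothetical hidden activations
    weights = [[2.0, 0.0, 1.0], [0.5, 1.0, -1.0], [-1.0, 0.5, 0.0]]  # hypothetical
    baseline = sample_answers(hidden, weights, 10, 0.0, rng)
    perturbed = sample_answers(hidden, weights, 10, 0.5, rng)
    print("standard sampling entropy:", answer_entropy(baseline))
    print("noise-injected entropy:   ", answer_entropy(perturbed))
```

In a real LLM the perturbation would be applied to a chosen subset of parameters or activations inside the network, and answers would be compared semantically rather than by index; the sketch only shows how the two sources of randomness combine before a dispersion score is computed.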

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
cs.AI · 2026-04 · unverdicted · novelty 6.0

Cross-model semantic disagreement adds an epistemic uncertainty term that improves total uncertainty estimation over self-consistency alone, helping flag confident errors in LLMs.