Efficient Hallucination Detection for LLMs Using Uncertainty-Aware Attention Heads

Alexander Panchenko; Artem Shelmanov; Artem Vazhentsev; Ekaterina Fadeeva; Gleb Kuzmin; Ivan Lazichny; Lyudmila Rvanova; Maxim Panov; Mrinmaya Sachan; Preslav Nakov

arxiv: 2505.20045 · v3 · pith:FZVJXHJBnew · submitted 2025-05-26 · 💻 cs.CL

Artem Vazhentsev, Lyudmila Rvanova, Gleb Kuzmin, Ekaterina Fadeeva, Ivan Lazichny, Alexander Panchenko, Maxim Panov, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin, Artem Shelmanov

classification 💻 cs.CL

keywords attention · llms · rauq · heads · uncertainty · detection · efficient · hallucination

While large language models (LLMs) have become highly capable, they remain prone to factual inaccuracies, commonly referred to as "hallucinations." Uncertainty quantification (UQ) offers a promising way to mitigate this issue, but most existing methods are computationally intensive and/or require supervision. In this work, we propose Recurrent Attention-based Uncertainty Quantification (RAUQ), an unsupervised and efficient framework for identifying hallucinations. The method leverages an observation about transformer attention behavior: when incorrect information is generated, certain "uncertainty-aware" attention heads tend to reduce their focus on preceding tokens. RAUQ automatically detects these attention heads and combines their activation patterns with token-level confidence measures in a recurrent scheme, producing a sequence-level uncertainty estimate in just a single forward pass. Through experiments on twelve datasets spanning question answering, summarization, and translation across nine different LLMs, we show that RAUQ consistently outperforms state-of-the-art UQ baselines. Importantly, it incurs minimal overhead, requiring less than 1% additional computation. Since it requires neither labeled data nor extensive parameter tuning, RAUQ serves as a lightweight, plug-and-play solution for real-time hallucination detection in white-box LLMs.
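The abstract describes the recurrent scheme only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names `select_head` and `recurrent_uncertainty`, the rule of picking the head with the highest mean attention to the immediately preceding token, the mixing weight `alpha`, and the choice of an averaged negative log-confidence as the sequence-level score.

```python
import math


def select_head(prev_token_attention):
    """Pick the head with the highest mean attention to the preceding token.

    prev_token_attention[h][i] is the attention weight that head h assigns
    from generated token i+1 to token i. This selection rule is an
    assumption for illustration; the abstract does not specify it.
    """
    means = [sum(head) / len(head) for head in prev_token_attention]
    return max(range(len(means)), key=lambda h: means[h])


def recurrent_uncertainty(token_probs, prev_attn, alpha=0.5, eps=1e-12):
    """Combine token probabilities and attention in a recurrent pass.

    token_probs: probabilities of the N generated tokens.
    prev_attn:   N-1 attention weights from each token to its predecessor,
                 taken from the selected head.
    alpha:       hypothetical mixing weight between the token probability
                 and the attention-propagated confidence.
    Returns the mean negative log-confidence (higher = more uncertain).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if len(prev_attn) != len(token_probs) - 1:
        raise ValueError("prev_attn must have one entry fewer than token_probs")

    conf = token_probs[0]
    log_sum = math.log(max(conf, eps))
    for prob, attn in zip(token_probs[1:], prev_attn):
        # Confidence carries over from the previous token, scaled by how
        # strongly the selected head still attends to that token.
        conf = alpha * prob + (1.0 - alpha) * attn * conf
        log_sum += math.log(max(conf, eps))
    return -log_sum / len(token_probs)


# Toy inputs: two heads, three generated tokens.
heads = [[0.1, 0.2], [0.8, 0.7]]
best = select_head(heads)
score = recurrent_uncertainty([0.9, 0.9, 0.9], heads[best])
```

In this toy setup, a sequence whose token probabilities and preceding-token attention both drop receives a higher score than a sequence where both stay high, which matches the qualitative behavior the abstract attributes to "uncertainty-aware" heads. With a real white-box model, the two inputs would come from the per-token output probabilities and the attention maps of a single generation pass.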

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Uncertainty Propagation in LLM-Based Systems
cs.SE 2026-04 unverdicted novelty 7.0

This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...

Evolutionary Search for Automated Design of Uncertainty Quantification Methods
cs.CL 2026-04 unverdicted novelty 7.0

LLM-driven evolutionary search discovers unsupervised UQ methods as Python programs that improve ROC-AUC by up to 6.7% over manual baselines on atomic claim verification across 9 datasets with OOD generalization.

Boosting Self-Consistency with Ranking
cs.CL 2026-06 unverdicted novelty 6.0

RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on Q...

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification
cs.CL 2026-05 unverdicted novelty 6.0

Introduces functional equivalence methods and functional entropy to predict functional correctness of LLM-generated code via uncertainty quantification, outperforming NLI-based baselines in most tested settings.

How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework
cs.CL 2026-04 unverdicted novelty 6.0

LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.

The Origins of Stochasticity: Comprehensive Investigations on Uncertainty Quantification for Large Language Models
cs.AI 2026-06 unverdicted novelty 5.0

The paper introduces a four-source uncertainty taxonomy for LLMs and finds that consensus-based UQ methods outperform others while larger models show lower uncertainty estimates.

Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text
cs.CL 2026-05 unverdicted novelty 5.0

Reverse Probing extracts token-level uncertainty from LLM internal activations on labeled clinical summaries, outperforming eight baselines with up to 4x higher AUPRC on two expert-annotated datasets while lowering co...

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
cs.CL 2026-04 unverdicted novelty 5.0

Four attention metrics enable logistic regression classifiers that detect hallucinations in SpeechLLMs with up to +0.23 PR-AUC gains over baselines on ASR and translation tasks.

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models
cs.CL 2026-04 unverdicted novelty 5.0

SIVR detects LLM hallucinations by learning from token-wise and layer-wise variance patterns in internal hidden states, outperforming baselines with better generalization and less training data.