Hallucination Detection in LLMs with Topological Divergence on Attention Graphs

Aleksandr Yugay; Alexandra Bazarova; Alexey Zaytsev; Alina Ermilova; Andrei Volodichev; Andrey Savchenko; Andrey Shulga; Dmitry Simakov; Julia Belikova; Konstantin Polev

arxiv: 2504.10063 · v5 · submitted 2025-04-14 · 💻 cs.CL · cs.AI

Hallucination Detection in LLMs with Topological Divergence on Attention Graphs

Alexandra Bazarova , Aleksandr Yugay , Andrey Shulga , Alina Ermilova , Andrei Volodichev , Konstantin Polev , Julia Belikova , Rauf Parchiev

show 5 more authors

Dmitry Simakov Maxim Savchenko Andrey Savchenko Serguei Barannikov Alexey Zaytsev

This is my paper

Pith reviewed 2026-05-22 20:51 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords hallucination detectiontopological divergenceattention graphslarge language modelsRAGfactuality assessmentgraph topology

0 comments

The pith

Topological divergence between prompt and response attention graphs flags hallucinations in LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TOHA, a detector that quantifies structural differences via a topological divergence metric on graphs built from attention matrices. It establishes that higher divergence values in particular attention heads consistently mark hallucinated outputs across question answering and summarization tasks, and this pattern holds without dependence on any specific dataset. A reader would care because the method needs little annotated data or compute yet reaches competitive benchmark results by treating attention structure itself as the signal of factual reliability rather than generated text content.

Core claim

Applying a topological divergence metric to graphs induced by attention matrices reveals that greater structural divergence between the prompt subgraph and the response subgraph in selected attention heads correlates with the production of factually incorrect content, and this correlation appears independently of the dataset or task.

What carries the argument

Topological divergence metric on attention-induced graphs, which measures structural differences between prompt and response subgraphs to indicate factual unreliability.

If this is right

The detector applies directly to question answering and summarization while using minimal annotated data.
Performance reaches state-of-the-art or competitive levels on several benchmarks with low computational cost.
Attention topology analysis supplies an efficient indicator of factual reliability independent of dataset.
Consistent patterns emerge in specific attention heads across different tasks and data sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same divergence measure could be tested on non-RAG generation settings to check whether the correlation persists.
Attention-head selection based on divergence might be automated to reduce manual inspection.
Combining the metric with token-level probability signals could produce hybrid detectors with higher precision.

Load-bearing premise

The topological divergence metric applied to attention-induced graphs reliably signals factual unreliability in a manner that generalizes without task-specific tuning or external labels.

What would settle it

Observation of many cases where high topological divergence occurs with accurate outputs, or low divergence occurs with hallucinations, across multiple models and datasets.

Figures

Figures reproduced from arXiv: 2504.10063 by Aleksandr Yugay, Alexandra Bazarova, Alexey Zaytsev, Alina Ermilova, Andrei Volodichev, Andrey Savchenko, Andrey Shulga, Dmitry Simakov, Julia Belikova, Konstantin Polev, Maxim Savchenko, Rauf Parchiev, Serguei Barannikov.

**Figure 2.** Figure 2: displays sample differences ∆ij across three datasets, with each marker representing some attention head. We discovered that the same four (for Mistral-7B) and three (for Llama-2-7B) heads, highlighted in pink, demonstrate similar behaviour across the datasets: they consistently appear in the upper-right corner, indicating strong separation between hallucinated and grounded samples, irrespective of the da… view at source ↗

**Figure 3.** Figure 3: (a)-(b): Detection quality dependence on the size of a probe set, models: Mistral-7B (left), LLama-2-7B [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: a): Inference time comparison (seconds) for various methods evaluated on 16 MS MARCO samples using [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Attention to the first token (<s> in this example) for (a) a hallucinated generation and (b) a grounded one. Green highlights edges and nodes corresponding to grounded tokens, while yellow indicates hallucinated tokens. Model: Mistral-7B. when Nmax = 1, which underscores the effectiveness of our topological approach. 5 Conclusion This paper introduces TOHA, a novel hallucination detection method based on… view at source ↗

**Figure 7.** Figure 7: Comparison of ROC-AUC scores (↑) for single-word versus multi-word model responses. Dataset: SQuAD. B.4 Other metrics for the head selection procedure Additionally, we investigated alternative attentionmap-based scores — including entropy, spectral norm, and the Wasserstein distance between the persistent diagrams of prompts and responses — for selecting specialized attention heads. Following the pipelin… view at source ↗

**Figure 8.** Figure 8: H0 barcode construction. As the threshold increases, the separate connected components merge, resulting in the death of topological features. The horizontal axis is a sequence of thresholds ε, and each horizontal bar corresponds to a single feature. D Properties of MTop-DivG(R, P) Proof of Proposition 3.1. The 0−th CrossBarcode coincides with the set of edges in the minimal spanning tree of the weighted… view at source ↗

**Figure 9.** Figure 9: The distributions of response lengths (in words). [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗

**Figure 10.** Figure 10: ∆ij values for ij heads, MS MARCO and CoQA. Vertical axis corresponds to the difference on the dataset (B), horizontal to the difference on the dataset (A). The heads that separate samples best are highlighted in pink. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗

read the original abstract

Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models (LLMs). We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting, which leverages a topological divergence metric to quantify the structural properties of graphs induced by attention matrices. Examining the topological divergence between prompt and response subgraphs reveals consistent patterns: higher divergence values in specific attention heads correlate with hallucinated outputs, independent of the dataset. Extensive experiments - including evaluation on question answering and summarization tasks - show that our approach achieves state-of-the-art or competitive results on several benchmarks while requiring minimal annotated data and computational resources. Our findings suggest that analyzing the topological structure of attention matrices can serve as an efficient and robust indicator of factual reliability in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces TOHA, a TOpology-based HAllucination detector for LLMs in the RAG setting. It constructs graphs from attention matrices, computes a topological divergence metric between prompt and response subgraphs, and reports that higher divergence values in specific attention heads consistently correlate with hallucinated outputs across datasets. Experiments on question answering and summarization tasks are presented as achieving state-of-the-art or competitive results while requiring minimal annotated data and computational resources.

Significance. If the reported correlation between topological divergence and hallucination holds under the described graph construction and metric, the work provides a label-efficient, internal-structure-based detection approach that generalizes without per-task retraining. This could complement existing semantic or logit-based detectors by exploiting attention-graph topology, with potential for low-overhead deployment in production RAG pipelines.

major comments (2)

[§3.2] §3.2 (graph construction and divergence definition): the procedure for inducing subgraphs from attention matrices and the exact formula for topological divergence are described at a high level but lack an explicit mathematical definition or pseudocode; without this, it is difficult to verify reproducibility or assess sensitivity to implementation choices such as edge thresholding.
[§4.3] §4.3 (cross-task results): the claim of dataset independence is supported by consistent patterns on QA and summarization, yet the section does not report statistical tests (e.g., correlation coefficients with confidence intervals or permutation tests) for the divergence-hallucination link, which is load-bearing for the central empirical claim.

minor comments (3)

[Abstract] Abstract: the phrase 'state-of-the-art or competitive results on several benchmarks' should name the specific benchmarks and report the exact metrics (e.g., F1, AUROC) to allow immediate evaluation.
[Figure 3] Figure 3 (attention graph visualizations): axis labels and color scales are underspecified; adding a legend for node/edge attributes would improve clarity.
[§2] §2 (related work): the discussion of prior attention-based hallucination detectors omits recent graph-neural approaches; adding 2-3 citations would strengthen context.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each point below and will revise the manuscript accordingly to improve clarity and empirical rigor.

read point-by-point responses

Referee: [§3.2] §3.2 (graph construction and divergence definition): the procedure for inducing subgraphs from attention matrices and the exact formula for topological divergence are described at a high level but lack an explicit mathematical definition or pseudocode; without this, it is difficult to verify reproducibility or assess sensitivity to implementation choices such as edge thresholding.

Authors: We agree that the current description is at a high level and that explicit definitions are needed for reproducibility. In the revised version we will add the precise mathematical formulation for constructing the prompt and response subgraphs from attention matrices (including the edge selection rule), the exact definition of the topological divergence metric, and pseudocode for the full procedure. This addition will also permit readers to evaluate sensitivity to thresholding choices. revision: yes
Referee: [§4.3] §4.3 (cross-task results): the claim of dataset independence is supported by consistent patterns on QA and summarization, yet the section does not report statistical tests (e.g., correlation coefficients with confidence intervals or permutation tests) for the divergence-hallucination link, which is load-bearing for the central empirical claim.

Authors: We acknowledge that formal statistical support would strengthen the central claim. In the revision we will report Pearson and Spearman correlation coefficients (with bootstrap confidence intervals) between topological divergence and hallucination labels, together with permutation tests that assess whether the observed association exceeds chance levels, for both the QA and summarization settings. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces TOHA as an empirical detector that computes a topological divergence metric on graphs constructed from attention matrices of prompt and response tokens. The central claim rests on observed correlations between higher divergence in specific heads and hallucinated outputs, validated through experiments on QA and summarization benchmarks. No derivation step reduces the metric definition or head-selection procedure to a fitted parameter or self-referential input; the graph construction and divergence formula are stated independently of the hallucination labels. Cross-dataset consistency is reported as an experimental finding rather than a constructed prediction. The argument therefore remains self-contained against external benchmarks and does not rely on load-bearing self-citations or ansatzes that presuppose the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, background axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 5719 in / 1001 out tokens · 67568 ms · 2026-05-22T20:51:31.798345+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

MTop-DivG(R, P) equals the length of the minimal spanning forest attaching R to P (Proposition 3.1)
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Hallucination-aware heads identified by consistent Δij separation across datasets

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models
cs.CL 2026-04 unverdicted novelty 6.0

SinkProbe detects hallucinations in LLMs by analyzing attention sinks in attention maps, showing they indicate transitions to prior-dominated computation and achieving state-of-the-art results.
Topological Data Analysis Applications in Natural Language Processing: A Survey
cs.CL 2024-11 accept novelty 6.0

This survey compiles 137 papers on Topological Data Analysis in NLP, categorizing them into theoretical explanations of language and practical integrations into ML systems while noting open challenges.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

InProceedings of the 2018 Conference on Empirical Methods in Natural Lan- guage Processing, pages 1797–1807

Don‘t give me the details, just the summary! topic-aware convolutional neural networks for ex- treme summarization. InProceedings of the 2018 Conference on Empirical Methods in Natural Lan- guage Processing, pages 1797–1807. Association for Computational Linguistics. Alexander Nikitin, Jannik Kossen, Yarin Gal, and Pekka Marttinen. 2024. Kernel language e...

work page 2018
[2]

Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Re- ichart, Idan Szpektor, Hadas Kotek, and Yonatan Belinkov

RAGTruth: A hallucination corpus for de- veloping trustworthy retrieval-augmented language models.arXiv preprint arXiv:2401.00396. Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Re- ichart, Idan Szpektor, Hadas Kotek, and Yonatan Belinkov. 2025. Llms know more than they show: On the intrinsic representation of llm hallucinations. InICLR. Irina Proskurina,...

work page arXiv 2025
[3]

Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mo- hammad Saleh, Balaji Lakshminarayanan, and Pe- ter J Liu

CoQA: A conversational question answering challenge.Transactions of the Association for Com- putational Linguistics, 7:249–266. Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mo- hammad Saleh, Balaji Lakshminarayanan, and Pe- ter J Liu. 2023. Out-of-distribution detection and selective generation for conditional language mod- els. InThe Eleventh Internat...

work page 2023
[4]

Redeep: Detecting hallucination in retrieval- augmented generation via mechanistic interpretabil- ity. InICLR. Christopher Tralie, Nathaniel Saul, and Rann Bar-On

work page
[5]

Topological Data Analysis Applications in Natural Language Processing: A Survey

Ripser.py: A lean persistent homology library for python.The Journal of Open Source Software, 3(29):925. Eduard Tulchinskii, Kristian Kuznetsov, Daniil Cherni- avskii, Serguei Barannikov, Sergey Nikolenko, and Evgeny Burnaev. 2023. Topological data analysis for speech processing. InProceedings of the Annual Conference of the International Speech Communica...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

This property is immediately obtained from the properties of an attention map: all its weights lie between0and1

work page
[7]

Denote by MSF(R, P) the minimum spanning forest attaching R to P . Note that we have proper- ties D.1, so MTop-Div(R, P) = X e∈MSF(R,P) w(e).(5) Therefore, we have to show that the weight of MSF(R, P) does not change significantly when all weights are changed by no more thanε. There are two possibilities: 1) after a change, all MSF edges remain the same, ...

work page
[8]

ideal” case: if a model “knows what to look at

We have to check the definition of the exact sequence: Ker(ri) =Im(r i+1). For a pair r0, r 1, it is equivalent to the surjectvity of r1. The H0 homology group of a graph corresponds to the connected components of the graph. The set of edges E≤α (G,w) ={e∈E G|we ≤α} is always a subset in the analogous set of the weighted graph (G, w(R∪P)/P ) with all weig...

work page 2016
[9]

Rouge-L scoring: we computed Rouge-L scores (using the evaluate library, v0.4.6) between the model’s response and the ground- truth answers

work page
[10]

Responses with a Rouge-L score of 1 (exact match) were labeled as grounded

Substring matching: we checked whether any ground-truth answer was a substring of the response. Responses with a Rouge-L score of 1 (exact match) were labeled as grounded. Those meeting both of the following criteria were flagged as potential hallucinations: • Rouge-L score ≤0.3 (following (Kuhn et al., 2023)); • no ground-truth answer appears as a substr...

work page 2023
[11]

warning system

Ethical risks from deployment: overcon- fidence in TOHA’s scores could lead to unchecked LLM outputs in high-stakes scenar- ios (e.g., healthcare). TOHA should be frame as a "warning system" rather than a definitive filter, and advocate for human review

work page
[12]

Attention manipulation attacks: adversarial prompts could artificially alter attention pat- terns, evading detection. 19 0 100 200 0 500 1000 1500CoQA Median: 4.0 Mistral-7B 0 50 100 0 200 400 600 Median: 6.0 LLama-2-7B 0 10 20 30 0 100 200 300 Median: 5.0 LLama-2-13B 0 50 100 0 50 100 150 Median: 12.0 Qwen2.5-7B 0 100 200 0 200 400 600 Median: 4.0 Llama-...

work page

[1] [1]

InProceedings of the 2018 Conference on Empirical Methods in Natural Lan- guage Processing, pages 1797–1807

Don‘t give me the details, just the summary! topic-aware convolutional neural networks for ex- treme summarization. InProceedings of the 2018 Conference on Empirical Methods in Natural Lan- guage Processing, pages 1797–1807. Association for Computational Linguistics. Alexander Nikitin, Jannik Kossen, Yarin Gal, and Pekka Marttinen. 2024. Kernel language e...

work page 2018

[2] [2]

Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Re- ichart, Idan Szpektor, Hadas Kotek, and Yonatan Belinkov

RAGTruth: A hallucination corpus for de- veloping trustworthy retrieval-augmented language models.arXiv preprint arXiv:2401.00396. Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Re- ichart, Idan Szpektor, Hadas Kotek, and Yonatan Belinkov. 2025. Llms know more than they show: On the intrinsic representation of llm hallucinations. InICLR. Irina Proskurina,...

work page arXiv 2025

[3] [3]

Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mo- hammad Saleh, Balaji Lakshminarayanan, and Pe- ter J Liu

CoQA: A conversational question answering challenge.Transactions of the Association for Com- putational Linguistics, 7:249–266. Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mo- hammad Saleh, Balaji Lakshminarayanan, and Pe- ter J Liu. 2023. Out-of-distribution detection and selective generation for conditional language mod- els. InThe Eleventh Internat...

work page 2023

[4] [4]

Redeep: Detecting hallucination in retrieval- augmented generation via mechanistic interpretabil- ity. InICLR. Christopher Tralie, Nathaniel Saul, and Rann Bar-On

work page

[5] [5]

Topological Data Analysis Applications in Natural Language Processing: A Survey

Ripser.py: A lean persistent homology library for python.The Journal of Open Source Software, 3(29):925. Eduard Tulchinskii, Kristian Kuznetsov, Daniil Cherni- avskii, Serguei Barannikov, Sergey Nikolenko, and Evgeny Burnaev. 2023. Topological data analysis for speech processing. InProceedings of the Annual Conference of the International Speech Communica...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[6] [6]

This property is immediately obtained from the properties of an attention map: all its weights lie between0and1

work page

[7] [7]

Denote by MSF(R, P) the minimum spanning forest attaching R to P . Note that we have proper- ties D.1, so MTop-Div(R, P) = X e∈MSF(R,P) w(e).(5) Therefore, we have to show that the weight of MSF(R, P) does not change significantly when all weights are changed by no more thanε. There are two possibilities: 1) after a change, all MSF edges remain the same, ...

work page

[8] [8]

ideal” case: if a model “knows what to look at

We have to check the definition of the exact sequence: Ker(ri) =Im(r i+1). For a pair r0, r 1, it is equivalent to the surjectvity of r1. The H0 homology group of a graph corresponds to the connected components of the graph. The set of edges E≤α (G,w) ={e∈E G|we ≤α} is always a subset in the analogous set of the weighted graph (G, w(R∪P)/P ) with all weig...

work page 2016

[9] [9]

Rouge-L scoring: we computed Rouge-L scores (using the evaluate library, v0.4.6) between the model’s response and the ground- truth answers

work page

[10] [10]

Responses with a Rouge-L score of 1 (exact match) were labeled as grounded

Substring matching: we checked whether any ground-truth answer was a substring of the response. Responses with a Rouge-L score of 1 (exact match) were labeled as grounded. Those meeting both of the following criteria were flagged as potential hallucinations: • Rouge-L score ≤0.3 (following (Kuhn et al., 2023)); • no ground-truth answer appears as a substr...

work page 2023

[11] [11]

warning system

Ethical risks from deployment: overcon- fidence in TOHA’s scores could lead to unchecked LLM outputs in high-stakes scenar- ios (e.g., healthcare). TOHA should be frame as a "warning system" rather than a definitive filter, and advocate for human review

work page

[12] [12]

Attention manipulation attacks: adversarial prompts could artificially alter attention pat- terns, evading detection. 19 0 100 200 0 500 1000 1500CoQA Median: 4.0 Mistral-7B 0 50 100 0 200 400 600 Median: 6.0 LLama-2-7B 0 10 20 30 0 100 200 300 Median: 5.0 LLama-2-13B 0 50 100 0 50 100 150 Median: 12.0 Qwen2.5-7B 0 100 200 0 200 400 600 Median: 4.0 Llama-...

work page