arxiv: 2606.28520 · v1 · pith:KIC65EOKnew · submitted 2026-06-26 · 💻 cs.CV · cs.CL

Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

Pith reviewed 2026-06-30 01:33 UTC · model grok-4.3

classification 💻 cs.CV cs.CL
keywords hallucination detection · vision-language models · clinical imaging · visual grounding · counterfactual perturbation · uncertainty estimation · medical AI

The pith

A framework audits arbitrary responses from clinical vision-language models by grounding extracted entities and scoring uncertainty through counterfactual image perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method to detect hallucinations—textual claims not supported by the image—in large vision-language models used for medical image interpretation. It extracts verifiable entities from any model response, localizes them on the input image with a domain-adapted verifier, then perturbs those entities to create counterfactual versions and measures how much the localization confidence shifts. The resulting uncertainty score combines factual confidence, counterfactual confidence, and spatial overlap to decide whether an entity is hallucinated. This matters because current LVLMs are deployed in clinical settings yet can generate unsupported findings, and the approach works without any internal model access or fine-tuning. Experiments across imaging modalities and model backbones show gains over prior detection baselines together with localization maps and transfer across models.

Core claim

The central claim is that entity-level hallucination decisions can be made by computing a visual evidence uncertainty score that contrasts factual grounding results against those obtained after counterfactual entity perturbation; the score is formed from positive confidence, counterfactual confidence, and their grounding overlap, and this procedure yields improved detection performance, interpretable localization, and cross-model transfer without requiring changes to the target LVLM.
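
The summary names the three ingredients of the score (positive confidence, counterfactual confidence, grounding overlap) but not the formula that combines them. The sketch below is one illustrative combination, not the authors' formula: the `Grounding` type, the `uncertainty` function, the multiplicative discount, and the `0.5` threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Grounding:
    box: Box
    confidence: float  # verifier confidence in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def uncertainty(factual: Grounding, counterfactual: Grounding) -> float:
    """Hypothetical score in [0, 1]; higher means weaker visual evidence.

    The factual confidence is discounted when the counterfactual query is
    grounded confidently in the same region, since that suggests the
    verifier is not discriminating between the two entities.
    """
    overlap = iou(factual.box, counterfactual.box)
    support = factual.confidence * (1.0 - counterfactual.confidence * overlap)
    return 1.0 - support


def is_hallucinated(factual: Grounding, counterfactual: Grounding,
                    threshold: float = 0.5) -> bool:
    """Binary decision; the threshold value is an assumption."""
    return uncertainty(factual, counterfactual) > threshold
```

Under this assumed form, an entity grounded with confidence 0.9 whose counterfactual lands elsewhere with confidence 0.1 scores about 0.1, while an entity whose counterfactual is grounded in the same box with confidence 0.8 scores well above the threshold even if its own confidence is moderate.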

What carries the argument

Counterfactual visual grounding uncertainty: the mechanism that extracts entities, localizes them factually and after perturbation, then derives an uncertainty score for binary hallucination classification.
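
As a rough illustration of the perturbation step, the sketch below builds a factual and a counterfactual grounding query for one extracted entity. The paper is described as constructing counterfactual entities from radiological knowledge; the swap table, the prompt template, and the function names here are hypothetical stand-ins, since the summary does not specify them.

```python
from typing import Optional, Tuple

# Hypothetical swap table (assumption). The real method draws on
# radiological knowledge that this summary does not spell out.
COUNTERFACTUAL_SWAPS = {
    "left": "right",
    "right": "left",
    "upper": "lower",
    "lower": "upper",
}

# Hypothetical grounding prompt (assumption).
QUERY_TEMPLATE = "Locate the {} in the image."


def perturb_entity(entity: str) -> Optional[str]:
    """Swap the first token that has a counterfactual; None if none applies."""
    tokens = entity.split()
    for i, token in enumerate(tokens):
        swap = COUNTERFACTUAL_SWAPS.get(token.lower())
        if swap is not None:
            return " ".join(tokens[:i] + [swap] + tokens[i + 1:])
    return None


def build_queries(entity: str) -> Tuple[str, Optional[str]]:
    """Return (factual query, counterfactual query or None)."""
    counterfactual = perturb_entity(entity)
    factual_query = QUERY_TEMPLATE.format(entity)
    if counterfactual is None:
        return factual_query, None
    return factual_query, QUERY_TEMPLATE.format(counterfactual)
```

Both queries would then be sent to the grounding verifier, and the two results contrasted. Entities with no applicable perturbation return `None` here; how the actual method handles such cases is not stated in the summary.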

If this is right

  • The method improves hallucination detection performance over recent baselines on multiple medical imaging modalities and LVLM backbones.
  • It supplies interpretable localization evidence for each detected hallucination.
  • It exhibits strong cross-model transferability without retraining the target LVLM.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same grounding-plus-counterfactual pattern could be tested on non-clinical image domains where entity localization verifiers already exist.
  • If the uncertainty scores prove stable under different verifiers, the framework might support auditing pipelines that swap in new grounding models as they improve.

Load-bearing premise

The domain-adapted grounding verifier accurately localizes entities taken from arbitrary LVLM responses on clinical images.

What would settle it

A held-out test set of clinical images with known hallucinated versus supported entities where the uncertainty scores fail to separate the two groups at rates better than chance.
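
One way to make "better than chance" concrete is the area under the ROC curve (AUROC), where 0.5 corresponds to chance-level separation. The sketch below computes it directly from its pairwise definition; the function name and the example scores are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def auroc(hallucinated: Sequence[float], supported: Sequence[float]) -> float:
    """Probability that a random hallucinated entity receives a higher
    uncertainty score than a random supported one (ties count as half)."""
    if not hallucinated or not supported:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for h in hallucinated:
        for s in supported:
            if h > s:
                wins += 1.0
            elif h == s:
                wins += 0.5
    return wins / (len(hallucinated) * len(supported))


# Made-up scores for illustration only.
example = auroc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5])  # 8 of 9 pairs ordered correctly
```

An AUROC near 0.5 on such a held-out set would count against the method. In practice a confidence interval (for example, from bootstrapping) would be needed before calling a result indistinguishable from chance.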

Figures

Figures reproduced from arXiv: 2606.28520 by Caifeng Shan, Haonan Qin, Jiong Zhang, Xiao Song, Yuqi Fang, Zhaoxu Zhang.

Figure 1: Comparison of hallucination detection paradigms. (a) Hidden state-based methods: access LVLM's internal hidden states. (b) External verifier-based methods: rely on external expert models. (c) Ours: identifies hallucinations by aligning responses to visual regions, improving interpretability via visual evidence. […] and decision support. Despite rapid progress, LVLMs remain prone to hallucinations [7,9,2,17,1…
Figure 2: Pipeline of the proposed Counterfactual-driven Visual Grounding Uncertainty Estimation method. ① Given a response $R$ from an arbitrary LVLM, we extract entities $E$ and construct counterfactual entities $\tilde{E}$ using radiological knowledge. ② Constructing the factual and counterfactual queries. ③ A trained grounding verifier predicts bounding boxes $(b_e^+, b_e^-)$ and confidence scores $(s_e^+, s_e^-)$. ④ Uncertain…
Figure 3: Ablation studies on hallucination detection.
Figure 4: Visualization of hallucination detection. (a) Non-hallucination case where factual branch detects grounded visual evidence and makes correct decision. (b) Hallucination case missed by factual branch but successfully detected by our algorithm. […] This enables robust uncertainty estimation, effectively distinguishing hallucinations by suppressing spurious visual alignments from a single factual branch. Exper…
Original abstract

Large vision-language models (LVLMs) are increasingly used for clinical image understanding, yet they remain vulnerable to hallucinations, that is, producing textual findings or attributes not supported by the image. We present a vision-traceable hallucination detection framework that audits arbitrary LVLM responses via visual evidence grounding, requiring neither modification nor internal access to the hidden states of LVLMs. Given an LVLM response, we extract visually verifiable entities and use a medical-domain-adapted Qwen-VL grounding verifier to localize each entity on the input image. To enhance the robustness of our detection method, we introduce a counterfactual entity perturbation method and estimate visual evidence uncertainty by contrasting factual and counterfactual grounding results. Specifically, we compute an entity-level uncertainty score from the positive confidence, counterfactual confidence, and their grounding overlap for binary hallucination decision-making. Experiments on multiple medical imaging modalities and LVLM backbones demonstrate that our method consistently improves hallucination detection performance over recent baselines, while providing interpretable localization evidence and strong cross-model transferability. Code and dataset are available at https://github.com/Agentic-CliniAI/CounterVHD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents a black-box hallucination detection framework for LVLMs on clinical images. Given an LVLM response, entities are extracted and localized on the input image by a medical-domain-adapted Qwen-VL grounding verifier. A counterfactual entity perturbation is applied to produce factual and counterfactual grounding maps; an entity-level uncertainty score is then computed from positive confidence, counterfactual confidence, and grounding overlap to yield binary hallucination decisions. The authors claim consistent gains over recent baselines across multiple medical imaging modalities and LVLM backbones, plus interpretable localization evidence and strong cross-model transferability. Code and dataset are released.

Significance. If the central claims hold, the work supplies a practical, model-agnostic auditing tool that does not require hidden-state access—an important capability for safe clinical deployment of LVLMs. The counterfactual contrast and grounding-based uncertainty metric constitute a distinct technical contribution relative to prior logit- or embedding-based detectors. Releasing code and data supports reproducibility.

major comments (1)
  1. [Abstract / Method] Abstract and method description: the binary hallucination decisions and the claimed cross-model transferability rest on the accuracy of the medical-adapted Qwen-VL grounding verifier when applied to entities extracted from arbitrary target LVLM responses. No independent localization benchmark (IoU, pointing accuracy, or similar) is reported for this verifier on the precise distribution of clinical entities that appear in the evaluated responses.
minor comments (1)
  1. [Abstract] The abstract supplies no quantitative metrics, dataset sizes, statistical tests, or ablation results, which prevents readers from assessing the magnitude or reliability of the reported improvements.
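
The localization benchmark requested in the major comment could be computed roughly as follows. This is a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, "pointing accuracy" is taken to mean that the center of the predicted box falls inside the annotated box (one common definition, not necessarily the one the authors would use), and the 0.5 IoU threshold and all function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def pointing_hit(pred: Box, gt: Box) -> bool:
    """True if the center of the predicted box lies inside the annotation."""
    cx, cy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def localization_report(preds: Sequence[Box], gts: Sequence[Box],
                        iou_threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate verifier localization metrics over annotated entities."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally sized, non-empty box lists")
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    return {
        "mean_iou": sum(ious) / n,
        "acc_at_iou": sum(v >= iou_threshold for v in ious) / n,
        "pointing_acc": sum(pointing_hit(p, g) for p, g in zip(preds, gts)) / n,
    }
```

Reporting these numbers separately per imaging modality and per target LVLM would speak directly to the referee's concern about the distribution of entities in the evaluated responses.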

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The single major comment raises a valid point about the grounding verifier's accuracy, which we address below with a commitment to strengthen the paper.

Point-by-point responses
  1. Referee: [Abstract / Method] Abstract and method description: the binary hallucination decisions and the claimed cross-model transferability rest on the accuracy of the medical-adapted Qwen-VL grounding verifier when applied to entities extracted from arbitrary target LVLM responses. No independent localization benchmark (IoU, pointing accuracy, or similar) is reported for this verifier on the precise distribution of clinical entities that appear in the evaluated responses.

    Authors: We agree that the absence of a direct, independent localization benchmark for the medical-adapted Qwen-VL verifier on entities drawn from the target LVLM responses is a limitation. While the verifier was domain-adapted and the end-to-end hallucination detection gains (plus cross-model transfer) provide indirect support for its utility, a standalone evaluation (e.g., IoU or pointing accuracy on annotated clinical entities) would more rigorously substantiate the claims. In the revised manuscript we will add such a benchmark: we will manually annotate a representative subset of entities extracted from the evaluated responses across modalities and report localization metrics for the verifier. This addition will also clarify the basis for the reported cross-model transferability. revision: yes

Circularity Check

0 steps flagged

No circularity; method uses external verifier and contrastive perturbation

full rationale

The derivation extracts entities, applies an external medical-adapted Qwen-VL grounding verifier, generates counterfactual perturbations, and computes an uncertainty score from positive/counterfactual confidence plus overlap. None of these steps reduce by definition or self-citation to the hallucination labels being evaluated; the chain remains independent of the target data and does not invoke load-bearing self-citations or fitted-input predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim depends on the accuracy of an external medical grounding model and the validity of counterfactual perturbation for uncertainty; no explicit free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption A medical-domain-adapted Qwen-VL model can reliably localize extracted entities in clinical images.
    The entire detection pipeline rests on this verifier's performance as described in the method section of the abstract.

pith-pipeline@v0.9.1-grok · 5745 in / 1286 out tokens · 35428 ms · 2026-06-30T01:33:28.793812+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 6 canonical work pages · 5 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Bai, S., Cai, Y., Chen, R., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)

  2. [2]

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    Chen, J., Yang, D., Wu, T., Jiang, Y., Hou, X., Li, M., Wang, S., Xiao, D., Li, K., Zhang, L.: Detecting and evaluating medical hallucinations in large vision language models. arXiv preprint arXiv:2406.10185 (2024)

  3. [3]

    Chen, X., Wang, C., Xue, Y., Zhang, N., Yang, X., Li, Q., Shen, Y., Liang, L., Gu, J., Chen, H.: Unified hallucination detection for multimodal large language models. In: ACL. pp. 3235–3252 (2024)

  4. [4]

    Cheng, J., Fu, B., Ye, J., Wang, G., Li, T., Wang, H., Li, R., Yao, H., Cheng, J., Li, J., et al.: Interactive medical image segmentation: A benchmark dataset and baseline. In: CVPR. pp. 20841–20851 (2025)

  5. [5]

    Hardy, R., Kim, S.E., Rajpurkar, P., et al.: Rextrust: A model for fine-grained hallucination detection in AI-generated radiology reports. In: AAAI Bridge Program on AI for Medicine and Healthcare. pp. 173–182 (2025)

  6. [6]

    Jing, L., Li, R., Chen, Y., Du, X.: Faithscore: Fine-grained evaluations of hallucinations in large vision-language models. In: Findings of EMNLP. pp. 5042–5063 (2024)

  7. [7]

    Khanal, B., Pokhrel, S., Bhandari, S., Rana, R., Shrestha, N., Gurung, R.B., Linte, C., Watson, A., Shrestha, Y.R., Bhattarai, B.: Hallucination-aware multimodal benchmark for gastrointestinal image analysis with large vision-language models. In: MICCAI. pp. 235–245. Springer (2025)

  8. [8]

    Li, Q., Geng, J., Lyu, C., Zhu, D., Panov, M., Karray, F.: Reference-free hallucination detection for large vision-language models. In: Findings of EMNLP. pp. 4542–4551 (2024)

  9. [9]

    Liao, Z., Hu, S., Zou, K., Fu, H., Zhen, L., Xia, Y.: Vision-amplified semantic entropy for hallucination detection in medical visual question answering. In: MICCAI. pp. 669–679. Springer (2025)

  10. [10]

    Liao, Z., Hu, S., Zou, K., Jin, M., Zhang, Y., Fu, H., Zhen, L., Xia, Y.: Univrse: Unified vision-conditioned response semantic entropy for hallucination detection in medical vision-language models. arXiv preprint arXiv:2503.20504 (2025)

  11. [11]

    OpenAI: Introducing GPT-5 (Aug 2025), https://openai.com/zh-Hans-CN/index/introducing-gpt-5

  12. [12]

    MedGemma Technical Report

    Sellergren, A., Kazemzadeh, S., Jaroensri, T., Kiraly, A., Traverse, M., Kohlberger, T., Xu, S., Jamil, F., Hughes, C., Lau, C., et al.: Medgemma technical report. arXiv preprint arXiv:2507.05201 (2025)

  13. [13]

    Song, X., Liu, J., Liu, Y., Li, Y., Lei, W., Wang, R.: Rethinking radiology report generation via causal inspired counterfactual augmentation. In: ACM BCB. pp. 1–10 (2024)

  14. [14]

    Gemini: A Family of Highly Capable Multimodal Models

    Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

  15. [15]

    AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

    Wang, J., Wang, Y., Xu, G., Zhang, J., Gu, Y., Jia, H., Wang, J., Xu, H., Yan, M., Zhang, J., et al.: Amber: An llm-free multi-dimensional benchmark for mllms hallucination evaluation. arXiv preprint arXiv:2311.07397 (2023)

  16. [16]

    xAI: Grok 4 (Jul 2025), https://x.ai/news/grok-4

  17. [17]

    Xiao, W., Huang, Z., Gan, L., He, W., Li, H., Yu, Z., Shu, F., Jiang, H., Zhu, L.: Detecting and mitigating hallucination in large vision language models via fine-grained AI feedback. In: AAAI. vol. 39, pp. 25543–25551 (2025)

  18. [18]

    Zou, K., Bai, Y., Liu, B., Chen, Y., Chen, Z., Zhou, Y., Yuan, X., Wang, M., Shen, X., Cao, X., et al.: Uncertainty-aware medical diagnostic phrase identification and grounding. IEEE Trans. Pattern Anal. Mach. Intell. (2025)