Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models

Dongxu Zhang; Hanze Li; Jinhao You; Kai Tang; Renyuan Li; Shanghang Zhang; Tao Luo; Wenya Wang; Xiande Huang; Yichen Guo

arXiv: 2505.12343 · v3 · pith:JJEFEAVB · submitted 2025-05-18 · cs.LG · cs.AI · cs.CV

Kai Tang, Jinhao You, Yichen Guo, Yiding Sun, Dongxu Zhang, Wenya Wang, Hanze Li, Tao Luo, Renyuan Li, Xiande Huang, Shanghang Zhang

keywords: consistency, DCLA, decoding, inter-layer, points, aggregation, hallucinations, large

Despite the impressive capabilities of Large Vision-Language Models (LVLMs), they remain susceptible to hallucinations, where generated content is inconsistent with the input image. Existing training-free hallucination mitigation methods often suffer from unstable performance and high sensitivity to hyperparameter settings, which limits their practicality and broader adoption. In this paper, we propose Decoding with Inter-layer Consistency via Layer Aggregation (DCLA), a training-free decoding mechanism that requires no retraining, fine-tuning, or access to external knowledge bases. Specifically, DCLA constructs a dynamic semantic reference by aggregating representations from previous layers and uses it to correct semantically deviated layers, thereby enforcing inter-layer consistency. Experiments across seven LVLMs and multiple benchmarks demonstrate the generality of DCLA: it surpasses standard decoding by 28.58 MME points on LLaVA-1.5-7B and 42.6 MME points on Qwen2.5-VL, while improving POPE accuracy by 2.74 percentage points in the strongest setting.
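
The abstract describes DCLA only at a high level: build a reference from earlier layers, then correct layers that drift from it. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Assumed here (none of this is specified in the abstract): the reference is the mean of the earlier, already-processed hidden states for a single token position; "semantic deviation" is cosine similarity below a hypothetical `threshold`; and correction linearly mixes the deviated state toward the reference with a hypothetical weight `alpha`. The function `inter_layer_consistency` is a hypothetical name, and it post-processes a precomputed list of per-layer vectors rather than intervening inside a model's forward pass.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def inter_layer_consistency(
    hidden_states: List[Vector],
    threshold: float = 0.5,  # hypothetical deviation cutoff (assumption)
    alpha: float = 0.5,      # hypothetical correction strength (assumption)
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical sketch: correct layers that deviate from a running reference.

    The reference for layer l is the mean of layers 0..l-1 *after* correction.
    A layer whose cosine similarity to that reference falls below `threshold`
    is mixed toward the reference: h' = (1 - alpha) * h + alpha * ref.
    Returns the corrected states and the indices of corrected layers.
    """
    if not hidden_states:
        return [], []
    corrected: List[Vector] = [list(hidden_states[0])]
    flagged: List[int] = []
    for idx, h in enumerate(hidden_states[1:], start=1):
        reference = mean(corrected)  # dynamic reference from earlier layers
        if cosine(h, reference) < threshold:
            h = [(1 - alpha) * x + alpha * r for x, r in zip(h, reference)]
            flagged.append(idx)
        corrected.append(list(h))
    return corrected, flagged


if __name__ == "__main__":
    # Toy per-layer vectors for one token; layer 2 points the "wrong" way.
    layers = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [1.0, 0.1]]
    out, flagged = inter_layer_consistency(layers)
    print(flagged)
    print(out)
```

With these toy inputs only layer 2, whose vector points opposite to the running reference, is flagged and pulled toward it. Building the reference from already-corrected layers is one design choice among several; it limits how much a single outlier layer shifts the reference used for later layers.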

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models
cs.CV · 2026-04 · unverdicted · novelty 6.0

HTDC mitigates hallucinations in LVLMs by triggering calibration only at hesitation-prone decoding steps via contrasts with visual-nullification and semantic-nullification probes.