ILVAD is a plug-and-play method that builds a saliency map from inter-layer attention discrepancies on early tokens to enhance visual evidence focus and ground generated text, reducing hallucinations in LVLMs.
Cracking the code of hallucination in lvlms with vision-aware head divergence
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Finding the Correct Visual Evidence Without Forgetting: Mitigating Hallucination in LVLMs via Inter-Layer Visual Attention Discrepancy
ILVAD is a plug-and-play method that builds a saliency map from inter-layer attention discrepancies on early tokens to enhance visual evidence focus and ground generated text, reducing hallucinations in LVLMs.