Woodpecker: Hallucination correction for multimodal large language models. Science China Information Sciences, 67(12):220105, 2024.
3 Pith papers cite this work.
Citing papers
- Locate-then-Sparsify: Attribution Guided Sparse Strategy for Visual Hallucination Mitigation
  LTS-FS locates hallucination-relevant layers in LVLMs via causal attribution on a constructed dataset and applies sparse layerwise feature steering to mitigate hallucinations while preserving general task performance.
- Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
  VGA builds precise visual grounding from token semantics to steer MLLM attention toward relevant image regions. During captioning it dynamically suppresses regions that have already been described. It reports state-of-the-art hallucination mitigation with negligible overhead.
- Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models
  UE-DPO quantifies the epistemic uncertainty that arises from grounding failures. It uses this uncertainty to place more learning pressure on hard visual tokens in preferred samples while easing the penalty on dispreferred samples.