YARD is a training-free method using Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.
Cross- image contrastive decoding: Precise, lossless suppression of language priors in large vision-language models
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
baseline 1polarities
baseline 1representative citing papers
DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.
VGA constructs precise visual grounding from token semantics to guide MLLM attention toward relevant regions, dynamically suppressing described areas in captioning, and achieves SOTA dehallucination with negligible overhead.
citing papers explorer
-
YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models
YARD is a training-free method using Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.
-
Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.