Cross-image contrastive decoding: Precise, lossless suppression of language priors in large vision-language models

Jianfei Zhao, Feng Zhang, Xin Sun, Lingxing Kong, Zhixing Tan, Chong Feng · 2025 · arXiv 2505.10634

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

YARD is a training-free method using a Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.

Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

VGA constructs precise visual grounding from token semantics to guide MLLM attention toward relevant regions, dynamically suppressing described areas in captioning, and achieves SOTA dehallucination with negligible overhead.

citing papers explorer

Showing 2 of 2 citing papers after filters.

YARD: Y-Architecture Register Decoding for Efficient Hallucination Mitigation in Large Vision-Language Models

cs.CV · 2026-05-29 · unverdicted · none · ref 54

YARD is a training-free method using a Y-shaped decoder architecture and register tokens to improve contrastive decoding for hallucination reduction in LVLMs with lower latency.

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation

cs.CL · 2026-04-14 · unverdicted · none · ref 63

DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.
