PND reduces object hallucination in VLMs via a dual-path contrast during decoding that amplifies visual features and penalizes linguistic priors, achieving reported SOTA results on POPE, MME, and CHAIR without retraining.
Vision-language models for vision tasks: A survey.IEEE transactions on pattern analysis and machine intelligence, 46(8):5625–5644
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
citing papers explorer
-
Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
PND reduces object hallucination in VLMs via a dual-path contrast during decoding that amplifies visual features and penalizes linguistic priors, achieving reported SOTA results on POPE, MME, and CHAIR without retraining.
-
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.