Qwen3 technical report
2 Pith papers cite this work. Polarity classification is still being indexed.
Citation roles: background (1)
Years: 2026 (2)
Verdicts: unverdicted (2)
Polarities: background (1)
Citing papers
- Breaking the Illusion: When Positive Meets Negative in Multimodal Decoding
  PND reduces object hallucination in VLMs via a dual-path contrast during decoding that amplifies visual features and penalizes linguistic priors, achieving reported SOTA results on POPE, MME, and CHAIR without retraining.
- MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models
  MODIX dynamically rescales positional indices in VLMs using intra-modal covariance-based entropy and inter-modal alignment scores to allocate finer granularity to informative content.