Title resolution pending
3 Pith papers cite this work.
Fields: cs.CV (3)
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
-
CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding
CGC improves fine-grained multi-image understanding in multimodal large language models (MLLMs) by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward. It achieves state-of-the-art results on MIG-Bench and VLM2-Bench, with gains that transfer to other multimodal tasks.
-
HTDC: Hesitation-Triggered Differential Calibration for Mitigating Hallucination in Large Vision-Language Models
HTDC mitigates hallucinations in large vision-language models (LVLMs) by triggering calibration only at hesitation-prone decoding steps, using contrasts against visual-nullification and semantic-nullification probes.
-
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
MESA reduces hallucinations in LVLMs through controlled, selective latent intervention that preserves the original token distribution.