Vision-OPD uses on-policy self-distillation from crop-conditioned to full-image policies within the same MLLM to close the regional-to-global perception gap.
Beyond unimodal shortcuts: Mllms as cross-modal reasoners for grounded named entity recognition.arXiv preprint arXiv:2602.04486, 2026
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.
citing papers explorer
-
Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
Vision-OPD uses on-policy self-distillation from crop-conditioned to full-image policies within the same MLLM to close the regional-to-global perception gap.
-
Mitigating Multimodal Hallucination via Phase-wise Self-reward
PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.