Beyond unimodal shortcuts: MLLMs as cross-modal reasoners for grounded named entity recognition. arXiv preprint arXiv:2602.04486, 2026

Jinlong Ma, Yu Zhang, Xuefeng Bai, Kehai Chen, Yuwei Wang, Zeming Liu, Jun Yu, Min Zhang · 2026 · arXiv 2602.04486

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

Vision-OPD uses on-policy self-distillation from crop-conditioned to full-image policies within the same MLLM to close the regional-to-global perception gap.

Mitigating Multimodal Hallucination via Phase-wise Self-reward

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

PSRD mitigates visual hallucinations in LVLMs via phase-wise self-reward decoding, cutting hallucination rates by 50% on LLaVA-1.5-7B and outperforming prior methods on five benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
cs.CV · 2026-05-18 · unverdicted · none · ref 24

Mitigating Multimodal Hallucination via Phase-wise Self-reward
cs.CV · 2026-04-20 · unverdicted · none · ref 30
