Textcot: Zoom-in for enhanced multimodal text-rich image understanding. ACM Transactions on Multimedia Computing, Communications and Applications, 22(4):1–19

Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li · 2026

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0 · ref 23

Vision-OPD transfers an MLLM's privileged regional perception to its full-image policy through on-policy token-level self-distillation, yielding competitive results on fine-grained visual benchmarks.
