Mini-internvl: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance. Visual Intelligence, 2(1):1–17, 2024

Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, et al. · 2024

2 Pith papers cite this work.

representative citing papers

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

cs.CV · 2025-04-15 · conditional · novelty 7.0

Consensus Entropy measures inter-VLM output agreement to verify OCR reliability and enable self-improving ensembles, yielding 42.1% F1 gains over single-model judging.

Responses Fall Short of Understanding: Revealing the Gap between Internal Representations and Responses in Visual Document Understanding

cs.CL · 2026-04-06 · unverdicted · novelty 5.0

Linear probing reveals a gap between internal representations and responses in LVLMs for visual document understanding, with task information encoded more linearly in intermediate layers than the final layer, and fine-tuning those layers narrows the gap.

citing papers explorer

Showing 2 of 2 citing papers.

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

cs.CV · 2025-04-15 · conditional · none · ref 16

Responses Fall Short of Understanding: Revealing the Gap between Internal Representations and Responses in Visual Document Understanding

cs.CL · 2026-04-06 · unverdicted · none · ref 14
