InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Wenhai Wang, Zhe Chen, Yangzhou Liu, Yue Cao, Weiyun Wang, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

TempGlitch is a controlled benchmark showing that 12 evaluated VLMs perform near chance level on detecting five types of temporal glitches in gameplay videos, with denser sampling and larger models providing no reliable improvement.

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks

cs.CV · 2026-03-04 · unverdicted · novelty 6.0

PulseFocus improves multi-image reasoning in VLMs by interleaving planning and attention-gated focus blocks during chain-of-thought, achieving gains on BLINK and MuirBench.

citing papers explorer

Showing 2 of 2 citing papers.

TempGlitch: Evaluating Vision-Language Models for Temporal Glitch Detection in Gameplay Videos
cs.CV · 2026-05-20 · unverdicted · none · ref 34

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
cs.CV · 2026-03-04 · unverdicted · none · ref 15
