
Baseline reference

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

50% of citing Pith papers use this work as a benchmark or comparison.

8 Pith papers citing it
Baseline 50% of classified citations

citation-role summary

background 3 · dataset 3

citation-polarity summary

fields

cs.CV 8

representative citing papers

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding

cs.CV · 2026-04-24 · unverdicted · novelty 7.0

CGC improves fine-grained multi-image understanding in MLLMs by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward, achieving SOTA on MIG-Bench and VLM2-Bench with transfer gains to other multimodal tasks.
