How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. Science China Information Sciences, 67(12):220101

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

fields

cs.CV 4

years

2026 4

verdicts

UNVERDICTED 4

roles

background 3

polarities

background 3

representative citing papers

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.

X2SAM: Any Segmentation in Images and Videos

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

X2SAM unifies any-segmentation across images and videos in one MLLM by adding a Mask Memory module for temporal consistency and joint training on mixed datasets.
