How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, et al · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

SpatialMosaic: A Multiview VLM Dataset for Partial Visibility

cs.CV · 2025-12-29 · unverdicted · novelty 7.0

SpatialMosaic introduces a 2M-pair multi-view QA dataset and 1M-pair benchmark for MLLMs on spatial reasoning under partial visibility, plus a hybrid baseline that integrates 3D reconstruction models as geometry encoders.

citing papers explorer

Showing 1 of 1 citing paper.

SpatialMosaic: A Multiview VLM Dataset for Partial Visibility cs.CV · 2025-12-29 · unverdicted · none · ref 6
SpatialMosaic introduces a 2M-pair multi-view QA dataset and 1M-pair benchmark for MLLMs on spatial reasoning under partial visibility, plus a hybrid baseline that integrates 3D reconstruction models as geometry encoders.

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

fields

years

verdicts

representative citing papers

citing papers explorer