On the generalization capacities of MLLMs for spatial intelligence. arXiv preprint arXiv:2603.06704

Gongjie Zhang, Wenhao Li, Quanhao Qian, Jiuniu Wang, Deli Zhao, Shijian Lu, Ran Xu · arXiv 2603.06704

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

VLM3: Vision Language Models Are Native 3D Learners

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

Standard VLMs achieve expert-level 3D performance on depth estimation, pose estimation, and object understanding via three simple techniques without architecture changes or regression losses.

citing papers explorer

Showing 1 of 1 citing paper.

VLM3: Vision Language Models Are Native 3D Learners · cs.CV · 2026-05-28 · unverdicted · none · ref 19
