On the generalization capacities of MLLMs for spatial intelligence. arXiv preprint arXiv:2603.06704

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

fields

cs.CV 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

VLM3: Vision Language Models Are Native 3D Learners

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

Standard VLMs achieve expert-level 3D performance on depth estimation, pose estimation, and object understanding via three simple techniques without architecture changes or regression losses.

citing papers explorer

Showing 1 of 1 citing paper.

  • VLM3: Vision Language Models Are Native 3D Learners · cs.CV · 2026-05-28 · unverdicted · none · ref 19