Concretely, mirror each above-surface object’s bounding box across the surface plane and rank candidates by geometric overlap with Bk
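The mirroring-and-ranking step quoted above can be sketched roughly as follows. This is a minimal illustration, not the cited work's implementation. It assumes axis-aligned 3D boxes, a horizontal surface plane at z = surface_z, that Bk denotes the target box being matched, and that "geometric overlap" means intersection-over-union. The names Box, mirror_across_plane, iou, and rank_candidates are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box given by its min and max corners (x, y, z)."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]


def mirror_across_plane(box: Box, surface_z: float) -> Box:
    """Reflect a box across the horizontal plane z = surface_z (assumed)."""
    (x0, y0, z0), (x1, y1, z1) = box.lo, box.hi
    # Reflection maps z to 2*surface_z - z, so the old top becomes the new bottom.
    return Box((x0, y0, 2 * surface_z - z1), (x1, y1, 2 * surface_z - z0))


def volume(box: Box) -> float:
    """Volume of a box; empty or inverted extents count as zero."""
    v = 1.0
    for a, b in zip(box.lo, box.hi):
        v *= max(0.0, b - a)
    return v


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union, used here as the overlap score (an assumption)."""
    lo = tuple(max(p, q) for p, q in zip(a.lo, b.lo))
    hi = tuple(min(p, q) for p, q in zip(a.hi, b.hi))
    inter = volume(Box(lo, hi))
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def rank_candidates(objects: dict[str, Box], target: Box, surface_z: float):
    """Score each above-surface object's mirrored box against the target box."""
    scored = [
        (name, iou(mirror_across_plane(box, surface_z), target))
        for name, box in objects.items()
        if box.lo[2] >= surface_z  # keep only objects lying above the surface
    ]
    # Highest overlap first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy scene: the target box sits directly below the cup, under the plane z = 0.
objects = {
    "cup": Box((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    "lamp": Box((2.0, 0.0, 0.0), (3.0, 1.0, 2.0)),
}
target = Box((0.0, 0.0, -1.0), (1.0, 1.0, 0.0))
ranked = rank_candidates(objects, target, surface_z=0.0)
```

In this toy scene the cup's mirrored box coincides with the target, so it ranks first. A tilted surface plane would need a general reflection and an oriented-box overlap test, which this sketch does not cover.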

3D-to-2D correspondence

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

VLMs achieve 53-97% on rearrangement planning but only 6-45% on occlusion and under 7% on reflections, with failures localized to visual token compression after the vision encoder.
