3D awareness emerges implicitly in MLLMs via self-supervised geometric constraints that create an information bottleneck, removing depth and pose dependencies at inference and cutting latency by 55%.
3djcg: A unified framework for joint dense captioning and visual grounding on 3d point clouds
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
3D-IDE: 3D Implicit Depth Emergent
3D awareness emerges implicitly in MLLMs via self-supervised geometric constraints that create an information bottleneck, removing depth and pose dependencies at inference and cutting latency by 55%.