SSR3D-LLM improves fine-grained 3D grounding in unified 3D-LLMs by generating and scoring sequences of latent spatial reasoning steps from the query using fixed Mask3D proposals.
Better, stronger, faster: Tackling the trilemma in mllm-based segmentation with simultaneous textual mask prediction, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
DPVR-LF routes saturated vision tokens into a one-layer side branch after layer 4, runs text-only processing through layers 5-17, and performs late fusion at the final layer to reduce visual computation while preserving multimodal performance.
citing papers explorer
-
SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs
SSR3D-LLM improves fine-grained 3D grounding in unified 3D-LLMs by generating and scoring sequences of latent spatial reasoning steps from the query using fixed Mask3D proposals.