MVGGT: Multimodal visual geometry grounded transformer for multiview 3D referring expression segmentation
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Fields: cs.CV 2
Years: 2026 2
Verdicts: UNVERDICTED 2
Roles: background 1
Polarities: background 1
citing papers explorer
- Grounding by Remembering: Cross-Scene and In-Scene Memory for 3D Functional Affordances
  AFFORDMEM improves AP50 by 3.23-3.7 points on SceneFun3D splits by using a reusable cross-scene affordance memory bank and in-scene spatial memory to guide VLMs toward actionable 3D regions.
- TrackRef3D: Multi-View Consistent Track-then-Label for Open-World Referring Segmentation in 3D Gaussian Splatting
  TrackRef3D proposes a fully automatic multi-view consistent track-then-label method for open-world referring segmentation in 3D Gaussian Splatting using TSCM, visibility-aware descriptions, and hybrid contrastive training.