SAM3D: Segment Anything in 3D Scenes

Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu · 2023 · arXiv 2306.03908

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · novelty 7.0

3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

cs.CV · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Ilov3Splat learns view-consistent CLIP and instance feature fields on 3D Gaussians to support open-vocabulary object selection and segmentation without category labels.

PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion

cs.CV · 2026-01-12 · unverdicted · novelty 6.0

PanoSAMic modifies SAM with multi-stage feature encoding, spatio-modal fusion, spherical attention, and dual-view fusion to achieve SOTA panoramic semantic segmentation on public RGB and RGB-D datasets.

ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

CAR-SAM introduces MatMul-Aware Compensation and Joint Cross-Attention Reconstruction to enable stable 4-bit post-training quantization of SAM, outperforming prior PTQ methods by 14.6% mAP on SAM-B and 6.6% on SAM-L.

Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

DDS combines multi-granularity distillation from projected 2D features with graph diffusion on superpoints to deliver region-consistent semantic labels for 3D scenes without any dense annotations.

MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MV3DIS uses 3D-guided mask matching and depth consistency to produce more consistent multi-view 2D masks that refine into accurate zero-shot 3D instances.

GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps

cs.RO · 2026-04-07 · unverdicted · novelty 4.0

GraspSense computes force maps from object geometry to select mechanically safe grasp regions and regulate grip forces for dexterous hands.

citing papers explorer

Showing 8 of 8 citing papers.

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · none · ref 102

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

cs.CV · 2026-05-06 · unverdicted · none · ref 25 · 2 links

PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion

cs.CV · 2026-01-12 · unverdicted · none · ref 33

ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

cs.CV · 2025-12-03 · unverdicted · none · ref 84

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model

cs.CV · 2026-05-16 · unverdicted · none · ref 23

Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation

cs.CV · 2026-05-08 · unverdicted · none · ref 12 · 2 links

MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

cs.CV · 2026-04-10 · unverdicted · none · ref 61

GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps

cs.RO · 2026-04-07 · unverdicted · none · ref 17
