SAM3D: Segment Anything in 3D Scenes

Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu · 2023 · arXiv 2306.03908

11 Pith papers cite this work.

Citation-role summary: background (1).

Citation-polarity summary: background (1).

representative citing papers

3AM: 3egment Anything with Geometric Consistency in Videos

cs.CV · 2026-01-13 · unverdicted · novelty 7.0

3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.

Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

A feed-forward framework learns instance-structured 3D token groups from unposed multi-view images via differentiable rendering, enabling native object-level segmentation, editing, and retrieval without 3D supervision.

Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

cs.CV · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

Ilov3Splat learns view-consistent CLIP and instance feature fields on 3D Gaussians to support open-vocabulary object selection and segmentation without category labels.

PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion

cs.CV · 2026-01-12 · unverdicted · novelty 6.0

PanoSAMic modifies SAM with multi-stage feature encoding, spatio-modal fusion, spherical attention, and dual-view fusion to achieve state-of-the-art panoramic semantic segmentation on public RGB and RGB-D datasets.

ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

cs.CV · 2025-12-03 · unverdicted · novelty 6.0

ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.

ESAM++: Efficient Online 3D Perception on the Edge

cs.CV · 2026-05-28 · unverdicted · novelty 5.0

ESAM++ introduces a 3D Sparse Feature Pyramid Network for efficient online 3D scene perception on edge devices, claiming competitive accuracy with up to 3x faster inference and 2x smaller model size than ESAM on four benchmarks.

AgentGrounder: Zero-Shot 3D Visual Pointcloud Grounding using Multimodal Language Models

cs.CV · 2026-05-25 · unverdicted · novelty 5.0

AgentGrounder performs zero-shot 3D visual grounding on colored point clouds via an offline object lookup table and an online agent that selectively retrieves, scores geometrically, and renders images on demand, reporting gains over SeeGround on ScanRefer and Nr3D.

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

CAR-SAM introduces MatMul-Aware Compensation and Joint Cross-Attention Reconstruction to enable stable 4-bit post-training quantization of SAM, outperforming prior PTQ methods by 14.6% mAP on SAM-B and 6.6% on SAM-L.

Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation

cs.CV · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

DDS combines multi-granularity distillation from projected 2D features with graph diffusion on superpoints to deliver region-consistent semantic labels for 3D scenes without any dense annotations.

MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MV3DIS uses 3D-guided mask matching and depth consistency to produce more consistent multi-view 2D masks that refine into accurate zero-shot 3D instances.

GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps

cs.RO · 2026-04-07 · unverdicted · novelty 4.0

GraspSense computes force maps from object geometry to select mechanically safe grasp regions and regulate grip forces for dexterous hands.
