SAM3D: Segment Anything in 3D Scenes
8 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 8
roles: background 1
polarities: background 1
citing papers explorer
- 3AM: 3egment Anything with Geometric Consistency in Videos
  3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
- Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting
  Ilov3Splat learns view-consistent CLIP and instance feature fields on 3D Gaussians to support open-vocabulary object selection and segmentation without category labels.
- PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion
  PanoSAMic modifies SAM with multi-stage feature encoding, spatio-modal fusion, spherical attention, and dual-view fusion to achieve SOTA panoramic semantic segmentation on public RGB and RGB-D datasets.
- ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
  ShelfGaussian achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes by jointly supervising Gaussian representations with vision foundation model features at 2D image and 3D scene levels.
- CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model
  CAR-SAM introduces MatMul-Aware Compensation and Joint Cross-Attention Reconstruction to enable stable 4-bit post-training quantization of SAM, outperforming prior PTQ methods by 14.6% mAP on SAM-B and 6.6% on SAM-L.
- Distill, Diffuse, and Semanticize (DDS): Annotation-Free 3D Scene Understanding Based on Multi-Granularity Distillation and Graph-Diffusion-Based Segmentation
  DDS combines multi-granularity distillation from projected 2D features with graph diffusion on superpoints to deliver region-consistent semantic labels for 3D scenes without any dense annotations.
- MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation
  MV3DIS uses 3D-guided mask matching and depth consistency to produce more consistent multi-view 2D masks that refine into accurate zero-shot 3D instances.
- GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps
  GraspSense computes force maps from object geometry to select mechanically safe grasp regions and regulate grip forces for dexterous hands.