OpenSGA fuses vision-language, textual, and geometric features via a distance-gated attention encoder and minimum-cost-flow allocator to outperform prior methods on both frame-to-scan and subscan-to-subscan 3D scene graph alignment, backed by a new 700k-sample ScanNet-SG dataset.
hub
Fast segment anything
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
LAGO achieves state-of-the-art zero-shot performance with fewer image regions by using class-agnostic object discovery followed by confidence-controlled language-guided refinement and dual-channel aggregation.
Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
Boxes2Pixels distills noisy SAM pseudo-masks into a compact DINOv2-based student with auxiliary localization and one-sided self-correction, delivering +6.97 anomaly mIoU and +9.71 binary IoU gains over baselines on wind turbine data with 80% fewer parameters.
OmniOVCD uses SAM 3's decoupled outputs and an SFID strategy to achieve state-of-the-art IoU scores of 67.2, 66.5, 24.5, and 27.1 on four OVCD benchmarks, surpassing prior methods.
StateScribe uses a dual-layer memory architecture for episodic scenes and object-centric changes to deliver live and historical descriptions, achieving 83.1% F1 accuracy across revisits in evaluations and user studies with BLV participants.
GRAIL autonomously grounds relational concepts in NeSy-RL by using LLM weak supervision followed by interaction-based refinement, matching or exceeding manually defined concepts on Atari games.
H-SPAM produces accurate, regular, and perfectly nested hierarchical superpixels that outperform prior hierarchical methods and match recent non-hierarchical state-of-the-art.
A deformable soft conical hand is modeled in physics simulation and its scooping trajectories are optimized via evolutionary search, enabling effective contact-rich granular tasks validated in both simulation and physical robot experiments.
AIM-CoT enhances interleaved multimodal chain-of-thought reasoning by adding context-enhanced attention generation, active visual probing via information foraging, and dynamic attention-shift triggering.
Terra produces a lightweight task-agnostic metric-semantic 3D scene graph for outdoor environments using terrain-aware place nodes and hierarchically organized regions.
FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable accurate semantic mapping.
A scale-robust lightweight CNN for glottis segmentation achieves 92.9% mDice at over 170 FPS with a 19 MB model size on three datasets.
Permutation-COMQ is a new post-training quantization algorithm that reorders weights within layers and uses only dot-product and rounding steps to deliver the highest reported accuracy for 2-, 4-, and 8-bit medical foundation models.
MobileSAM is a 60x smaller distilled version of SAM that matches original performance and runs 5x faster than concurrent FastSAM while supporting CPU inference.
Semantic-Fast-SAM matches prior SAM-based semantic segmentation accuracy on Cityscapes and ADE20K while running about 20 times faster by combining FastSAM with SSA labeling and CLIP for open-vocabulary cases.
citing papers explorer
-
OpenSGA: Efficient 3D Scene Graph Alignment in the Open World
OpenSGA fuses vision-language, textual, and geometric features via a distance-gated attention encoder and minimum-cost-flow allocator to outperform prior methods on both frame-to-scan and subscan-to-subscan 3D scene graph alignment, backed by a new 700k-sample ScanNet-SG dataset.
-
LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment
LAGO achieves state-of-the-art zero-shot performance with fewer image regions by using class-agnostic object discovery followed by confidence-controlled language-guided refinement and dual-channel aggregation.
-
Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection
Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
-
Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks
Boxes2Pixels distills noisy SAM pseudo-masks into a compact DINOv2-based student with auxiliary localization and one-sided self-correction, delivering +6.97 anomaly mIoU and +9.71 binary IoU gains over baselines on wind turbine data with 80% fewer parameters.
-
OmniOVCD: Streamlining Open-Vocabulary Change Detection with SAM 3
OmniOVCD uses SAM 3's decoupled outputs and an SFID strategy to achieve state-of-the-art IoU scores of 67.2, 66.5, 24.5, and 27.1 on four OVCD benchmarks, surpassing prior methods.
-
StateScribe: Towards Accessible Change Awareness Across Real-World Revisits
StateScribe uses a dual-layer memory architecture for episodic scenes and object-centric changes to deliver live and historical descriptions, achieving 83.1% F1 accuracy across revisits in evaluations and user studies with BLV participants.
-
GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning
GRAIL autonomously grounds relational concepts in NeSy-RL by using LLM weak supervision followed by interaction-based refinement, matching or exceeding manually defined concepts on Atari games.
-
H-SPAM: Hierarchical Superpixel Anything Model
H-SPAM produces accurate, regular, and perfectly nested hierarchical superpixels that outperform prior hierarchical methods and match recent non-hierarchical state-of-the-art.
-
Simulation-Driven Evolutionary Motion Parameterization for Contact-Rich Granular Scooping with a Soft Conical Robotic Hand
A deformable soft conical hand is modeled in physics simulation and its scooping trajectories are optimized via evolutionary search, enabling effective contact-rich granular tasks validated in both simulation and physical robot experiments.
-
AIM-CoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
AIM-CoT enhances interleaved multimodal chain-of-thought reasoning by adding context-enhanced attention generation, active visual probing via information foraging, and dynamic attention-shift triggering.
-
Terra: Hierarchical Terrain-Aware 3D Scene Graph for Task-Agnostic Outdoor Mapping
Terra produces a lightweight task-agnostic metric-semantic 3D scene graph for outdoor environments using terrain-aware place nodes and hierarchically organized regions.
-
FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers
FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable accurate semantic mapping.
-
A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation
A scale-robust lightweight CNN for glottis segmentation achieves 92.9% mDice at over 170 FPS with a 19 MB model size on three datasets.
-
Weight Group-wise Post-Training Quantization for Medical Foundation Model
Permutation-COMQ is a new post-training quantization algorithm that reorders weights within layers and uses only dot-product and rounding steps to deliver the highest reported accuracy for 2-, 4-, and 8-bit medical foundation models.
-
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications
MobileSAM is a 60x smaller distilled version of SAM that matches original performance and runs 5x faster than concurrent FastSAM while supporting CPU inference.
-
Semantic-Fast-SAM: Efficient Semantic Segmenter
Semantic-Fast-SAM matches prior SAM-based semantic segmentation accuracy on Cityscapes and ADE20K while running about 20 times faster by combining FastSAM with SSA labeling and CLIP for open-vocabulary cases.