Harnessing vision foundation models for high-performance, training-free open vocabulary segmentation

Yuheng Shi, Minjing Dong, Chang Xu · 2024 · arXiv 2411.09219

6 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation

cs.CV · 2026-05-17 · unverdicted · novelty 6.0 · 2 refs

SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.

FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x gains on benchmarks and zero-shot transfer to novel scenes.

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

RADIO-ViPE performs online open-vocabulary semantic SLAM directly from monocular RGB video in dynamic environments by tightly coupling vision-language embeddings from foundation models with geometric factor-graph optimization using adaptive robust kernels.

Cross-Attentive Multiview Fusion of Vision-Language Embeddings

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

CAMFusion fuses multiview 2D vision-language embeddings via cross-attention and multiview consistency self-supervision to produce better 3D semantic and instance representations, outperforming averaging and reaching SOTA on benchmarks including zero-shot out-of-domain cases.

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

cs.CV · 2026-02-26 · unverdicted · novelty 6.0

A 3D Language-Embedded Gaussians framework with opacity-aware Poisson volumetric aggregation and progressive temperature decay achieves 59.50 IoU and 21.05 mIoU on Occ-ScanNet for open-vocabulary indoor occupancy.

RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models

cs.CV · 2025-11-24 · unverdicted · novelty 6.0

RADSeg adapts the RADIO model with targeted enhancements to deliver 6-30% higher mIoU in zero-shot OVSS while using 2.5x fewer parameters and running 3.95x faster than prior large-model combinations.

citing papers explorer

Showing 6 of 6 citing papers.

SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation
cs.CV · 2026-05-17 · unverdicted · none · ref 80 · 2 links

FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction
cs.RO · 2026-04-30 · unverdicted · none · ref 62

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
cs.CV · 2026-04-28 · unverdicted · none · ref 40

Cross-Attentive Multiview Fusion of Vision-Language Embeddings
cs.CV · 2026-04-14 · unverdicted · none · ref 31

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes
cs.CV · 2026-02-26 · unverdicted · none · ref 35

RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models
cs.CV · 2025-11-24 · unverdicted · none · ref 36
