Sclip: Rethinking self-attention for dense vision-language inference

2023 · arXiv 2312.01597

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Vision Harnessing Agent for Open Ad-hoc Segmentation

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

VASA is a vision-guided agent for open ad-hoc segmentation that creates and validates masks through planning, tool use, and error recovery, outperforming baselines on the new PARS benchmark and RefCOCOm.

Best Segmentation Buddies for Image-Shape Correspondence

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

The work defines Best Segmentation Buddies as vertices on a 3D shape whose nearest image pixel under distilled features falls inside a given 2D segment, then uses the same features to segment the shape in 3D.

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

cs.CV · 2024-03-14 · unverdicted · novelty 6.0

MM1 models achieve state-of-the-art few-shot multimodal results by pre-training on a careful mix of image-caption, interleaved, and text-only data with optimized image encoders.

SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images

cs.CV · 2025-12-09 · unverdicted · novelty 5.0

SAM 3 can be applied training-free to remote sensing open-vocabulary segmentation and change detection by fusing its semantic and instance heads and filtering with presence scores.

TeD-Loc: Text Distillation for Weakly Supervised Object Localization

cs.CV · 2025-01-22 · unverdicted · novelty 5.0

TeD-Loc improves weakly supervised object localization by distilling CLIP text embeddings to patch embeddings through contrastive alignment plus a localization-guided classifier and QR orthogonalization of text embeddings.

citing papers explorer

Vision Harnessing Agent for Open Ad-hoc Segmentation · cs.CV · 2026-05-19 · unverdicted · none · ref 24

Best Segmentation Buddies for Image-Shape Correspondence · cs.CV · 2026-05-18 · unverdicted · none · ref 57

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training · cs.CV · 2024-03-14 · unverdicted · none · ref 112

SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images · cs.CV · 2025-12-09 · unverdicted · none · ref 55

TeD-Loc: Text Distillation for Weakly Supervised Object Localization · cs.CV · 2025-01-22 · unverdicted · none · ref 24
