Semantic-SAM: Segment and Recognize Anything at Any Granularity

Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao · 2023 · arXiv 2307.04767

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

COCOTree: A Dataset and Benchmark for Open Tree-Structured Visual Decomposition

cs.CV · 2026-05-21 · unverdicted · novelty 7.0

COCOTree is a 21K-image benchmark with 1.8M nodes and an OTQ metric for the new task of open tree-structured visual decomposition.

Vision Harnessing Agent for Open Ad-hoc Segmentation

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

VASA is a vision-guided agent for open ad-hoc segmentation that creates and validates masks through planning, tool use, and error recovery, outperforming baselines on the new PARS benchmark and RefCOCOm.

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

cs.CV · 2024-11-22 · unverdicted · novelty 7.0

VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

cs.CV · 2023-10-17 · accept · novelty 7.0

Set-of-Mark prompting marks segmented image regions with alphanumerics and masks to let GPT-4V achieve state-of-the-art zero-shot results on referring expression comprehension and segmentation benchmarks like RefCOCOg.

Amodal SAM: A Unified Amodal Segmentation Framework with Generalization

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Amodal SAM extends SAM with a Spatial Completion Adapter, Target-Aware Occlusion Synthesis for data, and consistency losses to reach SOTA amodal segmentation with strong generalization to new objects and scenes.

MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

cs.CV · 2026-04-10 · unverdicted · novelty 5.0

MV3DIS uses 3D-guided mask matching and depth consistency to produce more consistent multi-view 2D masks that refine into accurate zero-shot 3D instances.

Personalization Toolkit: Training Free Personalization of Large Vision Language Models

cs.CV · 2025-02-04 · unverdicted · novelty 5.0

Presents a training-free personalization toolkit for LVLMs that extracts features via vision foundation models, applies RAG for instance retrieval, and uses visual prompting for multi-concept adaptation on images and videos, claiming SOTA results on a new real-world benchmark.

UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning

cs.CV · 2026-05-05 · unverdicted · novelty 4.0

UnAC improves LMM performance on visual reasoning benchmarks by combining adaptive visual prompting, image abstraction, and gradual self-checking.

citing papers explorer

Showing 8 of 8 citing papers.

COCOTree: A Dataset and Benchmark for Open Tree-Structured Visual Decomposition · cs.CV · 2026-05-21 · unverdicted · none · ref 14
Vision Harnessing Agent for Open Ad-hoc Segmentation · cs.CV · 2026-05-19 · unverdicted · none · ref 64
Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement · cs.CV · 2024-11-22 · unverdicted · none · ref 16
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V · cs.CV · 2023-10-17 · accept · none · ref 22
Amodal SAM: A Unified Amodal Segmentation Framework with Generalization · cs.CV · 2026-04-22 · unverdicted · none · ref 15
MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation · cs.CV · 2026-04-10 · unverdicted · none · ref 29
Personalization Toolkit: Training Free Personalization of Large Vision Language Models · cs.CV · 2025-02-04 · unverdicted · none · ref 13
UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning · cs.CV · 2026-05-05 · unverdicted · none · ref 16
