Lgd: Leveraging generative descriptions for zero-shot referring image segmentation

Li, J · 2025 · arXiv 2504.14467

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SAM 3: Segment Anything with Concepts

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.

Learning to Label: A Reinforced Self-Evolving Framework for Semi-supervised Referring Expression Segmentation

cs.CV · 2026-05-27 · unverdicted · novelty 5.0

A reinforced self-evolving framework (L2L) for semi-supervised referring expression segmentation that jointly optimizes the segmentation model and pseudo-labels using multimodal priors and adaptive selection.

ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization

cs.CV · 2026-05-29 · unverdicted · novelty 4.0

ConTrans is a multi-scale encoder fusing convolutional inductive biases with transformer self-attention for improved local-global features in zero-shot temporal action localization.

citing papers explorer

SAM 3: Segment Anything with Concepts · cs.CV · 2025-11-20 · unverdicted · none · ref 68
Learning to Label: A Reinforced Self-Evolving Framework for Semi-supervised Referring Expression Segmentation · cs.CV · 2026-05-27 · unverdicted · none · ref 4
ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization · cs.CV · 2026-05-29 · unverdicted · none · ref 8
