Few-Shot Semantic Segmentation Meets SAM3

arxiv: 2604.05433 · v1 · submitted 2026-04-07 · 💻 cs.CV

Yi-Jen Tsai, Yen-Yu Lin, Chien-Yao Wang

Pith reviewed 2026-05-10 18:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords: few-shot semantic segmentation · SAM3 · training-free · spatial concatenation · promptable concept segmentation · PASCAL-5i · COCO-20i

The pith

A fully frozen SAM3 performs few-shot semantic segmentation at state-of-the-art levels by concatenating support and query images on a shared canvas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that few-shot semantic segmentation does not require extensive training when a modern foundation model such as SAM3 is available. By placing a few annotated support images next to the query image on one canvas, the pre-trained model can directly segment the novel object class. This approach avoids fine-tuning and architectural modifications while outperforming many trained methods on the standard PASCAL-5^i and COCO-20^i benchmarks. It also finds that negative prompts, meant to suppress distractors, often weaken the target representation and lead to prediction collapse in this setting.

Core claim

A simple spatial concatenation strategy that places support and query images on a shared canvas repurposes SAM3's Promptable Concept Segmentation capability, allowing a fully frozen SAM3 to perform few-shot segmentation without any fine-tuning or architectural changes and achieving state-of-the-art performance on PASCAL-5^i and COCO-20^i.

What carries the argument

The spatial concatenation of support and query images on a shared canvas that enables SAM3's pre-trained Promptable Concept Segmentation to handle few-shot tasks.
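
As a rough illustration of the shared-canvas idea, the sketch below places a support image to the left of a query image and later crops the query region back out of a canvas-sized mask. The left-to-right layout, the zero padding, the nested-list image representation, and all function names are assumptions made for illustration. The source does not specify the paper's exact layout or preprocessing, and the SAM3 call itself is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for real image arrays


def pad_to_height(img: Image, height: int, fill: int = 0) -> Image:
    """Pad an image at the bottom so that it has exactly `height` rows."""
    width = len(img[0])
    padding = [[fill] * width for _ in range(height - len(img))]
    return [row[:] for row in img] + padding


def concat_side_by_side(support: Image, query: Image) -> Tuple[Image, int]:
    """Place support on the left and query on the right of one canvas.

    Returns the canvas and the x-offset at which the query region begins.
    """
    height = max(len(support), len(query))
    left = pad_to_height(support, height)
    right = pad_to_height(query, height)
    canvas = [l_row + r_row for l_row, r_row in zip(left, right)]
    return canvas, len(support[0])


def crop_query(canvas_mask: Image, x_offset: int, query_h: int, query_w: int) -> Image:
    """Cut the query region back out of a mask predicted on the full canvas."""
    return [row[x_offset:x_offset + query_w] for row in canvas_mask[:query_h]]


support = [[1, 1], [1, 1], [1, 1]]   # 3 rows x 2 columns
query = [[5, 6, 7], [8, 9, 10]]      # 2 rows x 3 columns
canvas, x_off = concat_side_by_side(support, query)
recovered = crop_query(canvas, x_off, query_h=2, query_w=3)
```

In this toy case `recovered` equals `query`, which is the property a real pipeline needs: whatever mask the model predicts on the canvas can be mapped back to query coordinates.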

If this is right

This minimal design already achieves state-of-the-art performance on PASCAL-5^i and COCO-20^i, outperforming many heavily engineered methods.
Negative prompts can be counterproductive in few-shot settings, where they often weaken target representations and lead to prediction collapse.
Strong cross-image reasoning can emerge from simple spatial formulations.
The approach highlights limitations in how current foundation models handle conflicting prompt signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Foundation models may possess latent abilities for cross-image comparison that simple input formatting can activate without further training.
This approach could extend to other dense prediction tasks where support-query pairing is feasible.
Future model designs might incorporate explicit mechanisms to manage mixed positive and negative prompts more stably.

Load-bearing premise

That SAM3's pre-trained Promptable Concept Segmentation capability will reliably transfer to the few-shot setting when support and query images are simply placed side-by-side on one canvas.
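
For the support annotation to act as a prompt, it has to be expressed in canvas coordinates. The following is a minimal sketch, assuming the support mask is reduced to a bounding box and shifted by the support image's offset on the canvas. The box format, the helper names, and the choice of a box rather than another prompt type are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]             # binary mask as rows of 0/1
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), inclusive pixels


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight bounding box of the foreground, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def shift_box(box: Box, dx: int, dy: int = 0) -> Box:
    """Translate a box from image coordinates into canvas coordinates."""
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


support_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
box_in_image = mask_to_box(support_mask)
# Assumed layout: the support image starts 10 pixels from the left canvas edge.
box_on_canvas = shift_box(box_in_image, dx=10)
empty_box = mask_to_box([[0, 0], [0, 0]])
```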

What would settle it

Running the spatial concatenation method on PASCAL-5^i and COCO-20^i and checking whether segmentation accuracy exceeds that of heavily trained competitors or drops sharply when concatenation is removed would confirm or refute the central claim.

Figures

Figures reproduced from arXiv: 2604.05433 by Chien-Yao Wang, Yen-Yu Lin, Yi-Jen Tsai.

**Figure 1.** Pipeline of our SAM3-based FSS framework. By combining instance-aware positive prompts with a unified spatial formulation, our method enables the fully frozen SAM3 to perform implicit cross-image feature matching in a single forward pass without architectural modifications.

**Figure 2.** Visualization of 1-shot/5-shot prediction on PASCAL-…

**Figure 3.** Qualitative results of negative prompt interference in SAM3. (a) shows how adding a…

Original abstract

Few-Shot Semantic Segmentation (FSS) focuses on segmenting novel object categories from only a handful of annotated examples. Most existing approaches rely on extensive episodic training to learn transferable representations, which is both computationally demanding and sensitive to distribution shifts. In this work, we revisit FSS from the perspective of modern vision foundation models and explore the potential of Segment Anything Model 3 (SAM3) as a training-free solution. By repurposing its Promptable Concept Segmentation (PCS) capability, we adopt a simple spatial concatenation strategy that places support and query images into a shared canvas, allowing a fully frozen SAM3 to perform segmentation without any fine-tuning or architectural changes. Experiments on PASCAL-$5^i$ and COCO-$20^i$ show that this minimal design already achieves state-of-the-art performance, outperforming many heavily engineered methods. Beyond empirical gains, we uncover that negative prompts can be counterproductive in few-shot settings, where they often weaken target representations and lead to prediction collapse despite their intended role in suppressing distractors. These findings suggest that strong cross-image reasoning can emerge from simple spatial formulations, while also highlighting limitations in how current foundation models handle conflicting prompt signals. Code at: https://github.com/WongKinYiu/FSS-SAM3

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A frozen SAM3 does few-shot segmentation via simple spatial concatenation of support and query images, with reported SOTA results and a note on negative prompts backfiring.

The main point is that this paper shows you can take SAM3 off the shelf, put the support image and query image side by side on one canvas, and get it to segment novel classes without any training or model changes. That spatial trick repurposes the model's existing promptable concept segmentation to handle the few-shot case directly. It is a minimal approach compared to the usual episodic training pipelines in this area. The experiments claim it beats many trained methods on PASCAL-5^i and COCO-20^i, and the authors flag that negative prompts often weaken results instead of helping suppress distractors. Both the simplicity and the prompt observation are useful to see in one place.

The work is new in its concrete application of SAM3 to FSS this way; prior literature on few-shot segmentation does not describe this exact spatial recipe. It does well by keeping everything frozen and letting the layout carry the association between support and query. That lowers the barrier for using large models on low-data tasks.

The soft spots are around verification of the cross-image transfer. The central assumption is that side-by-side placement will make the model reliably treat the support mask as an exemplar for unseen categories on the query side. If the full results include solid ablations on canvas layout, prompt formatting, and failure cases across classes, that helps. Without those details the gains could be narrower than claimed or sensitive to unstated choices. The SOTA numbers also need checking against the strongest recent baselines with proper stats.

This is for researchers working on foundation model adaptation or efficient few-shot vision methods. A reader who wants practical ways to avoid heavy training on segmentation tasks will find value in the minimal design. It deserves a serious referee because the idea is straightforward, the benchmarks are standard, and the empirical claims are falsifiable if the code and tables hold up.

I would send it for peer review to get the details and limitations clarified.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a training-free few-shot semantic segmentation method that repurposes SAM3's Promptable Concept Segmentation (PCS) capability via a simple spatial concatenation of support and query images onto a shared canvas. A fully frozen SAM3 then performs segmentation on novel classes without fine-tuning or architectural modifications. The work reports state-of-the-art results on PASCAL-5^i and COCO-20^i, and additionally observes that negative prompts often cause prediction collapse rather than suppressing distractors in this setting.

Significance. If the empirical results hold under scrutiny, the work is significant for demonstrating that cross-image concept association can emerge from minimal spatial reformulations in large vision foundation models, offering a strong, low-effort baseline that challenges the necessity of episodic training in few-shot segmentation. The public code release supports reproducibility and enables direct verification of the minimal design.

major comments (3)

[Method] Method section: The description of the spatial concatenation strategy does not isolate or ablate whether the side-by-side placement itself induces reliable cross-image concept transfer in PCS for unseen categories, versus reliance on unstated prompt formatting details. This is load-bearing for the central claim, as the skeptic correctly notes that if PCS was primarily trained on intra-image scenarios, simple concatenation may lead to ignored support regions or collapsed outputs, consistent with the paper's own negative-prompt observations.
[Experiments] Experiments section: The SOTA claim on PASCAL-5^i and COCO-20^i requires explicit tables comparing against recent baselines, including mean IoU with standard deviations, ablation on concatenation variants (e.g., different spatial arrangements or mask encodings), and confirmation that no hidden prompt engineering or post-processing was used. Without these, the outperformance over heavily engineered methods cannot be fully assessed.
[Experiments] The observation that negative prompts weaken target representations is interesting but lacks quantitative support, such as direct performance deltas with/without negative prompts across the benchmarks. This undermines the broader claim about limitations in handling conflicting prompt signals.
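
The mean IoU with standard deviations requested in the comments above could be computed along the following lines. The sketch assumes the common few-shot convention of accumulating intersection and union per class across episodes, averaging over classes, and then summarizing across seeds. The paper's exact protocol is not given in the source, and the function names and per-seed numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Mask = List[List[int]]  # binary mask as rows of 0/1


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count foreground pixels in the intersection and the union of two masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def mean_iou(episodes: Iterable[Tuple[int, Mask, Mask]]) -> float:
    """mIoU in percent: per-class IoU from accumulated counts, averaged over classes."""
    inter: Dict[int, int] = defaultdict(int)
    union: Dict[int, int] = defaultdict(int)
    for class_id, pred, gt in episodes:
        i, u = intersection_and_union(pred, gt)
        inter[class_id] += i
        union[class_id] += u
    per_class = [inter[c] / union[c] for c in union if union[c] > 0]
    return 100.0 * mean(per_class) if per_class else 0.0


episodes = [
    (0, [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # class 0: IoU 1/2
    (1, [[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # class 1: IoU 4/4
]
score = mean_iou(episodes)

# Hypothetical per-seed scores, used only to show the summary step.
seed_scores = [75.0, 73.0, 77.0]
seed_mean, seed_std = mean(seed_scores), stdev(seed_scores)
```

Accumulating counts per class before dividing keeps large objects from being averaged with tiny ones on an equal per-image footing; a per-image average is a different metric and would need to be labeled as such in any table.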

minor comments (2)

Ensure consistent notation for benchmarks (PASCAL-5^i vs. PASCAL-5i) throughout the text and tables.
[Abstract] The abstract mentions 'state-of-the-art performance' but the full manuscript should explicitly state the exact number of shots (e.g., 1-shot, 5-shot) and support/query splits used in the reported results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We have revised the manuscript to address all major comments by expanding the method description, adding required experimental tables and ablations, and providing quantitative support for the negative prompt observations. These changes strengthen the paper without altering its core claims.

Referee: [Method] Method section: The description of the spatial concatenation strategy does not isolate or ablate whether the side-by-side placement itself induces reliable cross-image concept transfer in PCS for unseen categories, versus reliance on unstated prompt formatting details. This is load-bearing for the central claim, as the skeptic correctly notes that if PCS was primarily trained on intra-image scenarios, simple concatenation may lead to ignored support regions or collapsed outputs, consistent with the paper's own negative-prompt observations.

Authors: We agree that isolating the contribution of spatial concatenation is essential. In the revised manuscript, we have expanded the Method section with a precise, step-by-step description of the concatenation procedure and the exact prompt formatting provided to SAM3. We have also added ablation studies comparing side-by-side placement against alternative spatial arrangements (e.g., vertical stacking, overlapping) and different mask encodings. These results demonstrate that reliable cross-image concept transfer emerges specifically from the side-by-side layout in the frozen PCS module, independent of prompt phrasing details. We further clarify that while PCS was trained primarily on intra-image scenarios, the empirical transfer observed is enabled by the shared canvas reformulation, as evidenced by the consistent performance gains. revision: yes
Referee: [Experiments] Experiments section: The SOTA claim on PASCAL-5^i and COCO-20^i requires explicit tables comparing against recent baselines, including mean IoU with standard deviations, ablation on concatenation variants (e.g., different spatial arrangements or mask encodings), and confirmation that no hidden prompt engineering or post-processing was used. Without these, the outperformance over heavily engineered methods cannot be fully assessed.

Authors: We accept that more comprehensive experimental reporting is needed to substantiate the SOTA claims. The revised Experiments section now includes explicit tables reporting mean IoU with standard deviations (computed over multiple random seeds) for both PASCAL-5^i and COCO-20^i, with direct comparisons to recent baselines. We have incorporated the requested ablations on concatenation variants and mask encodings. We also explicitly confirm in the text, supplementary material, and released code that no hidden prompt engineering or post-processing steps were used beyond the described spatial concatenation and direct application of PCS. revision: yes
Referee: [Experiments] The observation that negative prompts weaken target representations is interesting but lacks quantitative support, such as direct performance deltas with/without negative prompts across the benchmarks. This undermines the broader claim about limitations in handling conflicting prompt signals.

Authors: We thank the referee for this suggestion. The revised manuscript now includes quantitative experiments reporting mean IoU performance deltas with and without negative prompts on both PASCAL-5^i and COCO-20^i benchmarks. These results show consistent degradation and increased collapse rates when negative prompts are applied, providing direct empirical support for the claim that negative prompts can weaken target representations in this few-shot cross-image setting and highlighting limitations in handling conflicting signals. revision: yes
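
A paired comparison of this kind might be organized as follows. The sketch assumes per-episode IoU scores are available for the same episodes with and without negative prompts, and it treats an all-background prediction as "collapse". That definition, the helper names, and the numbers are assumptions for illustration and are not the paper's results.

```python
from statistics import mean
from typing import List, Sequence

Mask = List[List[int]]  # binary mask as rows of 0/1


def is_collapsed(pred: Mask) -> bool:
    """Assumed definition of collapse: the prediction has no foreground pixel."""
    return not any(value for row in pred for value in row)


def collapse_rate(preds: Sequence[Mask]) -> float:
    """Fraction of predictions that are entirely background."""
    return sum(is_collapsed(p) for p in preds) / len(preds)


def mean_paired_delta(iou_without: Sequence[float], iou_with: Sequence[float]) -> float:
    """Mean per-episode change in IoU when negative prompts are added."""
    if len(iou_without) != len(iou_with):
        raise ValueError("both runs must cover the same episodes")
    return mean([w - wo for wo, w in zip(iou_without, iou_with)])


# Toy predictions made with negative prompts: two of four are empty.
preds_with_negatives = [[[0, 0]], [[0, 1]], [[0, 0]], [[1, 1]]]
rate = collapse_rate(preds_with_negatives)

# Toy per-episode IoU values for the same three episodes.
iou_positive_only = [0.8, 0.6, 0.7]
iou_with_negatives = [0.5, 0.6, 0.1]
delta = mean_paired_delta(iou_positive_only, iou_with_negatives)
```

A negative `delta` means negative prompts hurt on average. Pairing by episode matters because it removes the variance that comes from comparing different support and query samples.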

Circularity Check

0 steps flagged

No significant circularity; purely empirical validation on external benchmarks

The paper introduces a training-free spatial concatenation method to repurpose SAM3's Promptable Concept Segmentation for few-shot semantic segmentation. It reports performance on standard external benchmarks (PASCAL-5^i and COCO-20^i) without any equations, fitted parameters, derivations, or self-referential predictions. No load-bearing steps reduce to inputs by construction; the central claim rests on empirical results rather than internal redefinitions or self-citation chains. This is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach depends on the pre-existing capabilities of SAM3 and the untested assumption that spatial layout alone suffices for cross-image prompting; no new parameters or entities are introduced.

axioms (1)

domain assumption: SAM3's Promptable Concept Segmentation transfers to few-shot settings via simple spatial concatenation of support and query images
This premise is required for the training-free claim to hold.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Example-Based Object Detection
cs.CV 2026-05 unverdicted novelty 4.0

EBOD integrates SAM3 with DINOv3 and LightGlue to leverage previous error examples and suppress recurring false positives and negatives without retraining.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · cited by 1 Pith paper

[1] Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, et al. SAM 3: Segment anything with concepts. International Conference on Learning Representations (ICLR), 2026.

[2] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9650–9660, 2021.

[3] Claudia Cuttano, Gabriele Trivigno, Giuseppe Averta, and Carlo Masone. SANSA: Unleashing the hidden semantics in SAM2 for few-shot segmentation. Advances in Neural Information Processing Systems (NeurIPS), 2025.

[4] Qi Fan, Wenjie Pei, Yu-Wing Tai, and Chi-Keung Tang. Self-support few-shot semantic segmentation. In European Conference on Computer Vision (ECCV), pages 701–719. Springer, 2022.

[5] Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu, and Eric Xing. Learning to prompt segment anything models. arXiv preprint arXiv:2401.04651, 2024.

[6] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4015–4026, 2023.

[7] Chunbo Lang, Gong Cheng, Binfei Tu, and Junwei Han. Learning what not to segment: A new perspective on few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8057–8067, 2022.

[8] Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, and Chunhua Shen. Matcher: Segment anything with one shot using all-purpose feature matching. International Conference on Learning Representations (ICLR), 2024.

[9] Juhong Min, Dahyun Kang, and Minsu Cho. Hypercorrelation squeeze for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 6941–6952, 2021.

[10] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. International Conference on Learning Representations (ICLR), 2025.

[11] Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, and Jiaya Jia. Hierarchical dense correlation distillation for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23641–23651, 2023.

[12] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pages 8748–8763. PMLR, 2021.

[13] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. SAM 2: Segment anything in images and videos. International Conference on Learning Representations (ICLR), 2025.

[14] Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, and Zechao Li. VRP-SAM: SAM with visual reference prompt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 23565–23574, 2024.

[15] Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, and Jiaya Jia. Prior guided feature enrichment network for few-shot segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 44(2):1050–1065, 2020.

[16] Jing Wang, Jiangyun Li, Chen Chen, Yisi Zhang, Haoran Shen, and Tianxiang Zhang. Adaptive FSS: a novel few-shot segmentation framework via prototype enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 38, pages 5463–5471, 2024.

[17] Yuan Wang, Naisong Luo, and Tianzhu Zhang. Focus on query: Adversarial mining transformer for few-shot segmentation. Advances in Neural Information Processing Systems (NeurIPS), 36:31524–31542, 2023.

[18] Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, and Rui Zhao. Eliminating feature ambiguity for few-shot segmentation. In European Conference on Computer Vision (ECCV), pages 416–433. Springer, 2024.

[19] Qianxiong Xu, Xuanyi Liu, Lanyun Zhu, Guosheng Lin, Cheng Long, Ziyue Li, and Rui Zhao. Hybrid mamba for few-shot segmentation. Advances in Neural Information Processing Systems (NeurIPS), 37:73858–73883, 2024.

[20] Qianxiong Xu, Wenting Zhao, Guosheng Lin, and Cheng Long. Self-calibrated cross attention network for few-shot segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 655–665, 2023.

[21] Qianxiong Xu, Lanyun Zhu, Xuanyi Liu, Guosheng Lin, Cheng Long, Ziyue Li, and Rui Zhao. Unlocking the power of SAM 2 for few-shot segmentation. In International Conference on Machine Learning (ICML), 2025.

[22] Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Liu, and Yunchao Wei. Bridge the points: Graph-based few-shot segment anything semantically. Advances in Neural Information Processing Systems (NeurIPS), 37:33232–33261, 2024.

[23] Jian-Wei Zhang, Yifan Sun, Yi Yang, and Wei Chen. Feature-proxy transformer for few-shot segmentation. Advances in Neural Information Processing Systems (NeurIPS), 35:6575–6588, 2022.

[24] Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, and Hongsheng Li. Personalize segment anything model with one shot. International Conference on Learning Representations (ICLR), 2024.
