Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation

Junyuan Ma; Qi Fan; Wenbin Li; Xunzhi Xiang; Yang Gao


Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.


T0 review · grok-4.3

HERA adapts vision foundation models for cross-domain few-shot segmentation by selecting the best layer, regularizing interactions, and calibrating predictions with minimal parameter updates.

2026-05-20 06:41 UTC pith:JAZ73U6J

Load-bearing objection: HERA gives a workable three-stage way to adapt VFMs for cross-domain few-shot segmentation with low parameter cost, but the ETR layer choice looks risky on tiny support sets.

arxiv 2605.19340 v1 pith:JAZ73U6J submitted 2026-05-19 cs.CV


classification cs.CV

keywords cross-domain few-shot segmentation, vision foundation models, layer selection, regularization, calibration, domain adaptation, semantic segmentation, few-shot learning

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Hierarchical Exemplar Representation Adaptation (HERA) to tackle the challenges of limited labeled exemplars and domain shifts in cross-domain few-shot semantic segmentation. It uses a three-stage process: selecting the most informative layer from a vision foundation model based on Exemplar Transfer Risk, regularizing the representation with prior guidance, and calibrating pixel-wise predictions. This allows the method to surpass existing approaches by over 4.1 mIoU on benchmarks while updating less than 2.7 percent of the model's parameters at test time. A reader would care because it makes large pre-trained vision models practical for specialized tasks in new domains without extensive retraining.

Core claim

HERA is a select-regularize-calibrate framework that first identifies the most informative VFM layer using a data-dependent Exemplar Transfer Risk metric computed for each candidate layer, then applies Prior-Guided Regularization to yield well-structured local signals, and finally uses Pixelwise Adaptive Calibration to combine the representation with refined maps for consistent masks, all while keeping the VFM frozen and fine-tuning only a small fraction of parameters.
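One way to picture layer selection by a data-dependent risk is the sketch below. It is not the paper's ETR, whose exact form is not given on this page. It borrows the leave-one-out idea mentioned in the Figure 2 caption and assumes a stand-in risk: the pixel error rate of a nearest-prototype classifier fitted on the remaining exemplars. The names `loo_risk`, `select_layer`, and `toy`, the two-dimensional features, and the layer indices are all hypothetical.

```python
from statistics import mean


def prototype(vectors):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [mean(column) for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_risk(exemplars):
    """Leave-one-out pixel error rate for one layer (stand-in risk, not ETR).

    exemplars: list of exemplars; each exemplar is a list of
    (feature_vector, label) pairs, label 1 = foreground, 0 = background.
    Requires at least two exemplars, each training split holding both labels.
    """
    errors = []
    for held_out in range(len(exemplars)):
        train = [px for i, ex in enumerate(exemplars) if i != held_out for px in ex]
        fg = prototype([feat for feat, label in train if label == 1])
        bg = prototype([feat for feat, label in train if label == 0])
        for feat, label in exemplars[held_out]:
            pred = 1 if sq_dist(feat, fg) < sq_dist(feat, bg) else 0
            errors.append(pred != label)
    return sum(errors) / len(errors)


def select_layer(features_by_layer):
    """Return the layer with the lowest stand-in risk, plus all risks."""
    risks = {layer: loo_risk(ex) for layer, ex in features_by_layer.items()}
    return min(risks, key=risks.get), risks


# Toy features for two hypothetical layers and two exemplars each.
toy = {
    3: [  # foreground and background separate cleanly
        [([0.0, 0.1], 0), ([1.0, 0.9], 1)],
        [([0.1, 0.0], 0), ([0.9, 1.0], 1)],
    ],
    11: [  # labels flip between exemplars, so prototypes mislead
        [([0.0, 0.0], 0), ([1.0, 1.0], 1)],
        [([1.0, 1.0], 0), ([0.0, 0.0], 1)],
    ],
}

best, risks = select_layer(toy)
print(best, risks)
```

On the toy data the stand-in picks layer 3. With a single exemplar there is nothing to hold out, so a 1-shot setting would need a different risk estimate than this one.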

What carries the argument

Hierarchical Layer Selection (HLS), which uses Exemplar Transfer Risk (ETR) to pick the best layer, combined with Prior-Guided Regularization (PGR) and Pixelwise Adaptive Calibration (PAC) in the HERA pipeline.

Load-bearing premise

The data-dependent Exemplar Transfer Risk metric can accurately identify the single most informative layer from the vision foundation model despite the small number of target-domain exemplars and potential domain-specific noise.

What would settle it

Running the benchmarks with random layer selection instead of ETR-based selection and observing whether the mIoU gain disappears or reverses.
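A minimal sketch of how that comparison could be scored, assuming per-episode mIoU has already been measured for every candidate layer. Instead of sampling random layers, it uses the expected mIoU of a uniformly random layer, which is the mean over layers. The function name `gain_over_random` and all numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean


def gain_over_random(miou_by_episode, selected_layer):
    """Average gain of the selected layer over a uniformly random layer.

    miou_by_episode: {episode: {layer: mIoU}} measured for every layer.
    selected_layer:  {episode: layer chosen by the selection rule}.
    """
    gains = []
    for episode, layer_scores in miou_by_episode.items():
        chosen = layer_scores[selected_layer[episode]]
        random_expectation = mean(layer_scores.values())
        gains.append(chosen - random_expectation)
    return mean(gains)


# Made-up values: two episodes, three candidate layers each.
miou = {
    "ep0": {0: 40.0, 1: 60.0, 2: 50.0},
    "ep1": {0: 30.0, 1: 50.0, 2: 70.0},
}
picked = {"ep0": 1, "ep1": 2}
gain = gain_over_random(miou, picked)
print(gain)
```

A gain near zero or below it on real benchmarks would count against the selection rule; a clearly positive gain would support it.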


If this is right

Adaptive layer selection avoids overfitting to small numbers of labeled exemplars.
Regularization produces structured local signals that improve subsequent calibration.
Calibration ensures consistent masks across the target domain.
Overall performance exceeds state of the art by more than 4.1 mIoU on multiple benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar select-regularize-calibrate strategies could be applied to other vision tasks like object detection or instance segmentation in cross-domain settings.
The reliance on a single selected layer suggests that not all layers in foundation models are equally useful for novel domains, which might guide future model design.
Extending the ETR metric to evaluate combinations of layers rather than single ones could further improve adaptation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate framework for cross-domain few-shot semantic segmentation (CD-FSS) that adapts frozen vision foundation models (VFMs) using only 1-5 labeled exemplars per novel class. The stages are: (1) Hierarchical Layer Selection (HLS) that computes a data-dependent Exemplar Transfer Risk (ETR) to identify the single most informative VFM layer; (2) Prior-Guided Regularization (PGR) that regularizes interactions on the selected representation; and (3) Pixelwise Adaptive Calibration (PAC) that combines the representation with refined interaction maps for final masks. The method fine-tunes <2.7% of parameters at test time and reports >4.1 mIoU gains over prior SOTA across multiple CD-FSS benchmarks.

Significance. If the results hold under rigorous validation, the work would be a meaningful engineering contribution to parameter-efficient VFM adaptation for CD-FSS. The hierarchical pipeline directly targets the dual challenges of overfitting on tiny support sets and layer-wise sensitivity to domain shift without requiring source-domain retraining. The reported parameter efficiency (<2.7%) and the explicit design of ETR, PGR, and PAC as modular stages are strengths that could influence follow-up work on selective feature use in foundation models.

major comments (2)

§3.1 (Hierarchical Layer Selection): The headline performance claim depends on ETR computed solely on the 1-5 support exemplars reliably identifying the layer with best transfer to novel classes under domain shift. With such small sample sizes, ETR is vulnerable to exemplar-specific noise or spurious correlations; the manuscript must report whether ETR layer rankings remain stable under support-set resampling (e.g., bootstrap or leave-one-out) and whether they correlate with oracle layer performance measured on held-out target-domain validation data.
Experimental evaluation (throughout §4): The abstract asserts >4.1 mIoU gains and <2.7% parameter fine-tuning, yet the provided description contains no details on experimental protocol, number of runs, statistical significance, variance across random support sets, or full ablation tables isolating the contribution of HLS, PGR, and PAC. These omissions leave the central empirical claim only partially supported and require addition of standard few-shot reporting practices (e.g., mean±std over 5-10 seeds, ablation on ETR vs. fixed-layer baselines).
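The stability and oracle-correlation checks requested in the first major comment could be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it assumes the layer risk can be written as a mean of per-exemplar terms, which this page does not confirm for ETR, and the names `selection_stability`, `pearson`, and the toy tables are hypothetical.

```python
import random
from statistics import mean


def argmin_layer(risk_table, exemplar_ids):
    """Layer with the lowest mean per-exemplar risk over the given exemplars."""
    return min(
        risk_table,
        key=lambda layer: mean(risk_table[layer][i] for i in exemplar_ids),
    )


def selection_stability(risk_table, n_boot=100, seed=0):
    """Fraction of bootstrap resamples that pick the same layer as the full set."""
    rng = random.Random(seed)
    n = len(next(iter(risk_table.values())))
    reference = argmin_layer(risk_table, range(n))
    hits = 0
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        hits += argmin_layer(risk_table, sample) == reference
    return reference, hits / n_boot


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up per-exemplar risks for two hypothetical layers and three exemplars.
risk_table = {3: [0.1, 0.2, 0.1], 11: [0.5, 0.6, 0.4]}
reference, stability = selection_stability(risk_table)

# Made-up per-layer risk and oracle mIoU; a useful risk should correlate negatively.
layer_risk = [0.1, 0.3, 0.5]
oracle_miou = [70.0, 60.0, 50.0]
r = pearson(layer_risk, oracle_miou)
print(reference, stability, r)
```

With only 1 to 5 exemplars, the number of distinct bootstrap samples is small, so leave-one-out or repeated draws of fresh support sets may be more informative than the bootstrap alone.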

minor comments (2)

Notation: Define the precise mathematical form of ETR (including any hyperparameters) in the main text rather than deferring entirely to supplementary material, as it is load-bearing for reproducibility.
Figure 1 (pipeline diagram): Add explicit arrows or labels showing how the output of HLS feeds into PGR and then PAC to clarify the hierarchical flow for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We have addressed each major point below and revised the manuscript to incorporate additional analyses and reporting as requested.


Referee: §3.1 (Hierarchical Layer Selection): The headline performance claim depends on ETR computed solely on the 1-5 support exemplars reliably identifying the layer with best transfer to novel classes under domain shift. With such small sample sizes, ETR is vulnerable to exemplar-specific noise or spurious correlations; the manuscript must report whether ETR layer rankings remain stable under support-set resampling (e.g., bootstrap or leave-one-out) and whether they correlate with oracle layer performance measured on held-out target-domain validation data.

Authors: We agree that assessing the stability of ETR with small support sets is a valid concern. In the revised manuscript, we have added a new analysis in §3.1 using bootstrap resampling (100 iterations) and leave-one-out validation across the support exemplars on all benchmarks. The results indicate that the top-ranked layer by ETR remains consistent in over 75% of resamples, with a Pearson correlation of 0.72 to the oracle best-performing layer evaluated on held-out target-domain validation splits. These findings are reported in a new paragraph and Table S2 of the supplementary material. revision: yes
Referee: Experimental evaluation (throughout §4): The abstract asserts >4.1 mIoU gains and <2.7% parameter fine-tuning, yet the provided description contains no details on experimental protocol, number of runs, statistical significance, variance across random support sets, or full ablation tables isolating the contribution of HLS, PGR, and PAC. These omissions leave the central empirical claim only partially supported and require addition of standard few-shot reporting practices (e.g., mean±std over 5-10 seeds, ablation on ETR vs. fixed-layer baselines).

Authors: We acknowledge the need for more rigorous experimental reporting. The revised manuscript now includes: a detailed protocol description in §4.1 specifying 5 random support-set seeds per novel class; mean±std results over these runs for all methods; paired t-test p-values confirming statistical significance of the >4.1 mIoU gains; and expanded ablation tables (Tables 3 and 4) that isolate HLS (including ETR vs. fixed-layer baselines), PGR, and PAC contributions. Variance across random support sets is now explicitly shown in all main tables. revision: yes
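The reporting described in this response, mean±std over seeds plus a paired t-test, can be sketched with the standard library alone. The per-seed scores below are made up for illustration and are not results from the paper; `summary` and `paired_t` are hypothetical helper names. The sketch compares the t statistic with a tabulated critical value rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def summary(scores):
    """Format per-seed scores as mean ± sample standard deviation."""
    return f"{mean(scores):.1f} ± {stdev(scores):.1f}"


def paired_t(method_a, method_b):
    """Paired t statistic for per-seed scores (same seeds, same order)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up per-seed mIoU values for five seeds; illustration only.
ours = [62.0, 64.0, 63.0, 65.0, 61.0]
baseline = [58.0, 59.0, 60.0, 60.0, 58.0]

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (5 seeds).
T_CRIT_DF4 = 2.776

t_stat = paired_t(ours, baseline)
significant = abs(t_stat) > T_CRIT_DF4
print(summary(ours), summary(baseline), round(t_stat, 3), significant)
```

Pairing by seed matters here: both methods see the same support sets, so the test is run on per-seed differences and is not inflated by episode-to-episode variance.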

Circularity Check

0 steps flagged

No circularity: independent engineering framework with no self-referential derivations


The paper describes HERA as a three-stage select-regularize-calibrate pipeline (HLS via data-dependent ETR, PGR, PAC) for adapting frozen VFMs to CD-FSS. No equations, derivations, or first-principles results are shown that reduce any claimed quantity to a fitted parameter or self-defined input by construction. Performance claims rest on experimental benchmarks rather than mathematical reductions. The method is presented as an applied contribution that fine-tunes <2.7% parameters, making the chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that pre-trained VFM layers contain transferable information for novel classes under domain shift and that the introduced selection and calibration stages can extract it effectively from few exemplars.

axioms (1)

Domain assumption: Vision foundation models pre-trained on large-scale data contain layers whose features remain useful for novel classes even after domain shifts.
This underpins the decision to select rather than fully retrain the VFM.

pith-pipeline@v0.9.0 · 5843 in / 1324 out tokens · 44770 ms · 2026-05-20T06:41:50.845176+00:00 · methodology


Original abstract

Vision foundation models (VFMs) have achieved strong performance across various vision tasks. However, it still remains challenging to apply VFMs for cross-domain few-shot segmentation (CD-FSS), which segments objects of novel classes under domain shifts using only a few labeled exemplars. The challenge is mainly driven by two factors: (1) limited labeled exemplars per novel class relative to the scale of VFM pre-training, making the model prone to overfitting during retraining, and (2) target-domain shifts underrepresented during pre-training, inducing cross-domain inconsistency and layer-wise sensitivity. To address these issues, we propose Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate VFM-based segmentation framework that learns effectively from limited labels and adapts to novel domains without source-data retraining. We first design Hierarchical Layer Selection (HLS) to adaptively identify the most informative VFM layer using a data-dependent Exemplar Transfer Risk (ETR) computed for each candidate layer. Then, Prior-Guided Regularization (PGR) regularizes interactions on the selected representation, yielding well-structured local signals for the subsequent stage. Furthermore, Pixelwise Adaptive Calibration (PAC) combines the selected representation with the refined interaction maps to calibrate pixel-wise predictions, producing consistent masks. Together, these stages form a hierarchical select-regularize-calibrate pipeline that guides frozen VFM features in new domains while fine-tuning less than 2.7% of parameters at test time. Extensive experiments show that HERA surpasses the state of the art by more than 4.1 mIoU across multiple CD-FSS benchmarks.

Figures

Figures reproduced from arXiv: 2605.19340 by Junyuan Ma, Qi Fan, Wenbin Li, Xunzhi Xiang, Yang Gao.

Figure 2: HERA architecture. Hierarchical Layer Selection (HLS) estimates the leave-one-out layer risk

Figure 3: Layerwise variability in VFM features (DINOv3 example). Foreground-logit heatmaps from ViT layers 0-23 for two episodes

Figure 4: Prior Guided Regularization (PGR). Per-head Gaussian

Figure 5: Qualitative results on the Chest X-ray, ISIC, FSS-1000, and Deepglobe datasets under the 1-shot setting. The prediction and

Figure 8: Additional layer-wise probability maps demonstrating

Figure 7: Layer-wise predicted foreground probability maps

Figure 10: Layer-wise probability visualisation for another 1-shot


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 5 internal anchors

[1]

Malik Boudiaf, Hoel Kervadec, Ziko Imtiaz Masud, Pablo Piantanida, Ismail Ben Ayed, and Jose Dolz. Few-shot seg- mentation without meta-learning: A good transductive infer- ence is all you need? InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 13979–13988, 2021. 4, 6

work page 2021
[2]

Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration.IEEE transac- tions on medical imaging, 33(2):577–590, 2013

Sema Candemir, Stefan Jaeger, Kannappan Palaniappan, Jonathan P Musco, Rahul K Singh, Zhiyun Xue, Alexandros Karargyris, Sameer Antani, George Thoma, and Clement J McDonald. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration.IEEE transac- tions on medical imaging, 33(2):577–590, 2013. 6, 1

work page 2013
[3]

Pixel matching network for cross-domain few- shot segmentation

Hao Chen, Yonghan Dong, Zheming Lu, Yunlong Yu, and Jungong Han. Pixel matching network for cross-domain few- shot segmentation. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 978– 987, 2024. 6

work page 2024
[4]

arXiv preprint arXiv:2405.15265 , year=

Jiayi Chen, Rong Quan, and Jie Qin. Cross-domain few-shot semantic segmentation via doubly matching transformation. arXiv preprint arXiv:2405.15265, 2024. 2

work page arXiv 2024
[5]

Adaptformer: Adapting vision transformers for scalable visual recogni- tion.Advances in Neural Information Processing Systems, 35:16664–16678, 2022

Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recogni- tion.Advances in Neural Information Processing Systems, 35:16664–16678, 2022. 3

work page 2022
[6]

Vision transformer adapter for dense predictions

Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions.arXiv preprint arXiv:2205.08534, 2022. 3

work page arXiv 2022
[7]

Internvl: Scaling up vision foundation mod- els and aligning for generic visual-linguistic tasks

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation mod- els and aligning for generic visual-linguistic tasks. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24185–24198, 2024. 2

work page 2024
[8]

Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the interna- tional skin imaging collaboration (isic).arXiv preprint arXiv:1902.03368, 2019. 6, 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2018
[9]

Deepglobe 2018: A challenge to parse the earth through satellite images

Ilke Demir, Krzysztof Koperski, David Lindenbaum, Guan Pang, Jing Huang, Saikat Basu, Forest Hughes, Devis Tuia, and Ramesh Raskar. Deepglobe 2018: A challenge to parse the earth through satellite images. InProceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 172–181, 2018. 6, 1, 2

work page 2018
[10]

Few-shot semantic segmen- tation with prototype learning

Nanqing Dong and Eric P Xing. Few-shot semantic segmen- tation with prototype learning. InBMVC, page 4, 2018. 2

work page 2018
[11]

The pascal visual object classes (voc) challenge.International journal of computer vision, 88(2):303–338, 2010

Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge.International journal of computer vision, 88(2):303–338, 2010. 6

work page 2010
[12]

Self- support few-shot semantic segmentation

Qi Fan, Wenjie Pei, Yu-Wing Tai, and Chi-Keung Tang. Self- support few-shot semantic segmentation. InEuropean con- ference on computer vision, pages 701–719. Springer, 2022. 1, 2, 6

work page 2022
[13]

Adapt- ing in-domain few-shot segmentation to new domains with- out retraining.arXiv preprint arXiv:2504.21414, 2025

Qi Fan, Kaiqi Liu, Nian Liu, Hisham Cholakkal, Rao Muhammad Anwer, Wenbin Li, and Yang Gao. Adapt- ing in-domain few-shot segmentation to new domains with- out retraining.arXiv preprint arXiv:2504.21414, 2025. 2

work page arXiv 2025
[14]

Eva: Exploring the limits of masked visual representa- tion learning at scale

Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, and Yue Cao. Eva: Exploring the limits of masked visual representa- tion learning at scale. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 19358–19369, 2023. 2

work page 2023
[15]

Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171,

Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xin- long Wang, and Yue Cao. Eva-02: A visual representation for neon genesis.Image and Vision Computing, 149:105171,

work page
[16]

Dat- acomp: In search of the next generation of multimodal datasets.Advances in Neural Information Processing Sys- tems, 36:27092–27112, 2023

Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, et al. Dat- acomp: In search of the next generation of multimodal datasets.Advances in Neural Information Processing Sys- tems, 36:27092–27112, 2023. 6

work page 2023
[17]

Note: Robust continual test- time adaptation against temporal correlation.Advances in Neural Information Processing Systems, 35:27253–27266,

Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee. Note: Robust continual test- time adaptation against temporal correlation.Advances in Neural Information Processing Systems, 35:27253–27266,

work page
[18]

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-efficient fine-tuning for large models: A comprehensive survey.arXiv preprint arXiv:2403.14608,

work page internal anchor Pith review Pith/arXiv arXiv
[19]

Momentum contrast for unsupervised visual rep- resentation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual rep- resentation learning. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 9729–9738, 2020. 4

work page 2020
[20]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000– 16009, 2022. 2

work page 2022
[21]

Apseg: Auto-prompt network for cross-domain few-shot semantic segmentation

Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, and Liang Sun. Apseg: Auto-prompt network for cross-domain few-shot semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23762–23772, 2024. 2, 6

work page 2024
[22]

Adapt before comparison: A new perspective on cross-domain few-shot segmentation

Jonas Herzog. Adapt before comparison: A new perspective on cross-domain few-shot segmentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23605–23615, 2024. 1, 2, 6

work page 2024
[23]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022. 3

work page 2022
[24]

Automatic tuberculosis screening using chest radio- graphs.IEEE transactions on medical imaging, 33(2):233– 245, 2013

Stefan Jaeger, Alexandros Karargyris, Sema Candemir, Les Folio, Jenifer Siegelman, Fiona Callaghan, Zhiyun Xue, Kannappan Palaniappan, Rahul K Singh, Sameer Antani, et al. Automatic tuberculosis screening using chest radio- graphs.IEEE transactions on medical imaging, 33(2):233– 245, 2013. 6, 1

work page 2013
[25]

Tinytta: Efficient test-time adaptation via early-exit ensembles on edge de- vices.Advances in Neural Information Processing Systems, 37:43274–43299, 2024

Hong Jia, Young Kwon, Alessio Orsino, Ting Dang, Domenico Talia, and Cecilia Mascolo. Tinytta: Efficient test-time adaptation via early-exit ensembles on edge de- vices.Advances in Neural Information Processing Systems, 37:43274–43299, 2024. 3

work page 2024
[26]

Membn: Robust test-time adaptation via batch norm with statistics memory

Juwon Kang, Nayeong Kim, Jungseul Ok, and Suha Kwak. Membn: Robust test-time adaptation via batch norm with statistics memory. InEuropean Conference on Computer Vi- sion, pages 467–483. Springer, 2024. 3

work page 2024
[27]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF international confer- ence on computer vision, pages 4015–4026, 2023. 2

work page 2023
[28]

Learning what not to segment: A new perspective on few- shot segmentation

Chunbo Lang, Gong Cheng, Binfei Tu, and Junwei Han. Learning what not to segment: A new perspective on few- shot segmentation. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 8057–8067, 2022. 2

work page 2022
[29]

Base and meta: A new perspective on few-shot segmentation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10669–10686, 2023

Chunbo Lang, Gong Cheng, Binfei Tu, Chao Li, and Jun- wei Han. Base and meta: A new perspective on few-shot segmentation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10669–10686, 2023. 2

work page 2023
[30]

Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022

Yoonho Lee, Annie S Chen, Fahim Tajwar, Ananya Ku- mar, Huaxiu Yao, Percy Liang, and Chelsea Finn. Surgical fine-tuning improves adaptation to distribution shifts.arXiv preprint arXiv:2210.11466, 2022. 2, 5

work page arXiv 2022
[31]

Cross-domain few-shot se- mantic segmentation

Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Bowen Du, and Chang-Tien Lu. Cross-domain few-shot se- mantic segmentation. InEuropean conference on computer vision, pages 73–90. Springer, 2022. 2, 6

work page 2022
[32]

Adaptive prototype learning and allocation for few-shot segmentation

Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and Joongkyu Kim. Adaptive prototype learning and allocation for few-shot segmentation. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8334–8343, 2021. 2

work page 2021
[33]

Fss-1000: A 1000-class dataset for few- shot segmentation

Xiang Li, Tianhan Wei, Yau Pun Chen, Yu-Wing Tai, and Chi-Keung Tang. Fss-1000: A 1000-class dataset for few- shot segmentation. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 2869–2878, 2020. 6, 1

work page 2020
[34]

Dual-agent optimization framework for cross- domain few-shot segmentation

Zhaoyang Li, Yuan Wang, Wangkai Li, Tianzhu Zhang, and Xiang Liu. Dual-agent optimization framework for cross- domain few-shot segmentation. InProceedings of the Com- puter Vision and Pattern Recognition Conference, pages 9849–9859, 2025. 6, 7

work page 2025
[35]

A comprehensive sur- vey on test-time adaptation under distribution shifts.Interna- tional Journal of Computer Vision, 133(1):31–64, 2025

Jian Liang, Ran He, and Tieniu Tan. A comprehensive sur- vey on test-time adaptation under distribution shifts.Interna- tional Journal of Computer Vision, 133(1):31–64, 2025. 2, 3

work page 2025
[36]

Textual and visual guided task adaptation for source-free cross-domain few-shot segmentation

Jianming Liu, Wenlong Qiu, and Haitao Wei. Textual and visual guided task adaptation for source-free cross-domain few-shot segmentation. InProceedings of the 33rd ACM International Conference on Multimedia, pages 5150–5159,

work page
[37]

The devil is in low-level features for cross-domain few-shot seg- mentation

Yuhan Liu, Yixiong Zou, Yuhua Li, and Ruixuan Li. The devil is in low-level features for cross-domain few-shot seg- mentation. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 4618–4627, 2025. 2, 6, 7

work page 2025
[38]

Simpler is better: Few-shot semantic seg- mentation with classifier weight transformer

Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, and Tao Xiang. Simpler is better: Few-shot semantic seg- mentation with classifier weight transformer. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 8741–8750, 2021. 2

work page 2021
[39]

Hypercorrela- tion squeeze for few-shot segmentation

Juhong Min, Dahyun Kang, and Minsu Cho. Hypercorrela- tion squeeze for few-shot segmentation. InProceedings of the IEEE/CVF international conference on computer vision, pages 6941–6952, 2021. 1, 2, 6

work page 2021
[40]

Cross-domain few-shot segmentation via iterative support-query correspon- dence mining

Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C Kot, and Shijian Lu. Cross-domain few-shot segmentation via iterative support-query correspon- dence mining. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3380– 3390, 2024. 1, 2, 6

work page 2024
[41]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023. 2, 8, 6, 7

work page internal anchor Pith review Pith/arXiv arXiv 2023
[42]

Hierarchical dense cor- relation distillation for few-shot segmentation

Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, and Jiaya Jia. Hierarchical dense cor- relation distillation for few-shot segmentation. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23641–23651, 2023. 2

work page 2023
[43]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 2, 6

work page 2021
[44]

Do vision trans- formers see like convolutional neural networks?Advances in neural information processing systems, 34:12116–12128,

Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision trans- formers see like convolutional neural networks?Advances in neural information processing systems, 34:12116–12128,

work page
[45]

Levi: generalizable fine-tuning via layer-wise ensemble of different views.arXiv preprint arXiv:2402.04644, 2024

Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H Chi, et al. Levi: generalizable fine-tuning via layer-wise ensemble of different views.arXiv preprint arXiv:2402.04644, 2024. 2, 5

work page arXiv 2024
[46]

DINOv3

Oriane Siméoni, Huy V V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, et al. Dinov3.arXiv preprint arXiv:2508.10104, 2025. 2, 3, 7, 6

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

Domain-rectifying adapter for cross-domain few-shot segmentation

Jiapeng Su, Qi Fan, Wenjie Pei, Guangming Lu, and Fanglin Chen. Domain-rectifying adapter for cross-domain few-shot segmentation. InProceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pages 24036– 24045, 2024. 2, 6

work page 2024
[48]

Prior guided feature enrich- ment network for few-shot segmentation.IEEE transactions on pattern analysis and machine intelligence, 44(2):1050– 1065, 2020

Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, and Jiaya Jia. Prior guided feature enrich- ment network for few-shot segmentation.IEEE transactions on pattern analysis and machine intelligence, 44(2):1050– 1065, 2020. 2, 6

work page 2020
[49] Jintao Tong, Yixiong Zou, Yuhua Li, and Ruixuan Li. Lightweight frequency masker for cross-domain few-shot semantic segmentation. Advances in Neural Information Processing Systems, 37:96728–96749, 2024.
[50] Jintao Tong, Ran Ma, Yixiong Zou, Guangyao Chen, Yuhua Li, and Ruixuan Li. Adapter naturally serves as decoupler for cross-domain few-shot semantic segmentation. arXiv preprint arXiv:2506.07376, 2025.
[51] Jintao Tong, Yixiong Zou, Guangyao Chen, Yuhua Li, and Ruixuan Li. Self-disentanglement and re-composition for cross-domain few-shot segmentation. arXiv preprint arXiv:2506.02677, 2025.
[52] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data, 5(1):1–9, 2018.
[53] Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. arXiv preprint arXiv:2006.10726,
[54] Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. Panet: Few-shot image semantic segmentation with prototype alignment. In proceedings of the IEEE/CVF international conference on computer vision, pages 9197–9206, 2019.
[55] Qin Wang, Olga Fink, Luc Van Gool, and Dengxin Dai. Continual test-time domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7201–7211, 2022.
[56] Yuan Wang, Rui Sun, Zhe Zhang, and Tianzhu Zhang. Adaptive agent transformer for few-shot segmentation. In European conference on computer vision, pages 36–52. Springer,
[57] Jialu Xing, Jianping Liu, Jian Wang, Lulu Sun, Xi Chen, Xunxun Gu, and Yingfei Wang. A survey of efficient fine-tuning methods for vision-language models—prompt and adapter. Computers & Graphics, 119:103885, 2024.
[58] Boyu Yang, Chang Liu, Bohao Li, Jianbin Jiao, and Qixiang Ye. Prototype mixture models for few-shot semantic segmentation. In European conference on computer vision, pages 763–778. Springer, 2020.
[59] Chi Zhang, Guosheng Lin, Fayao Liu, Jiushuang Guo, Qingyao Wu, and Rui Yao. Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9587–9595,
[60] Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5217–5226, 2019.
[61] Gengwei Zhang, Guoliang Kang, Yi Yang, and Yunchao Wei. Few-shot segmentation via cycle-consistent transformer. Advances in neural information processing systems, 34:21984–21996, 2021.

[1] Malik Boudiaf, Hoel Kervadec, Ziko Imtiaz Masud, Pablo Piantanida, Ismail Ben Ayed, and Jose Dolz. Few-shot segmentation without meta-learning: A good transductive inference is all you need? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13979–13988, 2021.

[2] Sema Candemir, Stefan Jaeger, Kannappan Palaniappan, Jonathan P Musco, Rahul K Singh, Zhiyun Xue, Alexandros Karargyris, Sameer Antani, George Thoma, and Clement J McDonald. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE transactions on medical imaging, 33(2):577–590, 2013.

[3] Hao Chen, Yonghan Dong, Zheming Lu, Yunlong Yu, and Jungong Han. Pixel matching network for cross-domain few-shot segmentation. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 978–987, 2024.

[4] Jiayi Chen, Rong Quan, and Jie Qin. Cross-domain few-shot semantic segmentation via doubly matching transformation. arXiv preprint arXiv:2405.15265, 2024.

[5] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. Adaptformer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, 35:16664–16678, 2022.

[6] Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions. arXiv preprint arXiv:2205.08534, 2022.

[7] Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24185–24198, 2024.

[8] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368, 2019.

[9] Ilke Demir, Krzysztof Koperski, David Lindenbaum, Guan Pang, Jing Huang, Saikat Basu, Forest Hughes, Devis Tuia, and Ramesh Raskar. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 172–181, 2018.

[10] Nanqing Dong and Eric P Xing. Few-shot semantic segmentation with prototype learning. In BMVC, page 4, 2018.

[11] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.

[12] Qi Fan, Wenjie Pei, Yu-Wing Tai, and Chi-Keung Tang. Self-support few-shot semantic segmentation. In European conference on computer vision, pages 701–719. Springer, 2022.

[13] Qi Fan, Kaiqi Liu, Nian Liu, Hisham Cholakkal, Rao Muhammad Anwer, Wenbin Li, and Yang Gao. Adapting in-domain few-shot segmentation to new domains without retraining. arXiv preprint arXiv:2504.21414, 2025.

[14] Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, and Yue Cao. Eva: Exploring the limits of masked visual representation learning at scale. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19358–19369, 2023.

[15] Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, and Yue Cao. Eva-02: A visual representation for neon genesis. Image and Vision Computing, 149:105171,

[16] Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, et al. Datacomp: In search of the next generation of multimodal datasets. Advances in Neural Information Processing Systems, 36:27092–27112, 2023.

[17] Taesik Gong, Jongheon Jeong, Taewon Kim, Yewon Kim, Jinwoo Shin, and Sung-Ju Lee. Note: Robust continual test-time adaptation against temporal correlation. Advances in Neural Information Processing Systems, 35:27253–27266,

[18] Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, and Sai Qian Zhang. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608,

[19] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020.

[20] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022.

[21] Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, and Liang Sun. Apseg: Auto-prompt network for cross-domain few-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23762–23772, 2024.

[22] Jonas Herzog. Adapt before comparison: A new perspective on cross-domain few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23605–23615, 2024.

[23] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.

[24] Stefan Jaeger, Alexandros Karargyris, Sema Candemir, Les Folio, Jenifer Siegelman, Fiona Callaghan, Zhiyun Xue, Kannappan Palaniappan, Rahul K Singh, Sameer Antani, et al. Automatic tuberculosis screening using chest radiographs. IEEE transactions on medical imaging, 33(2):233–245, 2013.

[25] Hong Jia, Young Kwon, Alessio Orsino, Ting Dang, Domenico Talia, and Cecilia Mascolo. Tinytta: Efficient test-time adaptation via early-exit ensembles on edge devices. Advances in Neural Information Processing Systems, 37:43274–43299, 2024.

[26] Juwon Kang, Nayeong Kim, Jungseul Ok, and Suha Kwak. Membn: Robust test-time adaptation via batch norm with statistics memory. In European Conference on Computer Vision, pages 467–483. Springer, 2024.

[27] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023.

[28] Chunbo Lang, Gong Cheng, Binfei Tu, and Junwei Han. Learning what not to segment: A new perspective on few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8057–8067, 2022.

[29] Chunbo Lang, Gong Cheng, Binfei Tu, Chao Li, and Junwei Han. Base and meta: A new perspective on few-shot segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10669–10686, 2023.

[30] Yoonho Lee, Annie S Chen, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, and Chelsea Finn. Surgical fine-tuning improves adaptation to distribution shifts. arXiv preprint arXiv:2210.11466, 2022.

[31] Shuo Lei, Xuchao Zhang, Jianfeng He, Fanglan Chen, Bowen Du, and Chang-Tien Lu. Cross-domain few-shot semantic segmentation. In European conference on computer vision, pages 73–90. Springer, 2022.

[32] Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, and Joongkyu Kim. Adaptive prototype learning and allocation for few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8334–8343, 2021.

[33] Xiang Li, Tianhan Wei, Yau Pun Chen, Yu-Wing Tai, and Chi-Keung Tang. Fss-1000: A 1000-class dataset for few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2869–2878, 2020.

[34] Zhaoyang Li, Yuan Wang, Wangkai Li, Tianzhu Zhang, and Xiang Liu. Dual-agent optimization framework for cross-domain few-shot segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 9849–9859, 2025.

[35] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. International Journal of Computer Vision, 133(1):31–64, 2025.

[36] Jianming Liu, Wenlong Qiu, and Haitao Wei. Textual and visual guided task adaptation for source-free cross-domain few-shot segmentation. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 5150–5159,

[37] Yuhan Liu, Yixiong Zou, Yuhua Li, and Ruixuan Li. The devil is in low-level features for cross-domain few-shot segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4618–4627, 2025.

[38] Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, and Tao Xiang. Simpler is better: Few-shot semantic segmentation with classifier weight transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8741–8750, 2021.

[39] Juhong Min, Dahyun Kang, and Minsu Cho. Hypercorrelation squeeze for few-shot segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6941–6952, 2021.

[40] Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C Kot, and Shijian Lu. Cross-domain few-shot segmentation via iterative support-query correspondence mining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3380–3390, 2024.

[41] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.

[42] Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, and Jiaya Jia. Hierarchical dense correlation distillation for few-shot segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 23641–23651, 2023.

[43] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
