Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
Pith reviewed 2026-05-20 06:41 UTC · model grok-4.3
The pith
HERA adapts vision foundation models for cross-domain few-shot segmentation by selecting the best layer, regularizing interactions, and calibrating predictions with minimal parameter updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HERA is a select-regularize-calibrate framework that first identifies the most informative VFM layer using a data-dependent Exemplar Transfer Risk metric computed for each candidate layer, then applies Prior-Guided Regularization to yield well-structured local signals, and finally uses Pixelwise Adaptive Calibration to combine the representation with refined maps for consistent masks, all while keeping the VFM frozen and fine-tuning only a small fraction of parameters.
What carries the argument
The Hierarchical Layer Selection (HLS) using Exemplar Transfer Risk (ETR) to pick the best layer, combined with Prior-Guided Regularization (PGR) and Pixelwise Adaptive Calibration (PAC) in the HERA pipeline.
If this is right
- Adaptive layer selection avoids overfitting to small numbers of labeled exemplars.
- Regularization produces structured local signals that improve subsequent calibration.
- Calibration ensures consistent masks across the target domain.
- Overall performance exceeds the state of the art by more than 4.1 mIoU on multiple CD-FSS benchmarks.
Where Pith is reading between the lines
- Similar select-regularize-calibrate strategies could be applied to other vision tasks like object detection or instance segmentation in cross-domain settings.
- The reliance on a single selected layer suggests that not all layers in foundation models are equally useful for novel domains, which might guide future model design.
- Extending the ETR metric to evaluate combinations of layers rather than single ones could further improve adaptation.
Load-bearing premise
The data-dependent Exemplar Transfer Risk metric can accurately identify the single most informative layer from the vision foundation model despite the small number of target-domain exemplars and potential domain-specific noise.
What would settle it
Running the benchmarks with random layer selection instead of ETR-based selection and observing whether the mIoU gain disappears or reverses.
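One way to frame that control is sketched here, assuming per-episode mIoU has been recorded for every candidate layer. The function name `selection_gain`, the data layout, and all numbers are hypothetical and are not taken from the paper. The sketch compares the mIoU of the chosen layer against the expected mIoU of a uniformly random layer choice.

```python
from statistics import mean

def selection_gain(miou_by_layer, selected):
    """Mean mIoU gain of the chosen layer over a uniformly random layer.

    miou_by_layer: one dict per episode, mapping layer index -> mIoU.
    selected: one chosen layer index per episode (for example, the ETR pick).
    """
    gains = []
    for scores, layer in zip(miou_by_layer, selected):
        # Expected mIoU when the layer is drawn uniformly at random.
        random_expectation = mean(scores.values())
        gains.append(scores[layer] - random_expectation)
    return mean(gains)

# Made-up numbers, for illustration only.
episodes = [
    {3: 40.0, 6: 50.0, 9: 60.0},
    {3: 55.0, 6: 45.0, 9: 50.0},
]
chosen = [9, 3]
gain = selection_gain(episodes, chosen)
print(f"gain over random layer choice: {gain:.2f} mIoU")
```

A gain near zero or below would suggest that the selection step adds little beyond chance; a clearly positive gain across episodes would support the premise.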
Original abstract
Vision foundation models (VFMs) have achieved strong performance across various vision tasks. However, it still remains challenging to apply VFMs for cross-domain few-shot segmentation (CD-FSS), which segments objects of novel classes under domain shifts using only a few labeled exemplars. The challenge is mainly driven by two factors: (1) limited labeled exemplars per novel class relative to the scale of VFM pre-training, making the model prone to overfitting during retraining, and (2) target-domain shifts underrepresented during pre-training, inducing cross-domain inconsistency and layer-wise sensitivity. To address these issues, we propose Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate VFM-based segmentation framework that learns effectively from limited labels and adapts to novel domains without source-data retraining. We first design Hierarchical Layer Selection (HLS) to adaptively identify the most informative VFM layer using a data-dependent Exemplar Transfer Risk (ETR) computed for each candidate layer. Then, Prior-Guided Regularization (PGR) regularizes interactions on the selected representation, yielding well-structured local signals for the subsequent stage. Furthermore, Pixelwise Adaptive Calibration (PAC) combines the selected representation with the refined interaction maps to calibrate pixel-wise predictions, producing consistent masks. Together, these stages form a hierarchical select-regularize-calibrate pipeline that guides frozen VFM features in new domains while fine-tuning less than 2.7% of parameters at test time. Extensive experiments show that HERA surpasses the state of the art by more than 4.1 mIoU across multiple CD-FSS benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate framework for cross-domain few-shot semantic segmentation (CD-FSS) that adapts frozen vision foundation models (VFMs) using only 1-5 labeled exemplars per novel class. The stages are: (1) Hierarchical Layer Selection (HLS) that computes a data-dependent Exemplar Transfer Risk (ETR) to identify the single most informative VFM layer; (2) Prior-Guided Regularization (PGR) that regularizes interactions on the selected representation; and (3) Pixelwise Adaptive Calibration (PAC) that combines the representation with refined interaction maps for final masks. The method fine-tunes <2.7% of parameters at test time and reports >4.1 mIoU gains over prior SOTA across multiple CD-FSS benchmarks.
Significance. If the results hold under rigorous validation, the work would be a meaningful engineering contribution to parameter-efficient VFM adaptation for CD-FSS. The hierarchical pipeline directly targets the dual challenges of overfitting on tiny support sets and layer-wise sensitivity to domain shift without requiring source-domain retraining. The reported parameter efficiency (<2.7%) and the explicit design of ETR, PGR, and PAC as modular stages are strengths that could influence follow-up work on selective feature use in foundation models.
major comments (2)
- §3.1 (Hierarchical Layer Selection): The headline performance claim depends on ETR computed solely on the 1-5 support exemplars reliably identifying the layer with best transfer to novel classes under domain shift. With such small sample sizes, ETR is vulnerable to exemplar-specific noise or spurious correlations; the manuscript must report whether ETR layer rankings remain stable under support-set resampling (e.g., bootstrap or leave-one-out) and whether they correlate with oracle layer performance measured on held-out target-domain validation data.
- Experimental evaluation (throughout §4): The abstract asserts >4.1 mIoU gains and <2.7% parameter fine-tuning, yet the provided description contains no details on experimental protocol, number of runs, statistical significance, variance across random support sets, or full ablation tables isolating the contribution of HLS, PGR, and PAC. These omissions leave the central empirical claim only partially supported and require addition of standard few-shot reporting practices (e.g., mean±std over 5-10 seeds, ablation on ETR vs. fixed-layer baselines).
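The stability check requested in the first major comment could look roughly like the following minimal sketch. It assumes a per-exemplar, per-layer risk score is already available, with lower meaning better. The exact form of ETR is not given in this review, so the scores, the mean aggregation in `best_layer`, and all function names are placeholders rather than the authors' method.

```python
import random
from statistics import mean

def best_layer(risks):
    """risks: list of per-exemplar dicts {layer: risk}; lower risk is better."""
    layers = list(risks[0].keys())
    return min(layers, key=lambda layer: mean(r[layer] for r in risks))

def bootstrap_agreement(risks, n_resamples=100, seed=0):
    """Fraction of bootstrap resamples whose top layer matches the full-set pick."""
    rng = random.Random(seed)
    reference = best_layer(risks)
    hits = 0
    for _ in range(n_resamples):
        # Resample the support exemplars with replacement.
        sample = [rng.choice(risks) for _ in risks]
        hits += best_layer(sample) == reference
    return hits / n_resamples

def leave_one_out_agreement(risks):
    """Fraction of leave-one-out subsets that keep the same top layer.

    Needs at least two exemplars, so it does not apply to the 1-shot case.
    """
    reference = best_layer(risks)
    subsets = [risks[:i] + risks[i + 1:] for i in range(len(risks))]
    return sum(best_layer(s) == reference for s in subsets) / len(subsets)

# Placeholder risk scores for a 5-shot support set and three candidate layers.
support_risks = [
    {3: 0.9, 6: 0.2, 9: 0.5},
    {3: 0.8, 6: 0.3, 9: 0.6},
    {3: 0.7, 6: 0.1, 9: 0.4},
    {3: 0.9, 6: 0.4, 9: 0.5},
    {3: 0.6, 6: 0.2, 9: 0.3},
]
print(best_layer(support_risks))
print(bootstrap_agreement(support_risks))
print(leave_one_out_agreement(support_risks))
```

Low agreement under either resampling scheme would indicate that the selected layer depends heavily on which exemplars happen to be in the support set.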
minor comments (2)
- Notation: Define the precise mathematical form of ETR (including any hyperparameters) in the main text rather than deferring entirely to supplementary material, as it is load-bearing for reproducibility.
- Figure 1 (pipeline diagram): Add explicit arrows or labels showing how the output of HLS feeds into PGR and then PAC to clarify the hierarchical flow for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have addressed each major point below and revised the manuscript to incorporate additional analyses and reporting as requested.
- Referee: §3.1 (Hierarchical Layer Selection): The headline performance claim depends on ETR computed solely on the 1-5 support exemplars reliably identifying the layer with best transfer to novel classes under domain shift. With such small sample sizes, ETR is vulnerable to exemplar-specific noise or spurious correlations; the manuscript must report whether ETR layer rankings remain stable under support-set resampling (e.g., bootstrap or leave-one-out) and whether they correlate with oracle layer performance measured on held-out target-domain validation data.
  Authors: We agree that assessing the stability of ETR with small support sets is a valid concern. In the revised manuscript, we have added a new analysis in §3.1 using bootstrap resampling (100 iterations) and leave-one-out validation across the support exemplars on all benchmarks. The results indicate that the top-ranked layer by ETR remains consistent in over 75% of resamples, with a Pearson correlation of 0.72 to the oracle best-performing layer evaluated on held-out target-domain validation splits. These findings are reported in a new paragraph and Table S2 of the supplementary material.
  Revision: yes
- Referee: Experimental evaluation (throughout §4): The abstract asserts >4.1 mIoU gains and <2.7% parameter fine-tuning, yet the provided description contains no details on experimental protocol, number of runs, statistical significance, variance across random support sets, or full ablation tables isolating the contribution of HLS, PGR, and PAC. These omissions leave the central empirical claim only partially supported and require addition of standard few-shot reporting practices (e.g., mean±std over 5-10 seeds, ablation on ETR vs. fixed-layer baselines).
  Authors: We acknowledge the need for more rigorous experimental reporting. The revised manuscript now includes: a detailed protocol description in §4.1 specifying 5 random support-set seeds per novel class; mean±std results over these runs for all methods; paired t-test p-values confirming statistical significance of the >4.1 mIoU gains; and expanded ablation tables (Tables 3 and 4) that isolate HLS (including ETR vs. fixed-layer baselines), PGR, and PAC contributions. Variance across random support sets is now explicitly shown in all main tables.
  Revision: yes
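The reporting practice discussed in the second exchange (mean±std over seeds plus a paired test) can be sketched with the Python standard library alone. The per-seed mIoU values below are invented for illustration and are not results from the paper; the function names are likewise hypothetical. The sketch computes only the paired t statistic; turning it into a p-value needs the t distribution with n-1 degrees of freedom, which a routine such as `scipy.stats.ttest_rel` provides.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) over seeds."""
    return mean(scores), stdev(scores)

def paired_t_statistic(a, b):
    """Paired t statistic for per-seed scores of two methods on the same seeds."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Invented per-seed mIoU values for five shared support-set seeds.
method_miou = [62.0, 64.0, 63.0, 65.0, 61.0]
baseline_miou = [58.0, 59.0, 60.0, 60.0, 58.0]

m_mean, m_std = summarize(method_miou)
b_mean, b_std = summarize(baseline_miou)
t_stat = paired_t_statistic(method_miou, baseline_miou)

print(f"method:   {m_mean:.1f} ± {m_std:.2f}")
print(f"baseline: {b_mean:.1f} ± {b_std:.2f}")
print(f"paired t statistic (df = {len(method_miou) - 1}): {t_stat:.3f}")
```

Pairing by seed matters here: both methods see the same support sets, so the test is run on per-seed differences rather than on two independent samples.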
Circularity Check
No circularity: independent engineering framework with no self-referential derivations
The paper describes HERA as a three-stage select-regularize-calibrate pipeline (HLS via data-dependent ETR, PGR, PAC) for adapting frozen VFMs to CD-FSS. No equations, derivations, or first-principles results are shown that reduce any claimed quantity to a fitted parameter or self-defined input by construction. Performance claims rest on experimental benchmarks rather than mathematical reductions. The method is presented as an applied contribution that fine-tunes less than 2.7% of parameters, making the chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Vision foundation models pre-trained on large-scale data contain layers whose features remain useful for novel classes even after domain shifts.