Pith · machine review for the scientific record

arXiv: 2604.23314 · v1 · submitted 2026-04-25 · 💻 cs.CV

Recognition: unknown

Learning from Noisy Prompts: Saliency-Guided Prompt Distillation for Robust Segmentation with SAM

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 08:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords SAM · medical image segmentation · noisy prompts · prompt distillation · saliency guidance · MRI · CT · foundation models

The pith

Saliency-guided prompt distillation turns noisy clinical inputs into reliable guidance for the Segment Anything Model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SPD, a framework that adapts SAM for medical segmentation when prompts are coarse or ambiguous, as with the centerline annotations common in real clinical data. It trains a lightweight saliency head on the images to generate anatomical localization maps, then uses those maps to validate and enrich the original prompts by drawing consensus from neighboring slices. A pairwise consistency term across slices keeps the resulting masks anatomically coherent. The approach is motivated by the fact that SAM's zero-shot performance drops sharply without precise prompts, yet such precision is rarely available in practice.
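
To make the flow concrete, here is a minimal sketch of the distillation step as described above. Everything in it is illustrative: the function name, the agreement rule (a pixelwise minimum over the current and neighboring saliency maps), and the threshold `tau` are our assumptions, not details taken from the paper.

```python
import numpy as np

def distill_prompts(prompts, saliency, neighbor_saliencies, tau=0.5):
    """Illustrative consensus filter for noisy point prompts.

    prompts:             (N, 2) int array of (row, col) click points
    saliency:            (H, W) saliency map for the current slice, in [0, 1]
    neighbor_saliencies: list of (H, W) maps from anatomically adjacent slices
    """
    # Pixelwise agreement between the current slice and its neighbors.
    consensus = np.minimum.reduce([saliency] + list(neighbor_saliencies))
    # Validate: keep only points the consensus map considers anatomical.
    keep = consensus[prompts[:, 0], prompts[:, 1]] >= tau
    kept = prompts[keep]
    # Enrich: add the single highest-consensus pixel as an extra positive point.
    extra = np.array([np.unravel_index(np.argmax(consensus), consensus.shape)])
    return np.vstack([kept, extra])
```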

Core claim

SPD first learns data-driven anatomical priors through a lightweight saliency head to obtain confident localization maps. These priors then drive Contextual Prompt Distillation, which validates and enriches noisy prompts using cues from anatomically adjacent slices, producing a consensus prompt set. A Pairwise Slice Consistency objective further enforces local anatomical coherence during segmentation, allowing the method to outperform existing SAM adaptations and supervised baselines on four MRI and CT benchmarks in both region-based and boundary-based metrics.
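
The abstract names the Pairwise Slice Consistency objective but never writes it down. One plausible form, assuming per-slice predicted masks $\hat{M}_i$, a set $\mathcal{P}$ of adjacent slice pairs, and a weighting $\lambda$ (all three symbols are our placeholders, not the paper's notation):

```latex
\mathcal{L}_{\mathrm{PSC}}
  = \frac{1}{|\mathcal{P}|} \sum_{(i,\, i+1) \in \mathcal{P}}
    \bigl\lVert \hat{M}_i - \hat{M}_{i+1} \bigr\rVert_1,
\qquad
\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{seg}} + \lambda \, \mathcal{L}_{\mathrm{PSC}}
```

Any differentiable disagreement penalty between neighboring masks would play the same role; the L1 form is simply the most direct instance.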

What carries the argument

The Saliency-Guided Prompt Distillation (SPD) framework, which uses a saliency head to extract anatomical priors and then applies contextual distillation plus slice consistency to convert unreliable prompts into robust inputs.

If this is right

  • Large gains appear in both region-based and boundary-based metrics across four challenging MRI and CT benchmarks.
  • SAM becomes usable in clinical workflows that supply only weak, drifting prompts such as centerlines.
  • Foundation-model deployment becomes feasible in settings where only imperfect annotations exist.
  • Local anatomical coherence is maintained through an explicit pairwise slice consistency objective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation pattern could be applied to other promptable foundation models facing noisy inputs.
  • Bootstrapping the saliency head from even fewer labels might reduce annotation burden further.
  • Extending the slice-consistency idea to full 3D volumes could improve volumetric segmentation stability.

Load-bearing premise

A lightweight saliency head trained on the available data can produce reliable anatomical priors capable of validating and enriching noisy prompts via consensus from anatomically adjacent slices.
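
This premise is checkable slice by slice: if the priors are doing their job, distilled prompts should land inside the target anatomy more often than the raw ones. A minimal probe, assuming binary ground-truth masks and (row, col) prompt arrays; the metric name is ours, not the paper's.

```python
import numpy as np

def prompt_precision(prompts, gt_mask):
    """Fraction of prompt points falling inside the ground-truth structure:
    a crude proxy for prompt quality."""
    if len(prompts) == 0:
        return 0.0
    return gt_mask[prompts[:, 0], prompts[:, 1]].astype(bool).mean()

# The premise predicts prompt_precision(distilled, gt) > prompt_precision(raw, gt)
# on held-out volumes; if the inequality fails, the consensus step is adding noise.
```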

What would settle it

A held-out set of medical volumes on which the saliency head's priors fail to improve prompt quality, and on which SPD shows no gain over unmodified SAM, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.23314 by Alexander Harry Fitzhugh, Bernhard Kainz, Chen Qin, Jingxuan Kang, Nikesh Jathanna, Phillip Lung, Shahnaz Jamil-Copley, Shaoming Zheng, Shuang Li, Uday Bharat Patel, Yusuf Kiberu, Ziqi Zhang.

Figure 1
Figure 1: Noisy Prompts. Radiologists’ centerline annotations (red points) and ground-truth masks (green semi-transparent regions) on (left) an axial and (right) a coronal abdominal MRI view of the terminal ileum (TI). Prompts are derived from real clinical annotations, which are not TI-specific: some TI regions remain unannotated despite being part of the green ground-truth masks, while several annotations extend … view at source ↗
Figure 2
Figure 2: Overview of our SPD framework. Stage I (Anatomical Prior Learning): A lightweight saliency head is trained alongside LoRA-adapted encoder features to generate a high-confidence saliency map, which serves as a reliable anatomical prior. Stage II (Prompt-Guided Segmentation): Saliency map guides the Contextual Prompt Distillation module to filter local prompts and integrate consistent cues from neighboring s… view at source ↗
Figure 3
Figure 3: Qualitative comparison. Each column corresponds to a different method. The ground truth is overlaid as a semi-transparent green mask, while predictions are outlined in red. The top two rows present challenging axial and coronal views from the TI dataset, and the bottom three rows show results on the Scar, FUMPE and KiTS datasets. view at source ↗
Figure 5
Figure 5: Zero-shot performance of a frozen SAM model guided by different prompt sources. “1”, “3”, and “5” denote the number of randomly sampled points from the original centerline annotations, “Full” indicates using all centerline points, and “Consensus” refers to our proposed consensus prompts. view at source ↗ (a sketch of this sampling protocol follows the figure list)
Figure 6
Figure 6: Impact of guidance source in PSC on segmentation performance. “No PSC” omits consistency enforcement. “Prev Only” and “Next Only” enforce anatomical coherence with the preceding or succeeding slice, respectively. “Bi-directional” integrates both directions. view at source ↗
Figure 7
Figure 7: Ablation on the number of contextual slices n in CPD on the TI dataset. view at source ↗
Figure 9
Figure 9: Comparison with recent 3D SAM-based baselines on … view at source ↗
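
Figure 5's ablation compares prompt budgets drawn from the same noisy centerlines. A minimal sketch of that sampling protocol, assuming each slice's centerline annotation is an (N, 2) array of points; the seed and function name are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prompts(centerline_points, k=None):
    """Build the prompt sets compared in Figure 5: k in {1, 3, 5} randomly
    sampled centerline points, or every point when k is None ("Full")."""
    if k is None or k >= len(centerline_points):
        return centerline_points
    idx = rng.choice(len(centerline_points), size=k, replace=False)
    return centerline_points[idx]

# "Consensus" in Figure 5 is not a sampling budget at all: it is the output of
# the distillation step sketched earlier in this review, fed to the same frozen SAM.
```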
Original abstract

Segmentation is central to clinical diagnosis and monitoring, yet the reliability of modern foundation models in medical imaging still depends on the availability of precise prompts. The Segment Anything Model (SAM) offers powerful zero-shot capabilities, although it collapses under the weak, generic, and noisy prompts that dominate real clinical workflows. In practice, annotations such as centerline points are coarse and ambiguous, often drifting across neighboring anatomy and misguiding SAM toward inconsistent or incomplete masks. We introduce SPD, a Saliency-Guided Prompt Distillation framework that converts these unreliable cues into robust guidance. SPD first learns data-driven anatomical priors through a lightweight saliency head to obtain confident localization maps. These priors then drive Contextual Prompt Distillation, which validates and enriches noisy prompts using cues from anatomically adjacent slices, producing a consensus prompt set that matches the behavior of expert reasoning. A Pairwise Slice Consistency objective further enforces local anatomical coherence during segmentation. Experiments on four challenging MRI and CT benchmarks demonstrate that SPD consistently outperforms existing SAM adaptations and supervised baselines, delivering large gains in both region-based and boundary-based metrics. SPD provides a practical and principled path toward reliable foundation model deployment in clinical environments where only imperfect prompts are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Saliency-Guided Prompt Distillation (SPD), a framework to improve SAM-based segmentation in medical images when only noisy, coarse prompts (e.g., centerlines) are available. SPD trains a lightweight saliency head on the data to produce anatomical localization maps, applies contextual prompt distillation that enriches prompts via consensus from adjacent slices, and adds a pairwise slice consistency loss. Experiments on four MRI and CT benchmarks are reported to show consistent outperformance versus existing SAM adaptations and supervised baselines, with large gains in both region-based and boundary-based metrics.

Significance. If the empirical gains hold and the saliency priors prove reliable, the work addresses a genuine practical gap: deploying foundation models like SAM in clinical workflows where expert-level prompts are unavailable. The distillation-via-consensus idea is a reasonable extension of prompt engineering and slice-wise coherence priors. No machine-checked proofs or parameter-free derivations are present, but the method is presented as data-driven and falsifiable via the benchmark comparisons.

major comments (2)
  1. [Experiments] Experiments section: the central claim of 'consistent outperformance' and 'large gains' on four benchmarks is load-bearing for the paper's contribution, yet the abstract and summary provide no numerical values, standard deviations, baseline details, or statistical significance tests. Without these, the magnitude and robustness of the reported improvements cannot be evaluated.
  2. [Method] Method section (saliency head and Contextual Prompt Distillation): the framework assumes the lightweight saliency head produces reliable anatomical priors that validate/enrich noisy prompts via adjacent-slice consensus. No quantitative assessment of the head's localization accuracy (e.g., Dice or IoU overlap with ground-truth anatomy) or analysis of failure modes (pathology-induced inconsistency, slice-thickness variation) is supplied. If these priors are inaccurate, the distillation and consistency objectives risk amplifying rather than correcting prompt noise, directly undermining the outperformance claim.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., average Dice improvement) to support the qualitative claims of 'large gains'.
  2. [Method] Notation for the saliency map and distilled prompt set should be introduced with explicit equations or a diagram to improve readability of the distillation step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the two major comments point by point below, indicating the changes we will incorporate in the revised manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: the central claim of 'consistent outperformance' and 'large gains' on four benchmarks is load-bearing for the paper's contribution, yet the abstract and summary provide no numerical values, standard deviations, baseline details, or statistical significance tests. Without these, the magnitude and robustness of the reported improvements cannot be evaluated.

    Authors: We agree that the abstract would benefit from explicit numerical support for the performance claims. The experiments section of the manuscript already contains full tables with per-dataset Dice, HD95, and other metrics (including standard deviations across multiple runs and comparisons to all listed baselines). In the revision we will add a concise summary of the key quantitative gains and statistical significance results directly into the abstract. revision: yes

  2. Referee: [Method] Method section (saliency head and Contextual Prompt Distillation): the framework assumes the lightweight saliency head produces reliable anatomical priors that validate/enrich noisy prompts via adjacent-slice consensus. No quantitative assessment of the head's localization accuracy (e.g., Dice or IoU overlap with ground-truth anatomy) or analysis of failure modes (pathology-induced inconsistency, slice-thickness variation) is supplied. If these priors are inaccurate, the distillation and consistency objectives risk amplifying rather than correcting prompt noise, directly undermining the outperformance claim.

    Authors: The referee correctly identifies a gap: while the saliency head is trained jointly and its utility is shown through end-to-end segmentation gains, we did not report an isolated localization accuracy evaluation (Dice/IoU against ground-truth anatomy) or a dedicated failure-mode analysis. We will add a new ablation subsection that quantifies the saliency maps' overlap with ground truth and discusses robustness under pathology and slice-thickness variation. revision: yes
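
The isolated evaluation promised here reduces to binarizing each saliency map and scoring its overlap against the ground-truth mask. A minimal sketch, assuming NumPy arrays; the 0.5 operating point is our assumption, not the paper's.

```python
import numpy as np

def dice_iou(saliency, gt_mask, thresh=0.5):
    """Localization accuracy of a saliency map against a binary ground truth."""
    pred = saliency >= thresh
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-8)
    return dice, iou
```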

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper describes an empirical pipeline: a lightweight saliency head is trained on the available data to produce localization maps, which then inform Contextual Prompt Distillation and a Pairwise Slice Consistency loss. These steps are presented as independent, data-driven components rather than tautological definitions or predictions that reduce to their own fitted inputs by construction. The central claims rest on experimental outperformance across four MRI/CT benchmarks, with no load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatz smuggling. The method does not claim first-principles derivations that collapse to the inputs; it is a standard supervised adaptation of SAM with added modules whose validity is assessed externally via benchmark metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about the existence of learnable anatomical priors and cross-slice consistency in medical images; no free parameters or invented physical entities are described in the abstract.

axioms (2)
  • Domain assumption: Medical images contain learnable anatomical priors that a lightweight saliency head can extract to produce confident localization maps.
    Invoked as the first step of SPD to obtain priors from noisy data.
  • Domain assumption: Anatomically adjacent slices contain consistent cues that can validate and enrich noisy prompts into a consensus set.
    Central to Contextual Prompt Distillation and Pairwise Slice Consistency.

pith-pipeline@v0.9.0 · 5562 in / 1437 out tokens · 81678 ms · 2026-05-08T08:32:58.932784+00:00 · methodology

