
arxiv: 2604.18225 · v2 · submitted 2026-04-20 · 💻 cs.CV · cs.AI

Is SAM3 ready for pathology segmentation?

Pith reviewed 2026-05-14 21:51 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords SAM3 · pathology segmentation · prompt engineering · zero-shot segmentation · few-shot learning · medical imaging · image segmentation

The pith

SAM3 requires pathology-specific adaptation for effective image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates whether SAM3 can segment pathology images using its promptable capabilities. It tests zero-shot, few-shot, and supervised settings with text and visual prompts on datasets such as NuInsSeg, PanNuke, and GlaS. The findings indicate that text prompts alone do not effectively activate nuclear concepts in pathology images. Performance varies greatly with the type and amount of visual prompts; few-shot learning provides some improvement, but the model remains sensitive to noisy prompts; and a notable performance gap remains relative to models adapted with task-specific training.

Core claim

SAM3's prompt-based segmentation shows limitations in pathology: text-only prompts poorly activate nuclear concepts, results are sensitive to visual prompt types and budgets, few-shot offers gains but lacks robustness to noise, and prompt-based approaches lag behind task-trained adapters.

What carries the argument

The systematic evaluation protocol that assesses SAM3 under zero-shot, few-shot, and supervised settings with varying prompting strategies on pathological datasets.
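The evaluation excerpt quoted in the reference graph states that SAM3 is left unmodified and produces a mask M̂ = f_SAM3(I_q, P), so prompting strategies can be compared by changing only the prompt P. The loop below is a hypothetical sketch of that idea, not the paper's code: `run_protocol`, `toy_segment`, and `box_from_mask` are illustrative names, Dice is assumed as the metric, and `toy_segment` is a rigged stand-in for SAM3 whose scores say nothing about the real model.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

# Hypothetical aliases: an image is H x W x 3, a mask is a boolean H x W array.
Segmenter = Callable[[np.ndarray, object], np.ndarray]
PromptBuilder = Callable[[np.ndarray, np.ndarray], object]


def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))


def run_protocol(segment: Segmenter,
                 builders: Dict[str, PromptBuilder],
                 data: List[Tuple[np.ndarray, np.ndarray]]) -> Dict[str, float]:
    """Mean Dice per prompting strategy. The segmenter is never modified;
    only the prompt handed to it changes, mirroring M_hat = f(I_q, P)."""
    scores = {}
    for name, build in builders.items():
        per_image = [dice(segment(img, build(img, gt)), gt) for img, gt in data]
        scores[name] = float(np.mean(per_image))
    return scores


# --- Toy demo with a stand-in model (NOT SAM3) ---
def toy_segment(image: np.ndarray, prompt: object) -> np.ndarray:
    mask = np.zeros(image.shape[:2], dtype=bool)
    if isinstance(prompt, tuple):  # box prompt (r0, c0, r1, c1), inclusive corners
        r0, c0, r1, c1 = prompt
        mask[r0:r1 + 1, c0:c1 + 1] = True
    return mask  # any non-box prompt (e.g. text) yields an empty mask here


def box_from_mask(image: np.ndarray, gt: np.ndarray) -> tuple:
    """Tight bounding box of the ground-truth foreground."""
    rows, cols = np.nonzero(gt)
    return (rows.min(), cols.min(), rows.max(), cols.max())


gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
image = np.zeros((8, 8, 3), dtype=np.uint8)
results = run_protocol(
    toy_segment,
    {"text": lambda img, m: "nucleus", "box": box_from_mask},
    [(image, gt)],
)
print(results)  # box ≈ 1.0, text = 0.0 for this toy model
```

Keeping the segmenter behind a plain callable means a real SAM3 wrapper could be swapped in without touching the loop; the prompt builders are where zero-shot, few-shot, or budgeted visual prompts would differ.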

If this is right

  • Text-only prompts are insufficient for activating concepts like nuclei in pathology images.
  • Visual prompt selection and budget critically affect segmentation performance.
  • Few-shot learning improves results but does not eliminate sensitivity to prompt noise.
  • Task-specific adapters outperform pure prompt-based usage by a significant margin.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Pathology segmentation may benefit from hybrid approaches combining prompts with light adaptation.
  • The observed sensitivities suggest testing SAM3 on other medical imaging modalities with similar prompt strategies.
  • Future work could explore automated prompt optimization to reduce reliance on manual visual prompts.

Load-bearing premise

The selected pathology datasets, prompt budgets, and noise levels accurately capture SAM3's overall performance potential and do not artificially favor the identified limitations.

What would settle it

Demonstrating that SAM3 with optimized text and visual prompts achieves segmentation accuracy comparable to task-trained adapters on a new set of pathology images would falsify the claim of a significant gap.

Original abstract

Is Segment Anything Model 3 (SAM3) capable of segmenting any pathology image? Digital pathology segmentation spans tissue-level and nuclei-level scales, where traditional methods often suffer from high annotation costs and poor generalization. SAM3 introduces Promptable Concept Segmentation, offering a potential automated interface via text prompts. In this work, we propose a systematic evaluation protocol to explore the capability space of SAM3 in a structured manner. Specifically, we evaluate SAM3 under different supervision settings, including zero-shot, few-shot, and supervised, with varying prompting strategies. Our extensive evaluation on pathological datasets, including NuInsSeg, PanNuke, and GlaS, reveals that: 1) text-only prompts poorly activate nuclear concepts; 2) performance is highly sensitive to visual prompt types and budgets; 3) few-shot learning offers gains, but SAM3 lacks robustness against visual prompt noise; and 4) a significant gap persists between prompt-based usage and the task-trained adapter-based reference. Our study delineates SAM3's boundaries in pathology image segmentation and provides practical guidance on the necessity of pathology domain adaptation.
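The abstract does not say how visual prompts were sampled or how noise was injected. As a hypothetical illustration of what a point-prompt "budget" and "prompt noise" can mean, the sketch below draws `budget` positive clicks from a ground-truth mask and, with probability `noise_rate`, places a click on background instead while still passing it as positive. The function name `sample_point_prompts` and this label-noise model are assumptions, not the paper's procedure.

```python
import numpy as np


def sample_point_prompts(gt: np.ndarray, budget: int, noise_rate: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Return `budget` (row, col) positive clicks for a boolean mask `gt`.

    Each click is drawn from the foreground; with probability `noise_rate`
    it is drawn from the background instead, simulating an erroneous click
    that is still handed to the model as positive (hypothetical noise model).
    """
    fg = np.argwhere(gt)
    bg = np.argwhere(~gt)
    if len(fg) == 0:
        raise ValueError("mask has no foreground pixels")
    points = []
    for _ in range(budget):
        use_bg = len(bg) > 0 and rng.random() < noise_rate
        pool = bg if use_bg else fg
        points.append(pool[rng.integers(len(pool))])
    return np.array(points).reshape(budget, 2)


rng = np.random.default_rng(0)
gt = np.zeros((16, 16), dtype=bool)
gt[4:10, 4:10] = True  # a single square "nucleus"

for budget in (1, 3, 5):
    for noise in (0.0, 0.5):
        pts = sample_point_prompts(gt, budget, noise, rng)
        on_target = gt[pts[:, 0], pts[:, 1]].mean()
        print(f"budget={budget} noise={noise} fraction on nucleus={on_target:.2f}")
```

Sweeping `budget` and `noise_rate` on a grid like this is one way to probe the sensitivity the abstract reports; positional jitter or wrong-instance clicks are equally plausible noise models.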

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript conducts a systematic empirical evaluation of Segment Anything Model 3 (SAM3) on pathology segmentation tasks using the NuInsSeg, PanNuke, and GlaS datasets. It tests zero-shot, few-shot, and supervised regimes with text-only and visual prompts of varying types and budgets, reporting that text prompts fail to activate nuclear concepts, performance is highly sensitive to visual prompt choices and budgets, few-shot yields gains but SAM3 is not robust to visual prompt noise, and a large gap remains relative to task-trained adapter baselines. The work concludes that SAM3 requires pathology-specific domain adaptation for reliable use.

Significance. If the empirical findings hold under representative conditions, the study is significant for mapping the practical boundaries of a general-purpose foundation model in digital pathology, a domain where annotation costs are high and generalization is critical. It supplies concrete guidance on prompt strategies and the necessity of adaptation, which can inform both model developers and end users considering SAM3 for nuclei- or tissue-level segmentation.

major comments (2)
  1. [Abstract] Abstract and Evaluation Protocol: the four headline findings depend on the unstated assumption that the chosen prompt budgets, visual prompt types, and injected noise levels on NuInsSeg/PanNuke/GlaS are representative of typical pathology workflows; no justification, ablation, or sensitivity analysis is supplied for these protocol choices, leaving the claims of high sensitivity (finding 2), lack of robustness (finding 3), and the 'significant gap' (finding 4) vulnerable to selection bias.
  2. [Methods] Methods and Results: the abstract asserts 'systematic tests' and a 'significant gap' to adapter-based references, yet full details on exact metrics (e.g., Dice, IoU), statistical significance testing, number of runs, and how the task-trained adapter baselines were trained and evaluated are absent, preventing verification that the reported performance differences are load-bearing rather than protocol artifacts.
minor comments (1)
  1. [Abstract] Abstract: the phrasing 'text-only prompts poorly activate nuclear concepts' would be clearer if accompanied by a brief quantitative illustration (e.g., mean Dice under text vs. visual prompts).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation of SAM3 for pathology segmentation. The comments highlight important areas for improving clarity and rigor, and we address each point below with plans for revision.

Point-by-point responses
  1. Referee: [Abstract] Abstract and Evaluation Protocol: the four headline findings depend on the unstated assumption that the chosen prompt budgets, visual prompt types, and injected noise levels on NuInsSeg/PanNuke/GlaS are representative of typical pathology workflows; no justification, ablation, or sensitivity analysis is supplied for these protocol choices, leaving the claims of high sensitivity (finding 2), lack of robustness (finding 3), and the 'significant gap' (finding 4) vulnerable to selection bias.

    Authors: We acknowledge that the manuscript would be strengthened by explicit justification for the protocol choices. The prompt budgets (ranging from 1 to 5 points/lines) and noise injection levels were selected to reflect realistic variability in user-provided prompts during pathology annotation, drawing from common practices in interactive segmentation literature. To fully address the concern, we will add a dedicated paragraph in the Methods section explaining this rationale and include a sensitivity analysis that varies budgets and noise levels, confirming that the core findings on sensitivity and the performance gap remain consistent. revision: yes

  2. Referee: [Methods] Methods and Results: the abstract asserts 'systematic tests' and a 'significant gap' to adapter-based references, yet full details on exact metrics (e.g., Dice, IoU), statistical significance testing, number of runs, and how the task-trained adapter baselines were trained and evaluated are absent, preventing verification that the reported performance differences are load-bearing rather than protocol artifacts.

    Authors: We agree that these details are necessary for full reproducibility and to substantiate the reported gap. The evaluation uses both Dice coefficient and IoU as primary metrics, with all results averaged across 5 independent runs and statistical significance assessed via paired t-tests. The adapter baselines were implemented following the standard supervised fine-tuning procedure on the training splits of NuInsSeg, PanNuke, and GlaS using the same data splits and augmentation pipeline as the prompt-based experiments. We will expand the Methods and Results sections to include these specifics explicitly, along with the exact hyperparameter settings for the adapters. revision: yes
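The simulated rebuttal names Dice and IoU as metrics, five independent runs, and paired t-tests. A minimal sketch of those quantities follows, assuming boolean per-image masks and scores paired by run; the empty-mask convention, the helper names, and the score lists are illustrative assumptions, not the paper's implementation or results.

```python
import math
import statistics

import numpy as np


def dice_iou(pred: np.ndarray, gt: np.ndarray) -> tuple:
    """Dice and IoU for two boolean masks of equal shape.

    Both scores are set to 1.0 when prediction and ground truth are empty,
    one common convention (assumed here; the paper's choice is not stated).
    """
    inter = int(np.logical_and(pred, gt).sum())
    union = int(np.logical_or(pred, gt).sum())
    total = int(pred.sum() + gt.sum())
    if total == 0:
        return 1.0, 1.0
    return 2 * inter / total, inter / union


def paired_t(a: list, b: list) -> float:
    """Paired t statistic over matched runs: mean(d) / (sd(d) / sqrt(n)), d = a - b.

    Raises ZeroDivisionError if every difference is identical (sd = 0).
    """
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Overlapping 4x4 squares: 9 shared pixels, 23 in the union.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True
print(dice_iou(pred, gt))  # → (0.5625, 0.3913...)

# Made-up per-run mean Dice for 5 runs (illustrative only, not paper results).
adapter_runs = [0.80, 0.82, 0.79, 0.81, 0.83]
prompt_runs = [0.60, 0.65, 0.58, 0.62, 0.61]
t = paired_t(adapter_runs, prompt_runs)
print(f"t = {t:.2f} with {len(adapter_runs) - 1} degrees of freedom")
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank methods identically per image; reporting both mainly aids comparison with prior work. If SciPy is available, `scipy.stats.ttest_rel(adapter_runs, prompt_runs)` returns the same statistic together with a two-sided p-value.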

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking with direct measurements

Full rationale

The paper conducts an empirical evaluation of SAM3 on external pathology datasets (NuInsSeg, PanNuke, GlaS) under zero-shot, few-shot, and supervised settings with explicit prompting strategies. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing claims. All four headline findings are direct experimental outcomes measured against reference adapters and ground-truth annotations, with no reduction to the paper's own inputs by construction. The evaluation protocol is described transparently without invoking uniqueness theorems or ansatzes from prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard computer-vision evaluation assumptions and public datasets; no new entities or fitted parameters are introduced beyond typical prompt-engineering choices.

axioms (1)
  • Domain assumption: standard segmentation metrics and dataset splits adequately capture model capability in pathology.
    Invoked implicitly when reporting performance gaps on NuInsSeg, PanNuke, and GlaS.

pith-pipeline@v0.9.0 · 5489 in / 1305 out tokens · 45895 ms · 2026-05-14T21:51:14.218654+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 3 internal anchors

  1. [1]

    Our study delineates SAM3’s boundaries in pathology image segmentation and provides practical guidance on the necessity of pathology domain adaptation

    performance is highly sensitive to visual prompt types and budgets; 3) few-shot learning offers gains, but SAM3 lacks robustness against visual prompt noise; and 4) a significant gap persists between prompt-based usage and task-trained adapter-based reference. Our study delineates SAM3’s boundaries in pathology image segmentation and provides practical g...

  2. [2]

    Is SAM3 ready for pathology segmentation?

    INTRODUCTION. Digital pathology images contain rich morphological information, ranging from macro tissue-level structures to micro nuclei-level entities [1]. Precise segmentation of these macro and micro histological entities is a fundamental task in computational pathology. In practice, whole-slide and region-level analysis is typically carried out by ti...

  3. [3]

    However, acquiring large-scale and consistent expert annotations is costly, motivating increasing interest in few-shot learning for pathology segmentation

    RELATED WORKS. In the field of pathology segmentation, fully supervised learning remains the prevailing paradigm, leveraging dense pixel-wise annotations to capture intricate morphological representations [9]. However, acquiring large-scale and consistent expert annotations is costly, motivating increasing interest in few-shot learning for pathology ...

  4. [4]

    All experiments follow the standard inference pipeline of SAM3

    EVALUATION. For the evaluation, we do not modify SAM3 in terms of its model architecture, prompting interaction, and decoding strategy. All experiments follow the standard inference pipeline of SAM3. Given an image $I_q$ and a prompt $P$, SAM3 produces a segmentation mask: $\hat{M} = f_{\mathrm{SAM3}}(I_q, P)$ (1). As shown in Fig. 2, different evaluation only varies in terms of...

  5. [5]

    Epithelial cell

    EXPERIMENTS. 4.1. Experimental setup. Datasets. We evaluate SAM3 on three representative histopathology benchmarks covering nuclei-level and tissue-level segmentation. NuInsSeg [6] contains 665 H&E-stained image patches (512 × 512) with over 30,000 manually segmented nuclei from 31 human and mouse organs, and is used for single-class nuclei segmentation. P...

  6. [6]

    CONCLUSION. We systematically evaluate SAM3 on nuclei- and tissue-level histopathology benchmarks under text and visual prompting. We find that text prompting is unreliable in pathology, and while stronger visual prompts substantially improve performance, a clear gap remains to supervised SAM3 adapter-based adaptation. Moreover, even after adaptation, SA...

  7. [7]

    Recent advances of deep learning for computational histopathology: principles and applications,

    Yawen Wu, Michael Cheng, Shuo Huang, Zongxiang Pei, Yingli Zuo, Jianxin Liu, Kai Yang, Qi Zhu, Jie Zhang, Honghai Hong, Daoqiang Zhang, Kun Huang, Liang Cheng, and Wei Shao, “Recent advances of deep learning for computational histopathology: principles and applications,” Cancers, vol. 14, no. 5, pp. 1199, 2022

  8. [8]

    Segment anything,

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, and Ross Girshick, “Segment anything,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026

  9. [9]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer, “Sam 2: Segment anything in images and videos,” arXiv preprint arXiv:2...

  10. [10]

    SAM 3: Segment Anything with Concepts

    Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Li...

  11. [11]

    Sam3-adapter: Efficient adaptation of segment anything 3 for camouflage object segmentation, shadow detection, and medical image segmentation,

    Tianrun Chen, Runlong Cao, Xinda Yu, Lanyun Zhu, Chaotao Ding, Deyi Ji, Cheng Chen, Qi Zhu, Chunyan Xu, Papa Mao, and Ying Zang, “Sam3-adapter: Efficient adaptation of segment anything 3 for camouflage object segmentation, shadow detection, and medical image segmentation,” arXiv preprint arXiv:2511.19425, 2025

  12. [12]

    Nuinsseg: a fully annotated dataset for nuclei instance segmentation in h&e-stained histological images,

    Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, and Isabella Ellinger, “Nuinsseg: a fully annotated dataset for nuclei instance segmentation in h&e-stained histological images,” Scientific Data, vol. 11, no. 1, pp. 295, 2024

  13. [13]

    Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification,

    Jevgenij Gamper, Navid Alemi Koohbanani, Ksenija Benes, Ali Khuram, and Nasir Rajpoot, “Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification,” in European Congress on Digital Pathology. Springer, 2019, pp. 11–19

  14. [14]

    Gland segmentation in colon histology images: The glas challenge contest,

    Korsuk Sirinukunwattana, Josien P.W. Pluim, Hao Chen, Xiaojuan Qi, Pheng-Ann Heng, Yun Bo Guo, Li Yang Wang, Bogdan J. Matuszewski, Elia Bruni, Urko Sanchez, Anton Böhm, Olaf Ronneberger, Bassem Ben Cheikh, Daniel Racoceanu, Philipp Kainz, Michael Pfeiffer, Martin Urschler, David R.J. Snead, and Nasir M. Rajpoot, “Gland segmentation in colon histology ...

  15. [15]

    A robust image segmentation and synthesis pipeline for histopathology,

    Muhammad Jehanzaib, Yasin Almalioglu, Kutsev Bengisu Ozyoruk, Drew FK Williamson, Talha Abdullah, Kayhan Basak, Derya Demir, G Evren Keles, Kashif Zafar, and Mehmet Turan, “A robust image segmentation and synthesis pipeline for histopathology,” Medical Image Analysis, vol. 99, pp. 103344, 2025

  16. [16]

    Few-shot learning for annotation-efficient nucleus instance segmentation,

    Yu Ming, Zihao Wu, Jie Yang, Danyi Li, Yuan Gao, Changxin Gao, Gui-Song Xia, Yuanqing Li, Li Liang, and Jin-Gang Yu, “Few-shot learning for annotation-efficient nucleus instance segmentation,” IEEE Transactions on Medical Imaging, 2025

  17. [17]

    Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging,

    Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W. Remedios, Shunxing Bao, Bennett A. Landman, Lee E. Wheless, Lori A. Coburn, Keith T. Wilson, Yaohong Wang, Shilin Zhao, Agnes B. Fogo, Haichun Yang, Yucheng Tang, and Yuankai Huo, “Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging,” in IS&T ...

  18. [18]

    Segment anything in medical images,

    Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang, “Segment anything in medical images,” Nature Communications, vol. 15, no. 1, pp. 654, 2024

  19. [19]

    Open-Vocabulary SAM: Segment and Recognize Twenty-Thousand Classes Interactively,

    Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, and Chen Change Loy, “Open-Vocabulary SAM: Segment and Recognize Twenty-Thousand Classes Interactively,” in European Conference on Computer Vision (ECCV). 2024, Lecture Notes in Computer Science, Springer

  20. [20]

    Self-support few-shot semantic segmentation,

    Qi Fan, Wenjie Pei, Yu-Wing Tai, and Chi-Keung Tang, “Self-support few-shot semantic segmentation,” in European conference on computer vision. Springer, 2022, pp. 701–719

  21. [21]

    Online easy example mining for weakly-supervised gland segmentation from histology images,

    Yi Li, Yiduo Yu, Yiwen Zou, Tianqi Xiang, and Xiaomeng Li, “Online easy example mining for weakly-supervised gland segmentation from histology images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 578–587