One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation
Pith reviewed 2026-06-29 08:18 UTC · model grok-4.3
The pith
One click per cell type segments all instances of that type using a frozen foundation model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that Group Prompting via Chain-of-Prompts reduces interactive segmentation cost from per-instance to per-type by exploiting the frozen SAM image encoder's pre-existing clustering of same-type cells. CoP identifies reliable same-type locations through non-parametric gating of multi-scale encoder features and selects the most spatially distant reliable point as the next prompt to maximize coverage. On three cell-type-annotated benchmarks, CoP with one click per type retains over 90 percent of per-instance performance and surpasses fully-supervised methods without any training. On four morphologically homogeneous benchmarks, a single click retains over 99 percent.
What carries the argument
Chain-of-Prompts (CoP): recursive expansion of one user click by non-parametric gating of multi-scale SAM encoder features to locate same-type cells and select distant reliable points as subsequent prompts.
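The CoP loop can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The sketch makes four assumptions. First, the multi-scale features are already resized to one common grid. Second, the per-scale cosine similarities are simply averaged. Third, the gate is a hypothetical percentile threshold (`percentile`). Fourth, each new prompt's gate is added to a running reliable set. The SAM mask decoder, which would turn each prompt into a mask, is omitted. The names `cosine_map` and `chain_of_prompts_sketch` are hypothetical.

```python
import numpy as np


def cosine_map(feats: np.ndarray, point: tuple[int, int]) -> np.ndarray:
    """Cosine similarity between every location of an (H, W, C) map and the feature at `point`."""
    ref = feats[point]                      # (C,) feature vector at (row, col)
    num = feats @ ref                       # (H, W) dot products
    den = np.linalg.norm(feats, axis=-1) * np.linalg.norm(ref) + 1e-8
    return num / den


def chain_of_prompts_sketch(scales, click, n_prompts=5, percentile=95.0):
    """Hypothetical sketch: expand one (row, col) click into up to `n_prompts` prompts.

    scales: list of (H, W, C) feature maps, assumed already resized to a common H x W grid.
    """
    prompts = [tuple(click)]
    reliable = np.zeros(scales[0].shape[:2], dtype=bool)
    for _ in range(n_prompts - 1):
        # (1) Gate: average per-scale similarity to the newest prompt, keep locations
        #     at or above the `percentile`-th percentile, add them to the reliable set.
        sim = np.mean([cosine_map(f, prompts[-1]) for f in scales], axis=0)
        reliable |= sim >= np.percentile(sim, percentile)
        # (2) Farthest-point rule: pick the reliable location whose distance to
        #     its nearest existing prompt is largest.
        cand = np.argwhere(reliable)                        # (K, 2) row/col pairs
        chosen = np.array(prompts)                          # (P, 2)
        dist = np.linalg.norm(cand[:, None, :] - chosen[None, :, :], axis=-1).min(axis=1)
        if dist.max() == 0:
            break  # every reliable location is already a prompt
        r, c = cand[np.argmax(dist)]
        prompts.append((int(r), int(c)))
    return prompts
```

Consider a toy 20 x 20 map whose left half carries one feature vector and whose right half carries an orthogonal one. With `percentile=60.0`, a click at (0, 0) produces a second prompt at the far corner (19, 9), and every prompt stays in the left half. The farthest-point rule is what spreads prompts across the image instead of clustering them near the click.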
If this is right
- User interactions drop from one per cell instance to one per cell type.
- Performance stays above 90 percent of full per-instance prompting on cell-type-annotated benchmarks.
- The method exceeds fully supervised models on out-of-distribution cell types with no training.
- A single click reaches over 99 percent performance on images containing only one morphological cell type.
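The cost and retention figures above reduce to simple arithmetic. The sketch below assumes that "retention" means the ratio of the group-prompting score to the per-instance-prompting score on the same metric. The cell counts and scores in the toy example are made-up values, not results from the paper.

```python
def clicks_needed(instances_per_type: dict[str, int], group_prompting: bool) -> int:
    # Per-instance prompting: one click per cell, N = total instances, i.e. O(N).
    # Group prompting: one click per cell type, T = number of types, i.e. O(T).
    if group_prompting:
        return len(instances_per_type)
    return sum(instances_per_type.values())


def retention(group_score: float, per_instance_score: float) -> float:
    # Assumed definition: share of the per-instance score kept by group prompting.
    return group_score / per_instance_score


# Toy example (hypothetical counts and scores, not from the paper).
counts = {"tumor": 600, "lymphocyte": 300, "stromal": 100}
print(clicks_needed(counts, group_prompting=False))  # 1000
print(clicks_needed(counts, group_prompting=True))   # 3
print(round(retention(0.55, 0.60), 3))               # 0.917
```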
Where Pith is reading between the lines
- The same feature-clustering property could support group prompting for other dense object categories if their embeddings separate by type.
- Annotation effort for large cell datasets could fall sharply if one click per type replaces exhaustive per-cell labeling.
- Applying the recursive gating step to other frozen vision encoders would test whether the clustering behavior is specific to SAM or more general.
Load-bearing premise
The frozen image encoder of the Segment Anything Model already clusters same-type cells in its feature space before any prompt is given.
What would settle it
Measure whether, after an initial click on one cell of a new type, the non-parametric gating of multi-scale encoder features correctly identifies other cells of that type while excluding different types.
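The proposed test can be scored per cell. After the first click, mark which ground-truth instances the gate selects. Then compute precision (the share of selected cells that have the clicked type) and recall (the share of cells of that type that were selected). A minimal sketch follows, assuming one boolean gate decision and one type label per instance. How a pixel-level gate maps to instances is left open; checking each instance centroid is one option. `gate_precision_recall` is a hypothetical helper.

```python
import numpy as np


def gate_precision_recall(gated: np.ndarray, types: np.ndarray, clicked_type) -> tuple[float, float]:
    """Per-instance precision and recall of the gate for the clicked cell type.

    gated: (N,) bool, True if the gate selected instance i; types: (N,) type label of each instance.
    """
    same = types == clicked_type
    true_pos = int(np.sum(gated & same))
    precision = true_pos / max(int(gated.sum()), 1)  # selected cells that share the clicked type
    recall = true_pos / max(int(same.sum()), 1)      # cells of that type that were selected
    return precision, recall
```

High precision means the gate excludes other types. High recall means it finds most same-type cells. The claim needs both.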
Original abstract
Cell instance segmentation models trained on cell-specific datasets suffer severe performance drops on out-of-distribution cell types, while interactive foundation models overcome this through per-instance prompting at a cost that is prohibitively expensive for histopathology images containing hundreds to thousands of densely packed instances. We introduce Group Prompting, a new paradigm that shifts interactive segmentation from per-instance $O(N)$ to per-type $O(T)$, where a single click per cell type suffices to segment all instances of that type. Our key observation is that the frozen image encoder of the Segment Anything Model (SAM) already clusters same-type cells in its feature space before any prompt is given. Exploiting this property, we propose Chain-of-Prompts (CoP), a training-free framework that recursively expands a single user click by (1) identifying reliable same-type locations through non-parametric gating of multi-scale encoder features, and (2) selecting the most spatially distant reliable point as the next prompt to maximize coverage. On three cell-type-annotated benchmarks, CoP with one click per type retains over 90% of per-instance performance and surpasses fully-supervised methods without any additional training. On four morphologically homogeneous benchmarks, a single click retains over 99%. Project Page: https://shjo-april.github.io/Chain-of-Prompts/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Group Prompting and Chain-of-Prompts (CoP), a training-free interactive segmentation framework for cell instance segmentation. It shifts from per-instance O(N) prompting to per-type O(T) by leveraging the claim that SAM's frozen ViT encoder already clusters same-type cells in multi-scale feature space. Each single click per type is expanded recursively in two steps: non-parametric gating identifies reliable same-type points, and the most distant reliable point becomes the next prompt. On three cell-type-annotated benchmarks CoP retains >90% of per-instance performance and surpasses fully-supervised baselines; on four homogeneous benchmarks a single click retains >99%.
Significance. If the central empirical observation holds, the result is significant: it demonstrates that a pretrained natural-image foundation model can be used for dense histopathology segmentation with minimal interaction cost and no task-specific training or fine-tuning. The training-free design, O(T) scaling, and reported outperformance of supervised methods on multiple benchmarks would be a notable practical advance for annotation-efficient cell analysis.
major comments (3)
- [§3] §3 (Method), the non-parametric gating step: the claim that multi-scale SAM encoder features enable reliable same-type identification without any learned parameters or target-data thresholds is load-bearing for the training-free and O(T) claims, yet the manuscript provides no quantitative verification (e.g., intra- vs. inter-type cosine similarity distributions or cluster purity metrics) on the three cell-type benchmarks.
- [Experiments] Experiments section, performance tables: the headline retention figures (>90% on cell-type benchmarks, >99% on homogeneous) are presented without error bars, multiple random click seeds, or ablation of the distant-point selection rule; this makes it impossible to determine whether the reported superiority over fully-supervised methods is robust or sensitive to the single-click initialization.
- [§4.2] §4.2 (ablation or feature analysis): if a dedicated subsection examines whether the observed clustering is an accident of SAM pretraining or holds across morphological variations, it is not referenced in the abstract or results; without it the generalization argument for out-of-distribution cell types rests on untested extrapolation.
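The two diagnostics requested in the first major comment are easy to compute once each cell has an embedding. The sketch assumes one vector per cell; averaging encoder features inside each ground-truth instance mask is one option and is our assumption. It also assumes an arbitrary clustering, such as k-means, for the purity metric. Function names are hypothetical.

```python
import numpy as np


def intra_inter_cosine(emb: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Mean cosine similarity for same-type pairs vs. different-type pairs.

    emb: (N, C) one embedding per cell; labels: (N,) cell-type label per cell.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # drop each cell's self-similarity
    return float(sim[same & off_diag].mean()), float(sim[~same].mean())


def cluster_purity(clusters: np.ndarray, labels: np.ndarray) -> float:
    """Share of cells whose type matches the majority type of their cluster."""
    matched = 0
    for c in np.unique(clusters):
        _, counts = np.unique(labels[clusters == c], return_counts=True)
        matched += int(counts.max())
    return matched / len(labels)
```

The two means are only a one-number summary. The comment asks for full distributions, which are the histograms of `sim[same & off_diag]` and `sim[~same]` before averaging.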
minor comments (2)
- [§3] Notation for multi-scale feature gating is introduced without an explicit equation or pseudocode listing the exact non-parametric criterion (e.g., percentile threshold or distance metric).
- [Abstract] The project page link is given but the manuscript does not state whether code and exact prompt seeds used for the reported numbers will be released.
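For reference, the criterion could be written explicitly as below. This is a hypothetical formulation, not the paper's. In it, $F_\ell$ is the encoder feature map at scale $\ell \in \mathcal{S}$, $c_0$ is the user click, and $Q_q$ is the $q$-th percentile over all image locations, with $\mathcal{R}_{-1} = \emptyset$.

$$
s_t(p) = \frac{1}{|\mathcal{S}|} \sum_{\ell \in \mathcal{S}} \frac{\langle F_\ell(p), F_\ell(c_t) \rangle}{\lVert F_\ell(p) \rVert \, \lVert F_\ell(c_t) \rVert},
\qquad
\mathcal{R}_t = \mathcal{R}_{t-1} \cup \{\, p : s_t(p) \ge Q_q(s_t) \,\},
$$

$$
c_{t+1} = \arg\max_{p \in \mathcal{R}_t} \; \min_{0 \le k \le t} \lVert p - c_k \rVert_2 .
$$

This gate has no learned weights. Its only free choice is $q$, which is exactly the kind of value the minor comment asks the authors to state.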
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the constructive comments. We address each major point below and will revise the manuscript to incorporate additional analyses and robustness checks.
Point-by-point responses
- Referee: [§3] §3 (Method), the non-parametric gating step: the claim that multi-scale SAM encoder features enable reliable same-type identification without any learned parameters or target-data thresholds is load-bearing for the training-free and O(T) claims, yet the manuscript provides no quantitative verification (e.g., intra- vs. inter-type cosine similarity distributions or cluster purity metrics) on the three cell-type benchmarks.
  Authors: We agree that explicit quantitative verification of the clustering property would strengthen the load-bearing claim. In the revised manuscript we will add intra- versus inter-type cosine similarity distributions together with cluster purity metrics computed on the three cell-type benchmarks to directly support the non-parametric gating step.
  Revision: yes
- Referee: [Experiments] Experiments section, performance tables: the headline retention figures (>90% on cell-type benchmarks, >99% on homogeneous) are presented without error bars, multiple random click seeds, or ablation of the distant-point selection rule; this makes it impossible to determine whether the reported superiority over fully-supervised methods is robust or sensitive to the single-click initialization.
  Authors: We acknowledge that error bars and ablations are needed to assess robustness. The revised experiments section will report standard deviations over multiple random click seeds and will include an ablation of the distant-point selection rule.
  Revision: yes
- Referee: [§4.2] §4.2 (ablation or feature analysis): if a dedicated subsection examines whether the observed clustering is an accident of SAM pretraining or holds across morphological variations, it is not referenced in the abstract or results; without it the generalization argument for out-of-distribution cell types rests on untested extrapolation.
  Authors: The manuscript does not contain a dedicated subsection on the origin of the clustering. Generalization is supported by consistent empirical results across diverse benchmarks. We will add a concise discussion paragraph in the results or conclusion addressing the contribution of SAM pretraining and the empirical evidence for generalization across morphological variations.
  Revision: partial
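The seed study promised in the response on error bars could take the following form. For each seed, place the first click on a randomly chosen cell of the target type and rerun the pipeline. Then report the mean and sample standard deviation of the metric across seeds. This is a minimal sketch. `evaluate` is a hypothetical callback that stands in for running CoP from a given starting cell and scoring the result.

```python
import random
import statistics
from typing import Callable, Sequence


def seed_robustness(
    evaluate: Callable[[int], float],
    cells_of_type: Sequence[int],
    seeds: Sequence[int],
) -> tuple[float, float]:
    """Mean and sample std of a metric when the initial click moves with the seed."""
    if len(seeds) < 2:
        raise ValueError("need at least two seeds for a standard deviation")
    scores = []
    for seed in seeds:
        rng = random.Random(seed)               # independent, reproducible RNG per seed
        start_cell = rng.choice(cells_of_type)  # where the single user click lands
        scores.append(evaluate(start_cell))
    return statistics.mean(scores), statistics.stdev(scores)
```

A separate `random.Random` per seed keeps each run reproducible regardless of the order in which seeds are evaluated. The ablation of the distant-point rule would reuse the same loop, swapping in a different next-prompt rule inside `evaluate`.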
Circularity Check
No significant circularity; derivation relies on external frozen encoder properties
full rationale
The paper's central procedure (Chain-of-Prompts) is explicitly training-free and applies non-parametric gating directly to multi-scale features from the frozen SAM ViT encoder, without fitting any parameters or thresholds to the target cell datasets. Performance retention claims (>90% on cell-type benchmarks) are presented as empirical results on external benchmarks rather than quantities derived by construction from the same data. No equations reduce a prediction to a fitted input, no self-citations bear the load of uniqueness or ansatz, and the key observation about feature clustering is stated as an empirical property of the external pretrained model, not a self-referential definition. The derivation chain is therefore self-contained against external benchmarks.