pith.

arxiv: 2503.00450 · v4 · submitted 2025-03-01 · 💻 cs.CV

Unsupervised Source-Free Ranking of Biomedical Segmentation Models Under Distribution Shift

Pith reviewed 2026-05-23 01:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords: model ranking · unsupervised · source-free · segmentation · distribution shift · prediction consistency · biomedical imaging · domain adaptation

The pith

Prediction consistency under perturbations ranks biomedical segmentation models to match their true target performance without labels or source data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that ranks pretrained segmentation models for new biomedical datasets by checking how stable their outputs remain when inputs receive small perturbations. The ranking requires no target labels and no access to the original training statistics, and it works in a black-box setting for both semantic and instance segmentation. The method targets the common repository scenario in which users must pick a model for data that has shifted away from the training distribution. If the estimated ordering matches actual performance, the method removes the need for expensive labeled validation sets when reusing models. The approach is evaluated across multiple 2D and 3D biomedical tasks.

Core claim

The authors claim that model rankings produced by measuring prediction consistency under perturbations strongly correlate with the true rankings of model performance on the target domain across a wide range of biomedical segmentation tasks in both 2D and 3D imaging.

What carries the argument

Prediction consistency under input perturbations, used as a black-box proxy for generalization on shifted target data.
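The page does not spell out the perturbations or the agreement function, so the following is only a minimal sketch under stated assumptions. The model is treated as a black-box callable that returns a binary mask, the perturbation is hypothetical (random global intensity scaling plus Gaussian noise), and agreement is measured with Dice against the model's own prediction on the unperturbed image. The names `perturb`, `consistency_score`, and `rank_models` are illustrative, not the paper's API.

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks; two empty masks count as full agreement."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def perturb(image, rng, noise_std=0.05, scale_range=(0.9, 1.1)):
    """Hypothetical perturbation: random global intensity scaling plus Gaussian noise."""
    scale = rng.uniform(*scale_range)
    return image * scale + rng.normal(0.0, noise_std, size=image.shape)

def consistency_score(predict, images, n_perturbations=4, seed=0):
    """Mean Dice between a model's mask on each clean image and its masks on
    perturbed copies. `predict` is any black-box callable: image -> binary mask.
    A fixed seed means every model is scored on the same perturbed inputs."""
    rng = np.random.default_rng(seed)
    scores = []
    for image in images:
        reference = predict(image)  # no labels: the clean prediction is the reference
        for _ in range(n_perturbations):
            scores.append(dice(reference, predict(perturb(image, rng))))
    return float(np.mean(scores))

def rank_models(models, images, **kwargs):
    """Score every model on the same unlabeled images; return names sorted
    from most to least consistent, plus the raw scores."""
    scores = {name: consistency_score(fn, images, **kwargs) for name, fn in models.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores
```

Models are then ordered by score, highest first; the premise under test is that this order tracks the models' true target-domain Dice.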

If this is right

  • Model selection from repositories becomes feasible without any target-domain labels.
  • The same consistency measure applies to both semantic and instance segmentation models.
  • Ranking remains valid for zero-shot reuse or after unsupervised domain adaptation.
  • The correlation holds across both 2D and 3D biomedical imaging tasks.
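Instance segmentation needs an agreement measure that ignores the arbitrary IDs a model assigns to objects, since the same cell may be labeled 3 on the clean image and 17 on a perturbed copy. One permutation-invariant option, assumed here rather than confirmed by this summary, is the Rand index over pixel pairs, which could stand in for a per-class overlap score when comparing clean and perturbed predictions:

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Fraction of pixel pairs on which two label maps agree (both 'same object'
    or both 'different objects'). Invariant to how instance IDs are numbered."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    n = a.size
    if n < 2:
        return 1.0
    # Contingency table: table[i, j] = number of pixels with label i in a and label j in b.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    def n_pairs(counts):
        # Number of unordered pixel pairs inside each group, summed.
        return int((counts * (counts - 1) // 2).sum())

    total = n * (n - 1) // 2
    same_both = n_pairs(table)               # together in a and together in b
    same_a = n_pairs(table.sum(axis=1))      # together in a
    same_b = n_pairs(table.sum(axis=0))      # together in b
    apart_both = total - same_a - same_b + same_both
    return (same_both + apart_both) / total
```

Because background pixel pairs dominate large images, the raw Rand index tends to saturate near 1; an adjusted or foreground-restricted variant may separate models better.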

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The consistency proxy could be tested on dense prediction tasks outside segmentation, such as detection or registration.
  • Different perturbation families might be combined to increase ranking reliability on varied shift types.
  • The method could reduce the computational cost of evaluating entire model zoos by providing an early filter before any target evaluation.

Load-bearing premise

That the amount a model's predictions change under the chosen perturbations reliably indicates how well it will perform on the actual target domain.

What would settle it

A new biomedical dataset where the order of models by consistency score differs substantially from their order by true target-domain metrics such as Dice score.
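That check amounts to comparing two orderings of the same models. A minimal sketch, assuming no tied scores and using made-up numbers rather than results from the paper, computes Spearman's ρ and Kendall's τ between consistency scores and true Dice:

```python
from itertools import combinations

def ranks(values):
    """Rank positions, 1 = smallest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) // 2)

consistency = [0.91, 0.85, 0.78, 0.95]  # hypothetical consistency scores
true_dice = [0.80, 0.74, 0.69, 0.77]    # hypothetical label-based Dice
print(spearman_rho(consistency, true_dice))  # 0.8
print(kendall_tau(consistency, true_dice))   # 0.666...
```

In this toy example the orderings agree strongly, yet the most consistent model (index 3) is not the best by Dice (index 0), a disagreement that a single rank-correlation number does not reveal.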

Figures

Figures reproduced from arXiv: 2503.00450 by Anna Kreshuk, Federico Bolelli, Joshua Talks, Kevin Marchesini, Luca Lumetti.

Figure 1: Unsupervised consistency-based model ranking.
Figure 2: Comparison of classification and semantic segmentation transferability metric results.
Figure 3: Semantic segmentation EPFL target correlation.
Figure 4: Instance segmentation Covid IF target correlation. (Page text extracted alongside this figure: "… remains less affected, resulting in higher correlation scores. 5.2. Instance Segmentation: To our knowledge, we are the first to address this problem, so no baseline is available. Unlike the semantic case, instance segmentation does not produce a fixed set of masks with direct correspondence to feature vectors, hence previously compared transferability es…")
Figure 5: Cells vs Nuclei Prediction of Cellpose-SAM.
Figure 6: Cells vs Nuclei Prediction of Cellpose-SAM.
Original abstract

Model reuse offers a solution to the challenges of segmentation in biomedical imaging, where high data annotation costs remain a major bottleneck for deep learning. However, although many pretrained models are released through challenges, model zoos, and repositories, selecting the most suitable model for a new dataset remains difficult due to the lack of reliable model ranking methods. We introduce the first black-box-compatible framework for unsupervised and source-free ranking of semantic and instance segmentation models based on the consistency of predictions under perturbations. While ranking methods have been studied for classification and a few segmentation-related approaches exist, most target related tasks such as transferability estimation or model validation and typically rely on labelled data, feature-space access, or specific training assumptions. In contrast, our method directly addresses the repository setting and applies to both semantic and instance segmentation, for zero-shot reuse or after unsupervised domain adaptation. We evaluate the approach across a wide range of biomedical segmentation tasks in both 2D and 3D imaging, showing that our estimated rankings strongly correlate with true target-domain model performance rankings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the first black-box, unsupervised, source-free framework for ranking pretrained semantic and instance segmentation models on new biomedical target datasets under distribution shift. The method ranks models by measuring consistency of their predictions under perturbations applied directly to unlabeled target data. Experiments across multiple 2D and 3D biomedical segmentation tasks report that the resulting rankings correlate strongly with ground-truth target-domain performance rankings.

Significance. If the central empirical claim holds, the work addresses a practical bottleneck in biomedical imaging model reuse where annotation costs are high and source data or labels are unavailable. The black-box and source-free design, applicability to both semantic and instance segmentation, and evaluation breadth across 2D/3D tasks are strengths. No machine-checked proofs or parameter-free derivations are claimed, but the proxy-based ranking approach is falsifiable via the reported correlations.

major comments (2)
  1. [Abstract and §4 (Experiments)] The claim of 'strongly correlate' is not supported by reported correlation coefficients, p-values, confidence intervals, or controls for multiple testing across tasks; without these, the central empirical result cannot be assessed for statistical reliability or effect size.
  2. [§3 (Method)] The perturbation strategy (types, magnitudes, number of perturbations, and aggregation into the consistency metric) is described only at a high level. This choice is load-bearing for the proxy assumption that consistency predicts target generalization, yet no ablation or justification is referenced to rule out that the perturbations were selected post hoc on target data.
minor comments (2)
  1. [§3] Clarify whether the consistency metric is computed per-image or aggregated globally, and specify the exact distance or agreement function used between perturbed predictions.
  2. [§4] Include a table or figure showing per-task Spearman or Kendall correlations with ground-truth rankings to make the 'wide range of tasks' claim concrete.
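Minor comment 1 matters because the two aggregation choices can disagree sharply. A hypothetical illustration, assuming Dice as the agreement function between clean and perturbed masks: averaging per image weights every image equally, while pooling overlap counts across images lets large objects dominate.

```python
import numpy as np

def agreement_counts(a, b):
    """Return (2*|a ∩ b|, |a| + |b|) for two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return 2 * np.logical_and(a, b).sum(), a.sum() + b.sum()

def per_image_mean_dice(pairs):
    """Compute Dice separately for each (clean, perturbed) pair, then average."""
    scores = []
    for a, b in pairs:
        num, den = agreement_counts(a, b)
        scores.append(1.0 if den == 0 else num / den)
    return float(np.mean(scores))

def global_dice(pairs):
    """Pool numerator and denominator over all images before dividing."""
    counts = [agreement_counts(a, b) for a, b in pairs]
    num = sum(c[0] for c in counts)
    den = sum(c[1] for c in counts)
    return 1.0 if den == 0 else float(num / den)
```

With one 100-pixel object that is perfectly stable and one 1-pixel object that vanishes under perturbation, the per-image mean is 0.5 while the pooled score is 200/201 ≈ 0.995, so the choice can reorder models whose failures concentrate on small structures.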

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the statistical reporting and methodological transparency. We address each point below and will revise the manuscript accordingly.

  1. Referee: [Abstract and §4 (Experiments)] The claim of 'strongly correlate' is not supported by reported correlation coefficients, p-values, confidence intervals, or controls for multiple testing across tasks; without these, the central empirical result cannot be assessed for statistical reliability or effect size.

    Authors: We agree that explicit statistical measures are needed to support the correlation claims. In the revised version, we will report the Spearman rank correlation coefficients for each task along with p-values, bootstrap confidence intervals, and a note on multiple-testing correction (e.g., Bonferroni) across the evaluated tasks. These additions will allow readers to assess effect size and reliability directly. revision: yes

  2. Referee: [§3 (Method)] The perturbation strategy (types, magnitudes, number of perturbations, and aggregation into the consistency metric) is described at a high level only; this choice is load-bearing for the proxy assumption that consistency predicts target generalization, yet no ablation or justification is referenced to rule out that perturbations were selected post-hoc on target data.

    Authors: We acknowledge that the current description of the perturbation strategy is high-level. In the revision we will expand §3 with the exact perturbation types, magnitudes, counts, and aggregation formula. We will also add an ablation study on these hyperparameters, performed on held-out validation splits prior to the main target-domain experiments, to demonstrate that the chosen settings are robust and not tuned post-hoc on the evaluation targets. revision: yes
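The first response promises per-task Spearman coefficients with bootstrap confidence intervals and a Bonferroni correction. One way to produce such an interval, sketched here under the assumption that per-image consistency and Dice scores are stored as (models × images) arrays, is to resample target images, recompute each model's mean scores, and correlate the means. The Bonferroni helper then adjusts p-values from whichever test is used across tasks. This is an illustration, not the authors' planned procedure.

```python
import numpy as np

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def bootstrap_spearman_ci(consistency, dice, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for Spearman's rho.

    consistency, dice: arrays of shape (n_models, n_images). Each replicate
    resamples image columns with replacement, averages per model, and
    correlates the two model-level means."""
    rng = np.random.default_rng(seed)
    n_images = consistency.shape[1]
    rhos = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_images, size=n_images)
        rhos.append(spearman(consistency[:, idx].mean(axis=1), dice[:, idx].mean(axis=1)))
    lo, hi = np.quantile(rhos, [alpha / 2, 1 - alpha / 2])
    point = spearman(consistency.mean(axis=1), dice.mean(axis=1))
    return point, (float(lo), float(hi))

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Resampling images rather than models keeps the set of ranked models fixed, which matches the question being asked: how stable is the ranking of these particular models given a finite target sample.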

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper defines a ranking via prediction consistency under perturbations and reports empirical correlation with target-domain performance rankings. No equations, self-citations, or fitted parameters are shown in the provided text that reduce the consistency metric or ranking to a tautological re-expression of the target performance itself. The central claim remains an external proxy assumption evaluated against held-out ground truth, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that consistency under perturbations captures generalization without labels. No free parameters, axioms, or invented entities are described in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    We propose to estimate model transferability based on the consistency of model outputs under perturbation... Prediction consistency can be viewed as a proxy for the margin of a model’s decision boundaries with respect to the target data.

  • IndisputableMonolith/Foundation/LogicAsFunctionalEquation.lean SatisfiesLawsOfLogic echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    CTE-NHD = 1 − |{ỹ ≠ ŷ} ∩ (ỹ ∪ ŷ)| / |ỹ ∪ ŷ| (normalised Hamming distance, per-class weighted)
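The quoted per-class score can be sketched for integer label maps as follows: for each class, the pixels where the clean and perturbed predictions disagree, restricted to their union, are divided by the union size and subtracted from 1. Equal weighting across classes is an assumption, since the excerpt only says "per-class weighted"; classes absent from both maps are skipped.

```python
import numpy as np

def nhd_consistency(y_clean, y_pert, classes):
    """Per-class 1 - |{disagree} ∩ union| / |union|, averaged over classes.

    y_clean, y_pert: integer label maps of identical shape, predicted on the
    clean and perturbed input. Uniform class weights are an assumption."""
    y_clean = np.asarray(y_clean)
    y_pert = np.asarray(y_pert)
    scores = []
    for c in classes:
        a = y_clean == c
        b = y_pert == c
        union = np.logical_or(a, b)
        if not union.any():
            continue  # class predicted nowhere: no evidence either way
        # Pixels where the two predictions differ, restricted to the union.
        disagree = np.logical_and(a != b, union)
        scores.append(1.0 - disagree.sum() / union.sum())
    return float(np.mean(scores)) if scores else 1.0
```

For a single class the disagreement set always lies inside the union, so the score reduces to |ỹ ∩ ŷ| / |ỹ ∪ ŷ|, the Jaccard index between the two predictions.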

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

90 extracted references · 90 canonical work pages · 6 internal anchors

  1. [1]

    Agostinelli, Michal P’andy, J

    A. Agostinelli, Michal P’andy, J. Uijlings, Thomas Mensink, and V . Ferrari. How stable are Transferability Metrics evalua- tions?European Conference on Computer Vision, 2022. 2, 6

  2. [2]

    Robustness to Augmentations as a Generalization metric, 2021

    Sumukh K Aithal, Dhruva Kashyap, and Natarajan Subra- manyam. Robustness to Augmentations as a Generalization metric, 2021. arXiv:2101.06459. 3, 7

  3. [3]

    Segment Any- thing for Microscopy.Nature Methods, 22(3):579–591, 2025

    Anwai Archit, Luca Freckmann, Sushmita Nair, Nabeel Khalid, Paul Hilt, Vikas Rajashekar, Marei Freitag, Car- olin Teuber, Melanie Spitzner, Constanza Tapia Contreras, Genevieve Buckley, Sebastian von Haaren, Sagnik Gupta, Marian Grade, Matthias Wirth, G ¨unter Schneider, Andreas Dengel, Sheraz Ahmed, and Constantin Pape. Segment Any- thing for Microscopy....

  4. [4]

    Turaga, Daniel R

    Ignacio Arganda-Carreras, Srinivas C. Turaga, Daniel R. Berger, Dan Cire s ¸an, Alessandro Giusti, Luca M. Gam- bardella, J ¨urgen Schmidhuber, Dmitry Laptev, Sarvesh Dwivedi, Joachim M. Buhmann, Ting Liu, Mojtaba Seyedhos- seini, Tolga Tasdizen, Lee Kamentsky, Radim Burget, Vaclav Uher, Xiao Tan, Changming Sun, Tuan D. Pham, Erhan Bas, Mustafa G. Uzunbas...

  5. [5]

    An annotated high-content fluorescence microscopy dataset with Hoechst 33342-stained nuclei and manually labelled outlines

    Malou Arvidsson, Salma Kazemi Rashed, and Sonja Aits. An annotated high-content fluorescence microscopy dataset with Hoechst 33342-stained nuclei and manually labelled outlines. Data in Brief, 46:108769, 2023. 5, 2

  6. [6]

    Hamprecht

    Alberto Bailoni, Constantin Pape, Nathan H ¨utsch, Steffen Wolf, Thorsten Beier, Anna Kreshuk, and Fred A. Hamprecht. GASP, a generalized framework for agglomerative clustering of signed graphs and its application to Instance Segmentation,

  7. [7]

    arXiv:1906.11713 [cs]. 8

  8. [8]

    An Information- Theoretic Approach to Transferability in Task Transfer Learn- ing, 2022

    Yajie Bao, Yang Li, Shao-Lun Huang, Lin Zhang, Lizhong Zheng, Amir Zamir, and Leonidas Guibas. An Information- Theoretic Approach to Transferability in Task Transfer Learn- ing, 2022. arXiv:2212.10082 [cs]. 2, 5, 9

  9. [9]

    Pearson Correlation Coefficient

    Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. Pearson Correlation Coefficient. InNoise Reduction in Speech Processing, pages 1–4. Springer, Berlin, Heidelberg,

  10. [10]

    Rokuss, Klaus Maier-Hein, Jaehwan Han, Wan Kim, Hong-Gi Ahn, Tomasz Szczepa´nski, Michal K

    Federico Bolelli, Luca Lumetti, Shankeeth Vinayahalingam, Mattia Di Bartolomeo, Arrigo Pellacani, Kevin Marchesini, Niels van Nistelrooij, Pieter van Lierop, Tong Xi, Yusheng Liu, Rui Xin, Tao Yang, Lisheng Wang, Haoshen Wang, Chen- fan Xu, Zhiming Cui, Marek Wodzinski, Henning M ¨uller, Yannick Kirchhoff, Maximilian R. Rokuss, Klaus Maier-Hein, Jaehwan H...

  11. [11]

    Segmenting Maxillo- facial Structures in CBCT V olumes

    Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij, Luca Lumetti, Vittorio Pipoli, Elisa Ficarra, Shankeeth Vinayahalingam, and Costantino Grana. Segmenting Maxillo- facial Structures in CBCT V olumes. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. 5, 3

  12. [12]

    Caicedo, Allen Goodman, Kyle W

    Juan C. Caicedo, Allen Goodman, Kyle W. Karhohs, Beth A. Cimini, Jeanelle Ackerman, Marzieh Haghighi, CherKeng Heng, Tim Becker, Minh Doan, Claire McQuin, Mohammad Rohban, Shantanu Singh, and Anne E. Carpenter. Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl.Nature Methods, 16(12):1247–1253, 2019. Publisher: Nature Publishing G...

  13. [13]

    Multi-Modal Continual Test- Time Adaptation for 3D Semantic Segmentation, 2023

    Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Sheng- hai Yuan, and Lihua Xie. Multi-Modal Continual Test- Time Adaptation for 3D Semantic Segmentation, 2023. arXiv:2303.10457 [cs]. 2

  14. [14]

    The Performance of Transferability Metrics does not Translate to Medical Tasks, 2023

    Levy Chaves, Alceu Bissoto, Eduardo Valle, and Sandra Avila. The Performance of Transferability Metrics does not Translate to Medical Tasks, 2023. arXiv:2308.07444 [cs]. 6

  15. [15]

    HeLaCytoNuc: flu- orescence microscopy dataset with segmentation masks for cell nuclei and cytoplasm, 2024

    Trina De, Adrian Urbanski, Subasini Thangamani, Maria Wyrzykowska, and Artur Yakimovich. HeLaCytoNuc: flu- orescence microscopy dataset with segmentation masks for cell nuclei and cytoplasm, 2024. 5, 2

  16. [16]

    On the Strong Correlation Between Model Invariance and General- ization, 2022

    Weijian Deng, Stephen Gould, and Liang Zheng. On the Strong Correlation Between Model Invariance and General- ization, 2022. arXiv:2207.07065 [cs]. 3, 4, 7 9

  17. [17]

    Which Model to Transfer? A Survey on Transferability Estimation, 2024

    Yuhe Ding, Bo Jiang, Aijing Yu, Aihua Zheng, and Jian Liang. Which Model to Transfer? A Survey on Transferability Estimation, 2024. arXiv:2402.15231 [cs]. 2

  18. [18]

    Jones, Yanling Liu, Dorsa Ziaei, Stephan Huschauer, Ignacio Arganda-Carreras, Hanspeter Pfister, and Donglai Wei

    Daniel Franco-Barranco, Zudi Lin, Won-Dong Jang, Xuey- ing Wang, Qijia Shen, Wenjie Yin, Yutian Fan, Mingxing Li, Chang Chen, Zhiwei Xiong, Rui Xin, Hao Liu, Huai Chen, Zhili Li, Jie Zhao, Xuejin Chen, Constantin Pape, Ryan Conrad, Luke Nightingale, Joost de Folter, Martin L. Jones, Yanling Liu, Dorsa Ziaei, Stephan Huschauer, Ignacio Arganda-Carreras, Ha...

  19. [19]

    A Benchmark for Epithelial Cell Track- ing

    Jan Funke, Lisa Mais, Andrew Champion, Natalie Dye, and Dagmar Kainmueller. A Benchmark for Epithelial Cell Track- ing. InComputer Vision – ECCV 2018 Workshops, pages 437–445. Springer International Publishing, Cham, 2019. Se- ries Title: Lecture Notes in Computer Science. 5, 2

  20. [20]

    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. InProceedings of The 33rd International Confer- ence on Machine Learning, pages 1050–1059. PMLR, 2016. ISSN: 1938-7228. 3

  21. [21]

    Lipton, Behnam Neyshabur, and Hanie Sedghi

    Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, and Hanie Sedghi. Leveraging Unla- beled Data to Predict Out-of-Distribution Performance, 2022. arXiv:2201.04234 [cs, stat]. 3

  22. [22]

    Swin UNETR: Swin Trans- formers for Semantic Segmentation of Brain Tumors in MRI Images

    Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R Roth, and Daguang Xu. Swin UNETR: Swin Trans- formers for Semantic Segmentation of Brain Tumors in MRI Images. InInternational MICCAI Brainlesion Workshop, pages 272–284. Springer, 2021. 5

  23. [23]

    Roth, and Daguang Xu

    Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger R. Roth, and Daguang Xu. UNETR: Transformers for 3D Medical Image Segmentation. In2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1748–1758,

  24. [24]

    Rothenberg, Michelle Ly, and Rodrigo Fernandez-Gonzalez

    Raymond Hawkins, Negar Balaghi, Katheryn E. Rothenberg, Michelle Ly, and Rodrigo Fernandez-Gonzalez. ReSCU- Nets: recurrent U-Nets for segmentation of multidimensional microscopy data, 2024. 5, 2

  25. [25]

    Deep Residual Learning for Image Recognition, 2015

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition, 2015. 5

  26. [26]

    Reshaping deep neural network for fast decoding by node-pruning

    Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan, and Kai Yu. Reshaping deep neural network for fast decoding by node-pruning. In2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 245–249, Florence, Italy, 2014. IEEE. 5, 7

  27. [27]

    Densely Connected Convolutional Networks

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kil- ian Q. Weinberger. Densely Connected Convolutional Net- works, 2018. arXiv:1608.06993 [cs]. 5

  28. [28]

    Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance

    Shibal Ibrahim, Natalia Ponomareva, and Rahul Mazumder. Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance. pages 693–709

  29. [29]

    ISSN: 0302-9743, 1611-3349 arXiv:2110.06893 [cs]. 2, 5, 9

  30. [30]

    nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.Na- ture methods, 18(2):203–211, 2021

    Fabian Isensee, Paul F Jaeger, Simon AA Kohl, Jens Petersen, and Klaus H Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation.Na- ture methods, 18(2):203–211, 2021. 1, 5

  31. [31]

    Predicting the Generalization Gap in Deep Networks with Margin Distributions

    Yiding Jiang, Dilip Krishnan, Hossein Mobahi, and Samy Bengio. Predicting the Generalization Gap in Deep Networks with Margin Distributions. 2018. 3

  32. [32]

    M. G. Kendall. Rank correlation methods. In4th edition (1970), 1972. 6

  33. [33]

    Ambros, Inge M

    Florian Kromp, Eva Bozsaky, Fikret Rifatbegovic, Lukas Fischer, Magdalena Ambros, Maria Berneder, Tamara Weiss, Daria Lazic, Wolfgang D¨orr, Allan Hanbury, Klaus Beiske, Peter F. Ambros, Inge M. Ambros, and Sabine Taschner- Mandl. An annotated fluorescence image dataset for training nuclear segmentation methods.Scientific Data, 7(1):262,

  34. [34]

    Publisher: Nature Publishing Group. 5, 2

  35. [35]

    Understand- ing Self-Training for Gradual Domain Adaptation

    Ananya Kumar, Tengyu Ma, and Percy Liang. Understand- ing Self-Training for Gradual Domain Adaptation. InPro- ceedings of the 37th International Conference on Machine Learning, pages 5468–5479. PMLR, 2020. ISSN: 2640-3498. 2

  36. [36]

    Dropout injection at test time for post hoc uncertainty quantification in neural networks.Information Sciences, 645:119356, 2023

    Emanuele Ledda, Giorgio Fumera, and Fabio Roli. Dropout injection at test time for post hoc uncertainty quantification in neural networks.Information Sciences, 645:119356, 2023. 3

  37. [37]

    Superhuman Accuracy on the SNEMI3D Connectomics Challenge

    Kisuk Lee, Jonathan Zung, Peter Li, Viren Jain, and H. Se- bastian Seung. Superhuman Accuracy on the SNEMI3D Connectomics Challenge, 2017. arXiv:1706.00120 [cs]. 5

  38. [38]

    Adaptive Batch Normalization for practical do- main adaptation.Pattern Recognition, 2018

    Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. Adaptive Batch Normalization for practical do- main adaptation.Pattern Recognition, 2018. 8

  39. [39]

    Ranking Neural Checkpoints.arXiv: Learning,

    Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Bradley Ray Green, Liqiang Wang, Boqing Gong, and Boqing Gong. Ranking Neural Checkpoints.arXiv: Learning,

  40. [40]

    Vmamba: Visual State Space Model

    Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, Jianbin Jiao, and Yunfan Liu. Vmamba: Visual State Space Model. InThe Thirty- eighth Annual Conference on Neural Information Processing Systems, 2024. 5

  41. [41]

    Sokolnicki, and Anne E

    Vebjorn Ljosa, Katherine L. Sokolnicki, and Anne E. Carpen- ter. Annotated high-throughput microscopy image sets for validation.Nature Methods, 9(7):637–637, 2012. Publisher: Nature Publishing Group. 5, 1, 2

  42. [42]

    A General Framework for Uncertainty Estimation in Deep Learning.IEEE Robotics and Automation Letters, 5(2):3153– 3160, 2020

    Antonio Loquercio, Mattia Seg `u, and Davide Scaramuzza. A General Framework for Uncertainty Estimation in Deep Learning.IEEE Robotics and Automation Letters, 5(2):3153– 3160, 2020. arXiv:1907.06890 [cs]. 3, 4, 7

  43. [43]

    Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

    Aur´elien Lucchi, Yunpeng Li, and Pascal Fua. Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets. In2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 1987–1994,

  44. [44]

    ISSN: 1063-6919. 5, 1

  45. [45]

    U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

    Jun Ma, Feifei Li, and Bo Wang. U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation. arXiv preprint arXiv:2401.04722, 2024. 5

  46. [46]

    Factors of Influence for Trans- fer Learning across Diverse Appearance Domains and Task 10 Types.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

    Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, and Vittorio Ferrari. Factors of Influence for Trans- fer Learning across Diverse Appearance Domains and Task 10 Types.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021. 2

  47. [47]

    Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate, 2022

    Lu Mi, Hao Wang, Yonglong Tian, Hao He, and Nir Shavit. Training-Free Uncertainty Estimation for Dense Regression: Sensitivity as a Surrogate, 2022. arXiv:1910.04858 [cs]. 3, 4, 5, 7

  48. [48]

    Minimal-Entropy Correlation Alignment for Unsupervised Deep Domain Adaptation

    Pietro Morerio, Jacopo Cavazza, and Vittorio Murino. Minimal-Entropy Correlation Alignment for Unsupervised Deep Domain Adaptation, 2017. arXiv:1711.10288 [cs]. 2

  49. [49]

    Freyberg, Patricia MacWilliams, Megan Wilson, S

    Basil Mustafa, Aaron Loh, J. Freyberg, Patricia MacWilliams, Megan Wilson, S. M. McKinney, M. Sieniek, Jim Winkens, Yuan Liu, P. Bui, Shruthi Prabhakara, Umesh Telang, A. Karthikesalingam, N. Houlsby, and Vivek Natarajan. Su- pervised Transfer Learning at Scale for Medical Imaging. arXiv.org, 2021. 1, 2

  50. [50]

    Predicting Out-of-Domain Generaliza- tion with Local Manifold Smoothness

    Nathan Hoyen Ng, Neha Hulkund, Kyunghyun Cho, and Marzyeh Ghassemi. Predicting Out-of-Domain Generaliza- tion with Local Manifold Smoothness. 2022. 3

  51. [51]

    Nguyen, Cuong V

    Cuong V . Nguyen, Cuong V . Nguyen, Tal Hassner, C´edric Archambeau, and Matthias Seeger. LEEP: A New Measure to Evaluate Transferability of Learned Representations.arXiv: Learning, 2020. 2, 5, 9

  52. [52]

    Semi- Supervised Semantic Segmentation with Cross-Consistency Training, 2020

    Yassine Ouali, C ´eline Hudelot, and Myriam Tami. Semi- Supervised Semantic Segmentation with Cross-Consistency Training, 2020. arXiv:2003.09005 [cs]. 4, 5, 7

  53. [53]

    BioImage Model Zoo: A Community-Driven Re- source for Accessible Deep Learning in BioImage Analysis

    Wei Ouyang, Fynn Beuttenmueller, Estibaliz G ´omez-de Mariscal, Constantin Pape, Tom Burke, Carlos Garcia-L´opez- de Haro, Craig Russell, Luc ´ıa Moya-Sans, Cristina de-la Torre-Guti´errez, Deborah Schmidt, Dominik Kutra, Maksim Novikov, Martin Weigert, Uwe Schmidt, Peter Bankhead, Guillaume Jacquemet, Daniel Sage, Ricardo Henriques, Ar- rate Mu˜noz-Barru...

  54. [54]

    Sculley, Sebastian Nowozin, Joshua Dillon, Balaji Laksh- minarayanan, and Jasper Snoek

    Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D. Sculley, Sebastian Nowozin, Joshua Dillon, Balaji Laksh- minarayanan, and Jasper Snoek. Can you trust your model’ s uncertainty? Evaluating predictive uncertainty under dataset shift. InAdvances in Neural Information Processing Systems. Curran Associates, Inc., 2019. 3, 4, 7

  55. [55]

    Cellpose-SAM: superhuman generalization for cellular seg- mentation, 2025

    Marius Pachitariu, Michael Rariden, and Carsen Stringer. Cellpose-SAM: superhuman generalization for cellular seg- mentation, 2025. 1, 5, 8, 3, 11

  56. [56]

    Agostinelli, J

    Michal P’andy, A. Agostinelli, J. Uijlings, V . Ferrari, and Thomas Mensink. Transferability Estimation using Bhat- tacharyya Class Separability.Computer Vision and Pattern Recognition, 2021. 2, 3, 5, 9

  57. [57]

    Neufeldt, Markus Ganter, Paul Schnitzler, Uta Merle, Marina Lusic, Steeve Boulant, Megan Stanifer, Ralf Barten- schlager, Fred A

    Constantin Pape, Roman Remme, Adrian Wolny, Sylvia Olberg, Steffen Wolf, Lorenzo Cerrone, Mirko Cortese, Severina Klaus, Bojana Lucic, Stephanie Ullrich, Maria Anders- ¨Osswein, Stefanie Wolf, Berati Cerikan, Christo- pher J. Neufeldt, Markus Ganter, Paul Schnitzler, Uta Merle, Marina Lusic, Steeve Boulant, Megan Stanifer, Ralf Barten- schlager, Fred A. H...

  58. [58]

    Phelps, David Grant Colburn Hildebrand, Brett J

    Jasper S. Phelps, David Grant Colburn Hildebrand, Brett J. Graham, Aaron T. Kuan, Logan A. Thomas, Tri M. Nguyen, Julia Buhmann, Anthony W. Azevedo, Anne Sustar, Sweta Agrawal, Mingguan Liu, Brendan L. Shanny, Jan Funke, John C. Tuthill, and Wei-Chung Allen Lee. Reconstruction of motor control circuits in adult Drosophila using automated transmission elec...

  59. [59]

    Mobilenetv3 for image classification

    Siying Qian, Chenran Ning, and Yuepeng Hu. Mobilenetv3 for image classification. In2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 490–497, 2021. 5

  60. [60]

    William M. Rand. Objective Criteria for the Evaluation of Clustering Methods.Journal of the American Statistical Association, 66(336):846–850, 1971. 4

  61. [61]

    Which Model to Transfer? Finding the Needle in the Growing Haystack.arXiv: Learning, 2020

    Cedric Renggli, Andr ´e Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme Ruiz, Carlos Riquelme, Ce Zhang, and Mario Lucic. Which Model to Transfer? Finding the Needle in the Growing Haystack.arXiv: Learning, 2020. 2

  62. [62]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv: Computer Vision and Pattern Recognition, 2015. 5

  63. [63]

    Tune it the Right Way: Un- supervised Validation of Domain Adaptation via Soft Neigh- borhood Density, 2021

    Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, and Kate Saenko. Tune it the Right Way: Un- supervised Validation of Domain Adaptation via Soft Neigh- borhood Density, 2021. arXiv:2108.10860 [cs]. 2

  64. [64]

    MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018

    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zh- moginov, and Liang-Chieh Chen. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018. 5

  65. [65]

    Pre- dicting Deep Neural Network Generalization with Perturba- tion Response Curves, 2021

    Yair Schiff, Brian Quanz, Payel Das, and Pin-Yu Chen. Pre- dicting Deep Neural Network Generalization with Perturba- tion Response Curves, 2021. arXiv:2106.04765. 3, 7

  66. [66]

    Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227–244, 2000.

  67. [67]

    C. Spearman. The Proof and Measurement of Association between Two Things. Am. J. Psychol., 15, 1904.

  68. [68]

    Carsen Stringer, Timothy C. Wang, Michalis Michaelos, and Marius Pachitariu. Cellpose: a generalist algorithm for cellular segmentation. bioRxiv, 2020.

  69. [69]

    Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, 2018. arXiv:1703.01780 [cs, stat].

  70. [70]

    Jonathan Tompson, Ross Goroshin, Arjun Jain, Yann LeCun, and Christopher Bregler. Efficient Object Localization Using Convolutional Networks, 2015. arXiv:1411.4280 [cs].

  71. [71]

    Ranjith Unnikrishnan and Martial Hebert. Measures of Similarity. In 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION’05) - Volume 1, pages 394–394, 2005.

  72. [72]

    Athul Vijayan, Tejasvinee Atul Mody, Qin Yu, Adrian Wolny, Lorenzo Cerrone, Soeren Strauss, Miltos Tsiantis, Richard S. Smith, Fred A. Hamprecht, Anna Kreshuk, and Kay Schneitz. A deep learning-based toolkit for 3D nuclei segmentation and quantitative analysis in cellular and tissue context. Development (Cambridge, England), 151(14):dev202800, 2024.

  73. [73]

    Lucas von Chamier, Romain F. Laine, Johanna Jukkala, Christoph Spahn, Daniel Krentzel, Elias Nehme, Martina Lerche, Sara Hernández-Pérez, Pieta K. Mattila, Eleni Karinou, Séamus Holden, Ahmet Can Solak, Alexander Krull, Tim-Oliver Buchholz, Martin L. Jones, Loïc A. Royer, Christophe Leterrier, Yoav Shechtman, Florian Jug, Mike Heilemann, Guillaume...

  74. [74]

    Hanjing Wang and Qiang Ji. Epistemic Uncertainty Quantification For Pre-Trained Neural Networks. pages 11052–11061,

  75. [75]

    Mei Wang and Weihong Deng. Deep Visual Domain Adaptation: A Survey. Neurocomputing, 312:135–153, 2018. arXiv:1802.03601 [cs].

  76. [76]

    Yaqi Wang, Dahong Qian, Shuai Wang, Sergi Ben-Hamadou, Achraf Achraf Pujades, Luca Lumetti, Costantino Grana, and Federico Bolelli. Supervised and Semi-supervised Multi-structure Segmentation and Landmark Detection in Dental Data: MICCAI 2024 Challenges: ToothFairy 2024, 3DTeethLand 2024, and STS 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Mo...

  77. [77]

    Zirui Wang, Zihang Dai, Barnabas Poczos, and Jaime Carbonell. Characterizing and Avoiding Negative Transfer. pages 11293–11302, 2019.

  78. [78]

    Zijian Wang, Yadan Luo, Liang Zheng, Zi Huang, and Mahsa Baktashmotlagh. How Far Pre-trained Models Are from Neural Collapse on the Target Dataset Informs their Transferability. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5526–5535, Paris, France,

  79. [79]

    Karl R. Weiss, Taghi M. Khoshgoftaar, and Dingding Wang. A survey of transfer learning. Journal of Big Data, 2016.

  80. [80]

    Lisa Willis, Yassin Refahi, Raymond Wightman, Benoit Landrein, José Teles, Kerwyn Casey Huang, Elliot M. Meyerowitz, and Henrik Jönsson. Cell size and growth regulation in the Arabidopsis thaliana apical stem cell niche. Proceedings of the National Academy of Sciences of the United States of America, 113(51):E8238–E8246, 2016.
