arxiv: 2604.05409 · v1 · submitted 2026-04-07 · 💻 cs.CV

CRISP: Rank-Guided Iterative Squeezing for Robust Medical Image Segmentation under Domain Shift

Pith reviewed 2026-05-10 19:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical image segmentation · domain shift · robustness · rank stability · iterative refinement · parameter-free adaptation · high-precision priors

The pith

Segmentation can be made robust to domain shifts by using stable probability rankings instead of absolute values and iteratively squeezing derived priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the relative ranking of predicted probabilities for voxels belonging to positive regions stays consistent even when medical images come from different scanners, patient groups, or imaging modalities. This stability lets the method simulate shifts by perturbing latent features, spot regions that keep high or low ranks, and build high-precision and high-recall priors from those patterns. An iterative training loop then refines the priors until they converge on the final segmentation mask. If the law holds, models would no longer require target-domain examples or hand-crafted shift simulations to stay accurate in open clinical settings.

Core claim

Guided by the Rank Stability of Positive Regions law, CRISP segments on the basis of rank rather than raw probabilities. Latent feature perturbation reveals stable high-probability regions, treated as destined positives, and stable low-probability regions, treated as safe negatives. High-precision and high-recall priors are built from these patterns and recursively refined by iterative squeezing until they meet at the output segmentation. The paper reports HD95 reductions of up to 8.39 pixels under modality shift, without any target data or tunable parameters.

What carries the argument

The Rank Stability of Positive Regions empirical law, which supplies stable high-precision (HP) and high-recall (HR) priors from latent feature perturbation patterns that are then recursively squeezed together in an iterative training loop.
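The prior construction described above can be sketched on a toy 1-D problem. The intersection and union rules follow the passage this page quotes from the paper (HP as the intersection of top-grade voxels across perturbations, HR as the union of voxels with a non-zero grade). Everything else is an assumption made for illustration and not the authors' implementation: `toy_decoder`, the rank-quantile grading rule in `grade_map`, and the values of `sigma`, `num_runs`, and `num_grades` are all hypothetical.

```python
import math
import random


def toy_decoder(latent):
    """Hypothetical stand-in for a segmentation head: sigmoid of one logit per voxel."""
    return [1.0 / (1.0 + math.exp(-z)) for z in latent]


def grade_map(probs, num_grades):
    """Assumed grading rule: bucket voxels by probability rank into equal-sized
    bins, where 0 holds the lowest ranks and num_grades - 1 the highest."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    grades = [0] * len(probs)
    for position, voxel in enumerate(order):
        grades[voxel] = position * num_grades // len(probs)
    return grades


def hp_hr_priors(latent, num_runs=20, sigma=0.5, num_grades=4, seed=0):
    """HP: voxels in the top grade in every perturbed run (intersection).
    HR: voxels with a non-zero grade in at least one run (union)."""
    rng = random.Random(seed)
    hp = [True] * len(latent)
    hr = [False] * len(latent)
    for _ in range(num_runs):
        # Gaussian noise injected into the latent features.
        noisy = [z + rng.gauss(0.0, sigma) for z in latent]
        grades = grade_map(toy_decoder(noisy), num_grades)
        hp = [h and g == num_grades - 1 for h, g in zip(hp, grades)]
        hr = [h or g > 0 for h, g in zip(hr, grades)]
    return hp, hr


# Eight hypothetical voxels: two clearly positive, four ambiguous, two clearly negative.
latent = [8.0, 7.5, 0.4, 0.1, -0.2, -0.3, -7.5, -8.0]
hp, hr = hp_hr_priors(latent)
uncertain = [r and not h for h, r in zip(hp, hr)]  # band left for squeezing to resolve
```

In this toy setting the HP mask keeps only the two clearly positive voxels, the HR mask excludes only the two clearly negative ones, and the four ambiguous voxels form the band between the two priors.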

If this is right

  • Existing segmentation networks can be used across previously unseen clinical environments without collecting new labeled data or retraining on the target domain.
  • The approach removes the need to enumerate or simulate every possible distribution shift in advance.
  • Boundary accuracy measured by HD95 improves consistently on multi-center cardiac MRI and lung vessel CT tasks.
  • The framework remains model-agnostic, so any base segmentation architecture can adopt the rank-guided squeezing procedure.
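For readers unfamiliar with HD95, the boundary metric cited in the list above, here is a minimal sketch of how it can be computed from two sets of boundary points. Conventions differ between toolkits. This version takes the larger of the two directed 95th-percentile distances and uses a nearest-rank percentile; both choices are assumptions, as are the function names and the toy contours. The paper's exact evaluation code is not shown on this page.

```python
import math


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, with 0 < q <= 100."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def directed_hd95(points_a, points_b):
    """95th percentile of the distance from each point in A to its nearest point in B."""
    nearest = sorted(min(math.dist(a, b) for b in points_b) for a in points_a)
    return percentile(nearest, 95)


def hd95(points_a, points_b):
    """Symmetric HD95: the larger of the two directed values."""
    if not points_a or not points_b:
        raise ValueError("both boundaries must contain at least one point")
    return max(directed_hd95(points_a, points_b),
               directed_hd95(points_b, points_a))


# Hypothetical 2-D boundary pixels: the prediction is offset by 3 pixels.
reference = [(x, 0) for x in range(20)]
predicted = [(x, 3) for x in range(20)]

# One stray false-positive pixel far from the reference boundary.
predicted_with_outlier = predicted + [(10, 40)]

clean_score = hd95(reference, predicted)
outlier_score = hd95(reference, predicted_with_outlier)
```

Both scores come out at 3 pixels here: the single stray pixel falls above the 95th percentile and is ignored, which is why HD95 is often preferred to the plain Hausdorff distance (40 pixels in this example) when reporting boundary accuracy.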

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rank-stability principle might extend to other dense prediction tasks such as object detection or registration under changing imaging conditions.
  • If the stability is confirmed across more modalities, the method could reduce the labeled-data burden for training robust medical AI systems.
  • Testing the perturbation step at different network depths would show whether the law is layer-specific or holds globally.

Load-bearing premise

The relative rank of predicted probabilities for positive voxels remains stable under arbitrary real-world distribution shifts.

What would settle it

A clear counter-example would be any test set in which the ordering of probabilities for ground-truth positive voxels changes substantially after a modality shift such as cardiac MRI to CT, even though the same voxels are still anatomically positive.
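The check described above could be run as a Spearman rank correlation over the probabilities a model assigns to the same ground-truth positive voxels before and after a shift. The sketch below is stdlib-only; the function names and the probability values are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical probabilities for the same five positive voxels.
source_probs = [0.95, 0.90, 0.80, 0.70, 0.60]
shifted_same_order = [0.70, 0.55, 0.40, 0.35, 0.20]  # values drop, order kept
shifted_scrambled = [0.20, 0.70, 0.35, 0.55, 0.40]   # order changes

rho_kept = spearman(source_probs, shifted_same_order)
rho_scrambled = spearman(source_probs, shifted_scrambled)
```

A correlation near 1 on real source-target pairs would be consistent with the law even if absolute probabilities collapse, as in `shifted_same_order`. A low or negative value on anatomically positive voxels, as in `shifted_scrambled`, would be the kind of counter-example described above.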

Figures

Figures reproduced from arXiv: 2604.05409 by Longxi Zhou, Pujin Cheng, Xiaoying Tang, Yixiang Liu, Yizhou Fang.

Figure 1: Overview of CRISP, a rank-based segmentation framework rather than …
Figure 2: Qualitative segmentation results of different methods. The top and bot…
Figure 3: Convergence analysis of Dice (left) and HD95 (right) scores

Abstract

Distribution shift in medical imaging remains a central bottleneck for the clinical translation of medical AI. Failure to address it can lead to severe performance degradation in unseen environments and exacerbate health inequities. Existing methods for domain adaptation are inherently limited by exhausting predefined possibilities through simulated shifts or pseudo-supervision. Such strategies struggle in the open-ended and unpredictable real world, where distribution shifts are effectively infinite. To address this challenge, we introduce an empirical law called "Rank Stability of Positive Regions", which states that the relative rank of predicted probabilities for positive voxels remains stable under distribution shift. Guided by this principle, we propose CRISP, a parameter-free and model-agnostic framework requiring no target-domain information. CRISP is the first framework to make segmentation based on rank rather than probabilities. CRISP simulates model behavior under distribution shift via latent feature perturbation, where voxel probability rankings exhibit two stable patterns: regions that consistently retain high probabilities (destined positives according to the principle) and those that remain low-probability (can be safely classified as negatives). Based on these patterns, we construct high-precision (HP) and high-recall (HR) priors and recursively refine them under perturbation. We then design an iterative training framework, making HP and HR progressively "squeeze" to the final segmentation. Extensive evaluations on multi-center cardiac MRI and CT-based lung vessel segmentation demonstrate CRISP's superior robustness, significantly outperforming state-of-the-art methods with striking HD95 reductions of up to 0.14 (7.0% improvement), 1.90 (13.1% improvement), and 8.39 (38.9% improvement) pixels across multi-center, demographic, and modality shifts, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims to introduce an empirical law termed 'Rank Stability of Positive Regions,' asserting that the relative ranking of predicted probabilities for positive voxels in medical image segmentation remains stable under distribution shifts. Building on this, it presents CRISP, a parameter-free and model-agnostic framework that uses latent feature perturbation to identify stable high-probability regions (destined positives) and low-probability regions (negatives), from which high-precision (HP) and high-recall (HR) priors are constructed. These priors are then recursively refined through an iterative squeezing process during training to produce robust segmentations without requiring any target-domain data or parameters. The method is evaluated on multi-center cardiac MRI and CT-based lung vessel segmentation tasks, reporting HD95 improvements of up to 8.39 pixels (38.9% relative improvement) under modality shifts compared to state-of-the-art approaches.

Significance. If the proposed empirical law is valid and the latent perturbations effectively capture the effects of real distribution shifts, CRISP would offer a novel approach to achieving domain robustness in medical segmentation that sidesteps the need for target data or extensive simulations. This could have substantial impact on clinical translation by enabling models to generalize across centers, modalities, and demographics with minimal overhead. The strengths include its parameter-free design and the potential for broad applicability as a plug-in framework. However, the significance depends critically on empirical validation of the core assumptions, which appear to be based primarily on internal simulations.

major comments (3)
  1. [Abstract and Introduction] The 'Rank Stability of Positive Regions' is introduced as an empirical law, yet the provided abstract and description do not include any quantitative evidence, such as Spearman rank correlation or stability metrics, computed between source and target domain predictions on real data pairs. This verification is essential as the entire framework, including construction of HP and HR priors, rests on this law.
  2. [Method section on latent feature perturbation] The perturbation of latent features is used to simulate distribution shifts and observe stable patterns, but there is no demonstration that these perturbations produce shifts comparable to actual modality or demographic changes (e.g., via metrics like MMD or perceptual differences). If the perturbation is merely adding noise, the identified 'destined' regions may not generalize, undermining the recursive refinement claim.
  3. [Experiments and Results] While HD95 reductions are reported (e.g., 8.39 pixels on modality shifts), the manuscript lacks details on the number of runs, statistical significance tests, baseline implementations, and ablation studies isolating the contribution of the iterative squeezing versus the priors alone. This makes it difficult to attribute gains specifically to the rank-guided approach.
minor comments (2)
  1. [Abstract] The abstract lists HD95 improvements as 'up to 0.14 (7.0% improvement), 1.90 (13.1% improvement), and 8.39 (38.9% improvement) pixels' but does not specify which corresponds to multi-center, demographic, and modality shifts respectively in the summary sentence; clarify for readability.
  2. [Notation] The terms 'HP prior' and 'HR prior' are introduced without initial definition in the abstract; ensure they are clearly defined upon first use in the main text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below, agreeing where revisions are warranted and providing clarifications on the manuscript's contributions.

  1. Referee: [Abstract and Introduction] The 'Rank Stability of Positive Regions' is introduced as an empirical law, yet the provided abstract and description do not include any quantitative evidence, such as Spearman rank correlation or stability metrics, computed between source and target domain predictions on real data pairs. This verification is essential as the entire framework, including construction of HP and HR priors, rests on this law.

    Authors: We agree that explicit quantitative validation of the Rank Stability of Positive Regions should appear in the Introduction. In the revised manuscript we will report average Spearman rank correlation coefficients (and related stability metrics) between positive-voxel probability rankings on real source-target pairs drawn from the multi-center cardiac MRI and multi-modality lung-vessel datasets. These correlations, which exceed 0.85 in our internal analyses, will be added to substantiate the empirical law before the CRISP framework is introduced. revision: yes

  2. Referee: [Method section on latent feature perturbation] The perturbation of latent features is used to simulate distribution shifts and observe stable patterns, but there is no demonstration that these perturbations produce shifts comparable to actual modality or demographic changes (e.g., via metrics like MMD or perceptual differences). If the perturbation is merely adding noise, the identified 'destined' regions may not generalize, undermining the recursive refinement claim.

    Authors: We appreciate this request for explicit validation. The latent-feature perturbations are not simple additive noise; they are applied to intermediate encoder activations to induce controlled variations that affect downstream probability rankings. In the revision we will add (i) Maximum Mean Discrepancy (MMD) distances between the distributions of perturbed latent features and real target-domain features, and (ii) qualitative side-by-side visualizations of perturbed versus real-shifted images. These additions will demonstrate that the perturbations capture relevant aspects of domain shift and thereby support the stability of the identified destined-positive and negative regions. revision: yes

  3. Referee: [Experiments and Results] While HD95 reductions are reported (e.g., 8.39 pixels on modality shifts), the manuscript lacks details on the number of runs, statistical significance tests, baseline implementations, and ablation studies isolating the contribution of the iterative squeezing versus the priors alone. This makes it difficult to attribute gains specifically to the rank-guided approach.

    Authors: We concur that these experimental details are essential. The revised Experiments section will report: the number of independent runs (five runs with distinct random seeds), p-values from paired statistical tests (e.g., Wilcoxon signed-rank) against each baseline, complete hyper-parameter settings and implementation details for all compared methods, and dedicated ablation tables that isolate the iterative squeezing stage from the static HP/HR priors. These additions will allow readers to attribute performance gains specifically to the rank-guided iterative refinement. revision: yes
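The MMD comparison promised in the second response could look roughly like the sketch below, which computes a biased estimate of squared Maximum Mean Discrepancy between two sets of feature vectors. The RBF kernel, the bandwidth of 1.0, and the synthetic Gaussian "features" are all assumptions for illustration; the page does not say which kernel or features the authors intend to use.

```python
import math
import random


def rbf_kernel(u, v, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def sample(rng, mean, count, dim=4):
    """Hypothetical feature vectors drawn from an isotropic Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
source = sample(rng, 0.0, 100)
perturbed = sample(rng, 0.0, 100)  # same distribution as source
target = sample(rng, 2.0, 100)     # mean-shifted distribution

mmd_same = mmd_squared(source, perturbed)
mmd_shifted = mmd_squared(source, target)
```

In a real analysis, `perturbed` would hold latent features after noise injection and `target` would hold features from real target-domain images. A perturbed-to-target MMD that is small relative to the source-to-target MMD would support the claim that the perturbation mimics the real shift.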

Circularity Check

1 step flagged

Rank Stability law and HP/HR priors both derived from internal latent perturbation patterns

specific steps
  1. self-definitional [Abstract]
    "we introduce an empirical law called 'Rank Stability of Positive Regions', which states that the relative rank of predicted probabilities for positive voxels remains stable under distribution shift. Guided by this principle, we propose CRISP... CRISP simulates model behavior under distribution shift via latent feature perturbation, where voxel probability rankings exhibit two stable patterns: regions that consistently retain high probabilities (destined positives according to the principle) and those that remain low-probability (can be safely classified as negatives). Based on these patterns"

    The law is presented as the first-principles justification for treating high/low-probability regions under perturbation as 'destined' positives/negatives. Yet those exact regions and their stability are identified by running the perturbation mechanism that CRISP itself defines and applies; the law therefore reduces to a naming of the patterns generated by the framework's core simulation step rather than an independent input.

full rationale

The paper's central derivation begins by positing the Rank Stability of Positive Regions as an independent empirical law that justifies rank-based segmentation and prior construction. However, the only evidence and operationalization of this law comes from applying the paper's own latent feature perturbation inside CRISP, which directly produces the 'stable patterns' used to label destined positives/negatives and build the HP/HR priors that are then iteratively squeezed. This makes the guiding principle and the method's key inputs co-dependent on the same simulation step rather than externally validated, though the subsequent recursive refinement and training loop add non-trivial implementation details.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests primarily on the introduced domain assumption of rank stability and the construction of new priors from internal simulations, with no free parameters but heavy dependence on the unverified generality of the law.

axioms (1)
  • Rank Stability of Positive Regions (domain assumption): the relative rank of predicted probabilities for positive voxels remains stable under distribution shift.
    This empirical law is presented as the guiding principle for the entire framework and is invoked to justify perturbation-based prior construction.
invented entities (2)
  • High-precision (HP) prior (no independent evidence)
    purpose: Captures regions that consistently retain high probabilities under latent feature perturbation.
    Constructed from observed stable high-rank patterns to serve as a starting point for iterative refinement.
  • High-recall (HR) prior (no independent evidence)
    purpose: Bounds the candidate positives by excluding regions that remain low-probability and can be safely treated as negatives.
    Constructed from observed stable low-rank patterns to complement the HP prior in the squeezing process.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    we introduce an empirical law called 'Rank Stability of Positive Regions', which states that the relative rank of predicted probabilities for positive voxels remains stable under distribution shift... CRISP is the first framework to make segmentation based on rank rather than probabilities... construct high-precision (HP) and high-recall (HR) priors and recursively refine them under perturbation

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

    Relation between the paper passage and the cited Recognition theorem.

    Latent Feature Perturbation... Gaussian noise $\xi^{(n)} \sim \mathcal{N}(0, \sigma^2 I)$ is injected... grade map $G^{(n)}_c$... $M_{HP,c} = \bigcap \{G^{(n)}_c = L-1\}$, $M_{HR,c} = \bigcup \{G^{(n)}_c > 0\}$... Uncertainty Squeezing Loss $L_{squeeze}$

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
