Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation

Aarthi Sivasankaran; Amanpreet Kaur; CholMin Kanga; Jonghyun Chung; Nagesh Gulkotwar

arxiv: 2606.00491 · v2 · pith:DQXGZL7Mnew · submitted 2026-05-30 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-28 19:03 UTC · model grok-4.3

keywords CT segmentation · robustness · data augmentation · medical imaging · image corruption · deep learning · pre-deployment testing · organ segmentation

The pith

RAMP multi-corruption augmentation narrows the clean-to-corrupted Dice gap in CT segmentation from 0.26 to 0.06.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RAMP, a training method that applies multiple image degradations such as noise, resolution loss, and artifacts to CT scans while learning organ segmentations. The goal is to prepare models for the inconsistent quality of real clinical scans instead of relying only on clean benchmark data. This matters because scanner settings and patient factors vary image quality in radiology, and automated analysis is only useful if segmentation stays reliable across those conditions. Results across two benchmarks show higher scores on degraded test images and a much smaller performance drop from clean to corrupted cases.

Core claim

RAMP combines anatomically constrained spatial perturbations, CT intensity transformations, and stochastic multi-corruption composition to expose models to clinically plausible image degradation during training. Across two CT segmentation evaluation settings, RAMP achieved the strongest corrupted-image performance and the smallest clean-to-corrupted robustness gap. In the five-organ noisy evaluation benchmark, RAMP improved mean corrupted Dice from 0.610 to 0.753 and reduced the robustness gap from 0.264 to 0.064 compared with the nnU-Net baseline. In Abdomen1K, RAMP improved mean corrupted Dice from 0.633 to 0.789 and reduced the robustness gap from 0.290 to 0.070.

What carries the argument

Robustness via Augmented Multi-corruption Pipeline (RAMP), which uses stochastic composition of spatial, intensity, and artifact corruptions to simulate heterogeneous clinical conditions during training.
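
The stochastic composition step can be pictured with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the operator names (`gaussian_noise`, `intensity_shift`, `salt_and_pepper`, `stripe_artifact`), the severity scaling constants, and the `max_ops` and `max_severity` parameters are all hypothetical, and a 2D slice is represented as a plain list of rows so the example needs no external libraries.

```python
import random

# Hypothetical corruption operators on a 2D slice stored as a list of rows
# of Hounsfield-unit-like floats. Names and constants are assumptions.

def gaussian_noise(img, severity, rng):
    """Add zero-mean Gaussian noise whose sigma grows with severity."""
    sigma = 100.0 * severity
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]

def intensity_shift(img, severity, rng):
    """Add one random global offset to every pixel."""
    offset = rng.uniform(-200.0, 200.0) * severity
    return [[v + offset for v in row] for row in img]

def salt_and_pepper(img, severity, rng, lo=-1000.0, hi=1000.0):
    """Replace roughly a `severity` fraction of pixels with extreme values."""
    return [[rng.choice((lo, hi)) if rng.random() < severity else v
             for v in row] for row in img]

def stripe_artifact(img, severity, rng):
    """Brighten every fourth row, starting from a random phase."""
    phase = rng.randrange(4)
    bump = 300.0 * severity
    return [[v + bump for v in row] if i % 4 == phase else list(row)
            for i, row in enumerate(img)]

CORRUPTIONS = (gaussian_noise, intensity_shift, salt_and_pepper, stripe_artifact)

def compose_corruptions(img, rng, max_ops=3, max_severity=0.2):
    """Apply a random subset of operators, in random order, each with its
    own random severity. Returns the corrupted image and the operator names."""
    k = rng.randint(1, min(max_ops, len(CORRUPTIONS)))
    ops = rng.sample(CORRUPTIONS, k)  # distinct operators, random order
    out, applied = img, []
    for op in ops:
        severity = rng.uniform(0.0, max_severity)
        out = op(out, severity, rng)
        applied.append(op.__name__)
    return out, applied
```

Because each call draws a fresh subset, order, and severity, repeated epochs would see different degradation mixes of the same scan. The operators in this sketch change intensities only, so a segmentation mask paired with the slice would stay valid without modification.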

If this is right

In the five-organ noisy benchmark, mean corrupted Dice rose from 0.610 to 0.753.
The robustness gap dropped from 0.264 to 0.064 in that setting.
On Abdomen1K, mean corrupted Dice rose from 0.633 to 0.789 and the robustness gap dropped from 0.290 to 0.070.
Models avoid severe segmentation collapse under strong degradation even if not topping clean-image scores.
Multi-corruption augmentation serves as a practical pre-deployment reliability strategy for heterogeneous clinical environments.
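
The two quantities in this list can be written down in a few lines. The sketch below assumes the robustness gap is clean Dice minus mean corrupted Dice, which fits the phrase "clean-to-corrupted gap" but is not spelled out in the abstract; all function names are illustrative.

```python
def dice(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

def robustness_gap(clean_dice, corrupted_dices):
    """Assumed definition: clean Dice minus the mean Dice over corrupted inputs."""
    mean_corrupted = sum(corrupted_dices) / len(corrupted_dices)
    return clean_dice - mean_corrupted

def implied_clean_dice(mean_corrupted, gap):
    """Invert the assumed definition to recover the clean Dice it would imply."""
    return mean_corrupted + gap
```

Under that assumed definition, the reported figures would imply a clean Dice of about 0.874 for the nnU-Net baseline and about 0.817 for RAMP on the five-organ benchmark, which is consistent with RAMP not topping clean-image scores. These implied values are an inference from the quoted numbers, not values stated in the abstract.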

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar augmentation strategies could be adapted for MRI or ultrasound segmentation tasks where image quality also varies.
Deployed systems might benefit from periodic re-training with site-specific corruption profiles drawn from local scanner data.
Combining RAMP with ensemble or uncertainty methods could further stabilize outputs when facing conditions outside the training corruptions.

Load-bearing premise

The specific set of corruptions and their stochastic composition rules used in RAMP are representative of the heterogeneous imaging conditions that will actually appear at deployment time.

What would settle it

A RAMP-trained model showing large Dice drops on real clinical CT scans that contain degradation types not included in the augmentation set would indicate the central claim does not hold.

Figures

Figures reproduced from arXiv: 2606.00491 by Aarthi Sivasankaran, Amanpreet Kaur, CholMin Kanga, Jonghyun Chung, Nagesh Gulkotwar.

**Figure 1.** Overall training and evaluation workflow. Segmentation models were trained using different augmentation strategies and evaluated under both clean and corrupted CT imaging conditions. The proposed RAMP framework was designed to improve robustness under heterogeneous image degradation rather than optimize only for clean-image Dice performance.

**Figure 2.** Overview of the proposed RAMP augmentation framework. RAMP consists of anatomically constrained spatial perturbation, CT intensity transformation, and stochastic multi-corruption composition. During training, multiple image degradation operators are randomly selected and applied to each image to simulate heterogeneous clinical imaging conditions.

**Figure 3.** Clean-image Dice versus mean corrupted Dice across augmentation strategies. RAMP occupies a robustness-oriented operating point, achieving the highest corrupted-image performance despite not maximizing clean-image Dice.

Section 4.4, Robustness Under High-Severity Corruptions: We next examined performance under high-severity degradation, defined as corruption levels greater than or equal to 0.15.

**Figure 4.** Corruption-wise robustness heatmap across augmentation strategies. Each cell represents mean Dice across severity levels for the corresponding corruption type. RAMP shows the strongest robustness under severe intensity- and artifact-related corruptions, including bias field, compound corruption, salt-and-pepper noise, and stripe artifacts.

**Figure 5.** Segmentation performance across corruption severity levels. RAMP shows a slower degradation pattern than conventional augmentation strategies, indicating improved stability under increasing image degradation.
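
The summaries behind Figures 3 to 5 (mean Dice per corruption type, performance at high severity, and worst-case Dice) could be computed from per-evaluation records along the following lines. This is a sketch under assumptions: the record layout `(corruption_name, severity, dice)` and the function name are hypothetical, and the 0.15 threshold is the high-severity cutoff quoted in the text accompanying Figure 3.

```python
from collections import defaultdict

HIGH_SEVERITY = 0.15  # cutoff quoted in the Section 4.4 excerpt

def summarize_records(records):
    """Aggregate (corruption_name, severity, dice) records.

    Returns the mean Dice per corruption type (one heatmap row), the mean
    Dice over records at or above the high-severity cutoff, and the single
    worst Dice observed.
    """
    by_corruption = defaultdict(list)
    high = []
    for name, severity, score in records:
        by_corruption[name].append(score)
        if severity >= HIGH_SEVERITY:
            high.append(score)
    all_scores = [s for scores in by_corruption.values() for s in scores]
    return {
        "per_corruption_mean": {
            name: sum(scores) / len(scores)
            for name, scores in by_corruption.items()
        },
        "high_severity_mean": sum(high) / len(high) if high else None,
        "worst_case": min(all_scores) if all_scores else None,
    }
```

Keeping the raw records and deriving each view from them makes it easy to check that a heatmap cell, a severity curve, and a worst-case number all come from the same evaluation runs.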

Original abstract

Deep learning-based CT segmentation systems often achieve high accuracy on clean benchmark images, but their performance may degrade under heterogeneous clinical imaging conditions such as noise, resolution loss, contrast variation, intensity shift, and artifacts. This instability can limit reliable deployment in real-world medical imaging workflows. We propose Robustness via Augmented Multi-corruption Pipeline (RAMP), a robustness-oriented augmentation framework for CT segmentation. RAMP combines anatomically constrained spatial perturbations, CT intensity transformations, and stochastic multi-corruption composition to expose models to clinically plausible image degradation during training. Across two CT segmentation evaluation settings, RAMP achieved the strongest corrupted-image performance and the smallest clean-to-corrupted robustness gap. In the five-organ noisy evaluation benchmark, RAMP improved mean corrupted Dice from 0.610 to 0.753 and reduced the robustness gap from 0.264 to 0.064 compared with the nnU-Net baseline. In Abdomen1K, RAMP improved mean corrupted Dice from 0.633 to 0.789 and reduced the robustness gap from 0.290 to 0.070. Although RAMP did not achieve the highest clean-image Dice, it substantially mitigated worst-case segmentation collapse under severe image degradation. These results suggest that multi-corruption augmentation can serve as a practical pre-deployment strategy for improving the reliability of CT segmentation systems in heterogeneous clinical environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

RAMP packages existing augmentations into a named pipeline and shows clear in-distribution gains on corrupted CT test sets, but the evaluation does not test generalization to unseen clinical degradations.

RAMP is a training-time augmentation recipe that mixes anatomically constrained spatial changes, intensity shifts, and stochastic multi-corruption for CT segmentation models. On the reported benchmarks it lifts mean corrupted Dice from 0.610 to 0.753 on the five-organ set and from 0.633 to 0.789 on Abdomen1K, while shrinking the clean-to-corrupted gap by roughly 75 percent relative to nnU-Net. Those numbers are the concrete result.
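
The "roughly 75 percent" figure follows directly from the reported gaps. A quick check using only the numbers quoted above (the helper name is illustrative):

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrank relative to its starting value."""
    return (before - after) / before

# Robustness gap, nnU-Net baseline vs. RAMP, as reported in the abstract.
five_organ = relative_reduction(0.264, 0.064)  # about 0.758
abdomen1k = relative_reduction(0.290, 0.070)   # about 0.759

# Absolute gains in mean corrupted Dice on the same two benchmarks.
five_organ_gain = 0.753 - 0.610  # about 0.143
abdomen1k_gain = 0.789 - 0.633   # about 0.156
```

Both benchmarks land at a reduction of just under 76 percent, so the rounded figure in the letter slightly understates the reported change.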

The paper does one thing cleanly: it assembles standard augmentation families into a single, named procedure and measures the effect on two fixed benchmarks. That supplies a practical data point for anyone already running nnU-Net-style training who wants a drop-in robustness tweak.

The soft spot is exactly the one flagged in the stress-test note. The corrupted test images appear to be drawn from the same corruption families and stochastic rules used inside RAMP, so the smaller gap demonstrates robustness inside the training distribution rather than to new degradations such as metal streak, motion blur, or scanner ring artifacts that were never seen during augmentation. The abstract calls the corruptions “clinically plausible,” but without held-out families the claim that the method prepares models for heterogeneous deployment conditions rests on an untested assumption.

Methods details on exact parameter ranges, whether corruption strengths were tuned after seeing test performance, and statistical testing are not visible in the abstract, though the full text presumably supplies them. No new math or derivation is offered; the work is an empirical comparison.

This paper is for medical-imaging practitioners who need a concrete augmentation recipe for CT segmentation. A reader already working on deployment robustness would extract usable numbers and a pipeline description. It is coherent on its own terms and has enough empirical content to merit peer review, provided referees press for out-of-distribution corruption results.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Robustness via Augmented Multi-corruption Pipeline (RAMP), an augmentation framework combining anatomically constrained spatial perturbations, CT intensity transformations, and stochastic multi-corruption composition to train CT segmentation models. It reports that RAMP yields the strongest corrupted-image performance and smallest clean-to-corrupted robustness gap on two benchmarks, improving mean corrupted Dice from 0.610 to 0.753 (gap reduced from 0.264 to 0.064) on a five-organ noisy evaluation benchmark and from 0.633 to 0.789 (gap from 0.290 to 0.070) on Abdomen1K, relative to an nnU-Net baseline.

Significance. If the robustness gains generalize beyond the specific corruptions used in training, the approach could provide a practical pre-deployment method for mitigating segmentation collapse under heterogeneous clinical imaging conditions. The reported numeric improvements in corrupted Dice and robustness gap are substantial and directly address a known deployment limitation of CT segmentation systems.

major comments (2)

[Evaluation] The evaluation uses test images generated from the identical stochastic multi-corruption composition rules employed inside RAMP training. This measures in-distribution robustness rather than generalization to the broader heterogeneous clinical degradations claimed in the abstract (e.g., metal streak, motion blur, or scanner-specific ring artifacts absent from the augmentation set). Without held-out corruption families, the central claim that RAMP improves reliability under 'clinically plausible image degradation' is not fully supported. (Abstract; evaluation benchmarks description)
[Methods] Exact corruption parameters, stochastic composition rules, and any post-hoc selection of corruption strengths are not reported, nor are statistical significance tests for the Dice improvements. These omissions prevent assessment of whether the gains are robust or reproducible. (Methods section)
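
One way to act on the first major comment is to keep training-time and test-time corruption families disjoint and to report the two gaps separately. The sketch below is an evaluation-protocol illustration, not the paper's benchmark code: the family labels are hypothetical names based on the examples in this report, and `dice_by_family` is an assumed mapping from family name to mean Dice.

```python
# Illustrative family labels; the paper's actual corruption set may differ.
TRAIN_FAMILIES = frozenset(
    {"noise", "resolution_loss", "contrast_variation", "intensity_shift"}
)
HELD_OUT_FAMILIES = frozenset({"metal_streak", "motion_blur", "ring_artifact"})

def split_gaps(clean_dice, dice_by_family,
               train_families=TRAIN_FAMILIES,
               held_out_families=HELD_OUT_FAMILIES):
    """Report the clean-to-corrupted gap separately for corruption families
    seen during augmentation and for families held out from training."""
    overlap = train_families & held_out_families
    if overlap:
        raise ValueError(f"families appear in both sets: {sorted(overlap)}")

    def gap(families):
        scores = [dice_by_family[f] for f in families if f in dice_by_family]
        if not scores:
            return None  # nothing evaluated for this group
        return clean_dice - sum(scores) / len(scores)

    return {
        "in_distribution_gap": gap(train_families),
        "held_out_gap": gap(held_out_families),
    }
```

A held-out gap close to the in-distribution gap would support the generalization claim; a much larger held-out gap would indicate the robustness is specific to the corruptions used in training.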

minor comments (1)

[Abstract] The abstract refers to a 'five-organ noisy evaluation benchmark' without specifying its construction details, relation to public datasets, or how the clean vs. corrupted splits were formed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below and will revise accordingly to improve clarity, reproducibility, and the scope of the claims.

Referee: [Evaluation] The evaluation uses test images generated from the identical stochastic multi-corruption composition rules employed inside RAMP training. This measures in-distribution robustness rather than generalization to the broader heterogeneous clinical degradations claimed in the abstract (e.g., metal streak, motion blur, or scanner-specific ring artifacts absent from the augmentation set). Without held-out corruption families, the central claim that RAMP improves reliability under 'clinically plausible image degradation' is not fully supported. (Abstract; evaluation benchmarks description)

Authors: We agree that the reported benchmarks evaluate robustness to the same corruption families used in training and thus constitute in-distribution evaluation. While the corruptions were selected to reflect common clinical issues (noise, resolution loss, contrast/intensity variation), this does not demonstrate generalization to entirely unseen artifact types. In the revision we will add a new experiment section using held-out corruption families (metal streak artifacts, motion blur, and ring artifacts) that are not part of the RAMP training distribution. We will also revise the abstract and discussion to state more precisely that the gains apply to the clinically motivated degradations included in the augmentation pipeline. revision: yes

Referee: [Methods] Exact corruption parameters, stochastic composition rules, and any post-hoc selection of corruption strengths are not reported, nor are statistical significance tests for the Dice improvements. These omissions prevent assessment of whether the gains are robust or reproducible. (Methods section)

Authors: We accept this criticism. The revised manuscript will include a detailed supplementary appendix that specifies all corruption parameters (noise variances, blur kernel sizes, intensity shift ranges, etc.), the exact stochastic composition probabilities and ordering rules, and the procedure used to select corruption strengths. We will also add statistical significance testing (paired Wilcoxon signed-rank tests with Bonferroni correction) for all reported Dice improvements and include the resulting p-values in the main results tables. revision: yes
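
The statistical procedure named in this response can be sketched without external libraries. The code below is an illustrative exact paired Wilcoxon signed-rank test (it enumerates every sign pattern, so it is only practical for small samples) followed by a Bonferroni adjustment. How the authors would pair cases, which library they would use, and how many comparisons they would correct for are not stated, so every name here is an assumption.

```python
from itertools import product

def average_ranks(values):
    """Rank values ascending (1-based), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_exact(x, y):
    """Exact two-sided p-value of the paired Wilcoxon signed-rank test.

    Zero differences are dropped. Cost grows as 2**n, so use small n only.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=n):  # every sign pattern under the null
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed + 1e-12:
            hits += 1
    return hits / 2 ** n

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In practice, `x` and `y` would hold per-case Dice scores for two methods on the same test cases, with one test per comparison and the resulting p-values passed together to `bonferroni`. For realistic sample sizes a library routine with a normal approximation would replace the enumeration.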

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivation chain

The paper proposes an augmentation framework (RAMP) and reports empirical results on two CT segmentation benchmarks, measuring Dice scores and robustness gaps under applied corruptions. No mathematical derivations, predictions from fitted parameters, uniqueness theorems, or ansatzes are claimed. The central results (e.g., Dice improvements from 0.610 to 0.753) are direct experimental measurements on test images, not reductions to inputs by construction. Self-citations, if present, are not load-bearing for any premise. The work is self-contained as an experimental study; the skeptic concern about corruption overlap is a question of experimental design validity, not circularity in a derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unverified premise that the chosen corruptions match future clinical distributions.

Reference graph

Works this paper leans on

32 extracted references · 1 canonical work pages · 1 internal anchor
