pith. machine review for the scientific record.

arxiv: 2605.03098 · v1 · submitted 2026-05-04 · 💻 cs.CV

Recognition: 1 theorem link

One Sequence to Segment Them All: Efficient Data Augmentation for CT and MRI Cross-Domain 3D Spine Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:36 UTC · model grok-4.3

classification 💻 cs.CV
keywords data augmentation · cross-domain segmentation · spine segmentation · CT MRI generalization · 3D medical imaging · model robustness · nnUNet

The pith

Targeted data augmentations let a spine segmentation model trained on one CT or MRI sequence generalize to seven unseen domains including the opposite modality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether data augmentation can overcome the limitation that deep learning segmentation models trained on one imaging sequence or modality typically fail on others due to variations in scanner protocols and contrasts. Three models are each trained on a single-modality spine dataset and then evaluated on seven out-of-distribution CT and MRI datasets that reflect realistic deployment conditions. The chosen augmentations produce large gains in Dice scores on the unseen domains while leaving performance on the original training domain essentially unchanged and even accelerating training. The method is packaged as an open-source toolbox for direct use in standard frameworks.

Core claim

A set of GPU-optimized data augmentations applied during training on a single acquisition sequence enables 3D spine segmentation models to achieve an average relative Dice gain of 155 percent across seven out-of-distribution datasets spanning CT and MRI sequences and contrasts, while incurring an average Dice decrease of only 0.008 percent on in-domain test sets and speeding up training by roughly 10 percent.
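As a reference point for the numbers above: the Dice score is the standard overlap metric between a predicted and a ground-truth segmentation mask. A minimal numpy sketch, illustrative only and not the paper's evaluation code:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice score between two binary 3D masks: 2|A∩B| / (|A|+|B|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# Toy volumes: an 8-voxel cube vs. a 12-voxel slab that contains it.
a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), dtype=bool); b[1:3, 1:3, 1:4] = True
print(round(dice(a, b), 3))  # intersection 8 -> 2*8/(8+12) = 0.8
```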

What carries the argument

The targeted set of data augmentation techniques that simulate cross-sequence and cross-modality variations, implemented with GPU optimization to avoid extra training cost.
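To make this concrete, here is a hypothetical numpy sketch of three intensity transforms in the spirit of cross-modality simulation (random gamma, intensity inversion, additive noise). The transform set, parameter ranges, and function name are illustrative assumptions, not the paper's released toolbox or its GPU implementation.

```python
import numpy as np

def rand_contrast_augment(vol, rng):
    """Illustrative cross-modality intensity augmentations on a 3D volume."""
    out = vol.astype(np.float32)
    # Normalize to [0, 1] so the intensity transforms compose predictably.
    lo, hi = out.min(), out.max()
    norm = (out - lo) / (hi - lo + 1e-8)
    # Random gamma: perturbs global contrast, mimicking sequence differences.
    norm = norm ** rng.uniform(0.5, 2.0)
    # Random intensity inversion: a crude stand-in for the bright/dark
    # tissue swap between CT and some MRI contrasts.
    if rng.random() < 0.5:
        norm = 1.0 - norm
    # Additive Gaussian noise, as in standard nnUNet-style pipelines.
    norm = norm + rng.normal(0.0, 0.05, size=norm.shape)
    return norm.astype(np.float32)

rng = np.random.default_rng(0)
vol = rng.normal(size=(8, 16, 16)).astype(np.float32)
aug = rand_contrast_augment(vol, rng)
print(aug.shape)  # (8, 16, 16) -- geometry untouched, only intensities change
```

The key property such transforms share is that they alter intensities while leaving the segmentation labels and voxel geometry fixed, so the same ground truth supervises every augmented view.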

If this is right

  • Models exhibit an average 155 percent relative Dice gain on unseen domains.
  • In-domain accuracy is preserved with an average Dice drop of only 0.008 percent.
  • Transfer works in both directions between CT and MRI.
  • Training runs approximately 10 percent faster despite the stronger augmentations.
  • The released toolbox integrates directly into nnUNet and MONAI pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Clinics could train segmentation models on smaller single-site datasets and still expect usable performance across varied scanners without collecting new annotations.
  • The efficiency improvement may make aggressive augmentation the default choice rather than an optional extra in medical imaging workflows.
  • The same augmentation strategy could be tested on other anatomical targets or on 2D slice-based models to check whether the cross-modality benefit generalizes.

Load-bearing premise

The selected augmentations and the seven test datasets capture the range of real clinical scanner and protocol differences that models encounter in practice.

What would settle it

Retraining with the same augmentations on a fresh collection of CT and MRI spine volumes from additional institutions and observing no Dice improvement on the new out-of-domain cases would show the reported gains are not general.

Figures

Figures reproduced from arXiv: 2605.03098 by Anna Curto-Vilalta, Daniel Rueckert, Hendrik Möller, Jan S. Kirschke, Julien Cohen-Adad, Matan Atad, Nathan Molinier, Robert Graf, Thomas Dagonneau.

Figure 1
Figure 1. Example images from the different datasets, in order left to right: MM (CT), Spider (MRI T1w, T2w) [6], and SG (CT; MRI Dixon water, in-phase, and fat). view at source ↗
Figure 2
Figure 2. Example transforms applied to one SG CT (top) and SG Dixon in-phase image (bottom) of the same subject. The columns are the different transforms applied to these images. Standard geometric transformations (e.g., rotations and flips) and other baseline augmentations (e.g., Gaussian noise, blurring, and resampling) are kept with the same configuration as the normal nnUNet trainer. The order of the transfor… view at source ↗
Figure 3
Figure 3. A qualitative comparison between predictions produced by the baseline and our setup on random test samples. The rows represent different samples, labeled with the dataset name and sequence. The columns display the ground-truth segmentation of the test image, along with the predicted segmentations from both the baseline and our method, for each of the three training sets. Red arrows highlight areas with pre… view at source ↗
read the original abstract

Deep learning-based medical image segmentation is increasingly used to support clinical diagnosis and develop new treatment strategies. However, model performance remains limited by the scarcity of high-quality annotated data and insufficient generalization across imaging protocols. This limitation is particularly evident in MRI and CT, where models are typically trained on a single acquisition sequence and exhibit reduced robustness when applied to unseen sequences or contrasts. Although data augmentation is widely used to improve general robustness on medical images, its impact on cross-modality generalization has not been quantitatively explored. In this work, we study a targeted set of data augmentation techniques designed to improve cross-modality transfer. We train three spine segmentation models, each on a single-modality/sequence dataset, and evaluate them across seven out-of-distribution datasets (spanning CT and MRI), reflecting a realistic single-sequence training and multi-sequence/contrast/modality deployment scenario. Our results demonstrate substantial performance gains on unseen domains (average Dice gain of 155 %) while preserving in-domain accuracy (average Dice decrease of 0.008 %), including effective transfer between CT and MRI. To mitigate the computational cost typically associated with strong data augmentation, we implement GPU-optimized augmentations that maintain, and even improve, training efficiency by approximately 10 %. We release our approach as an open-source toolbox, enabling seamless integration into commonly used frameworks such as nnUNet and MONAI. These augmentations significantly enhance robustness to heterogeneous clinical imaging scenarios without compromising training speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a targeted set of data augmentation techniques to improve cross-modality generalization in 3D spine segmentation models trained on single CT or MRI sequences. Three models are trained on individual datasets and evaluated on seven out-of-distribution datasets spanning CT and MRI contrasts. The work reports an average relative Dice gain of 155% on OOD data with negligible in-domain degradation (0.008%) and GPU-optimized implementations that improve training efficiency by approximately 10%; the authors release an open-source toolbox for integration with nnUNet and MONAI.

Significance. If the reported gains hold after providing absolute per-dataset metrics and confirming they are attributable to the augmentations, the work would offer a practical, lightweight approach to addressing domain shift in medical segmentation without requiring multi-domain training data or complex adaptation methods. The emphasis on computational efficiency and the open-source release are clear strengths that could aid reproducibility and adoption in clinical pipelines.

major comments (1)
  1. Abstract: The central claim of substantial OOD gains rests on an average relative Dice improvement of 155%. Relative percentages are sensitive to low baseline values (common in cross-modality spine segmentation); the manuscript must report the per-dataset baseline Dice scores, absolute improvements, the exact aggregation method for the average, and whether any OOD dataset was excluded. Without these, it is impossible to determine whether the gains are uniformly meaningful or driven by near-zero baselines, directly affecting the strength of the 'one sequence to segment them all' conclusion.
minor comments (2)
  1. Abstract: The phrase 'average Dice decrease of 0.008 %' should clarify whether this is an absolute or relative change and provide the corresponding standard deviation or range across the in-domain evaluations.
  2. The manuscript should include a table or figure with per-dataset Dice scores (baseline vs. augmented) for both in-domain and OOD evaluations to allow direct assessment of practical impact.
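The referee's point about baseline sensitivity is simple arithmetic. With hypothetical per-dataset Dice scores (not the paper's numbers), the same augmented results yield wildly different relative gains depending on the baseline:

```python
# Hypothetical per-dataset Dice scores, chosen to illustrate the effect.
baseline  = [0.10, 0.55, 0.70]
augmented = [0.45, 0.72, 0.78]

abs_gain = [a - b for a, b in zip(augmented, baseline)]
rel_gain = [(a - b) / b * 100 for a, b in zip(augmented, baseline)]

print([round(g, 2) for g in abs_gain])   # [0.35, 0.17, 0.08]
print([round(g, 1) for g in rel_gain])   # [350.0, 30.9, 11.4]
# The arithmetic-mean relative gain is dominated by the near-zero baseline:
print(round(sum(rel_gain) / len(rel_gain), 1))  # 130.8
```

This is why the report asks for per-dataset absolute scores alongside the headline 155% average: a single near-failure baseline can inflate the mean relative gain far beyond what any individual dataset experiences in absolute terms.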

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The major comment raises an important point about the interpretability of relative Dice gains, and we address it directly below by committing to a clear revision.

read point-by-point responses
  1. Referee: Abstract: The central claim of substantial OOD gains rests on an average relative Dice improvement of 155%. Relative percentages are sensitive to low baseline values (common in cross-modality spine segmentation); the manuscript must report the per-dataset baseline Dice scores, absolute improvements, the exact aggregation method for the average, and whether any OOD dataset was excluded. Without these, it is impossible to determine whether the gains are uniformly meaningful or driven by near-zero baselines, directly affecting the strength of the 'one sequence to segment them all' conclusion.

    Authors: We agree that relative improvements must be accompanied by absolute values and per-dataset breakdowns to avoid misinterpretation, especially when baselines may be low in cross-modality settings. In the revised manuscript we will add a new table (e.g., Table 2) that reports, for each of the seven OOD datasets: (i) baseline Dice without the proposed augmentations, (ii) Dice with the augmentations, (iii) absolute improvement, and (iv) relative improvement. We will explicitly state that the 155 % figure is the arithmetic mean of the seven relative improvements and confirm that no OOD dataset was excluded. We will also include the corresponding per-training-dataset in-domain results to document the 0.008 % average degradation. These additions will be placed in the results section and referenced from the abstract, providing full transparency while preserving the original conclusions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on held-out OOD datasets

full rationale

The paper is a purely empirical study that trains segmentation models on single-modality datasets and measures Dice performance on seven independent out-of-distribution test sets (including cross-modality CT/MRI transfer). No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described methods. All reported gains (155 % relative Dice, 0.008 % in-domain drop) are direct measurements on held-out data rather than quantities forced by construction from the inputs. The work therefore contains no self-definitional, fitted-input, or self-citation circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the work relies on standard deep-learning assumptions about data augmentation improving generalization.

pith-pipeline@v0.9.0 · 5601 in / 1110 out tokens · 51090 ms · 2026-05-08T18:36:08.573625+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

23 extracted references · 5 canonical work pages · 1 internal anchor

  1. [1]

    Medical Image Analysis 86, 102789 (2023)

    Billot, B., Greve, D.N., Puonti, O., Thielscher, A., Van Leemput, K., Fischl, B., Dalca, A.V., Iglesias, J.E., et al.: SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining. Medical Image Analysis 86, 102789 (2023)

  2. [2]

    MONAI: An open-source framework for deep learning in healthcare

    Cardoso, M.J., Li, W., Brown, R., Ma, N., Kerfoot, E., Wang, Y., Murrey, B., Myronenko, A., Zhao, C., Yang, D., et al.: MONAI: An open-source framework for deep learning in healthcare. arXiv preprint arXiv:2211.02701 (2022)

  3. [3]

    Journal of Medical Imaging and Radiation Oncology 65(5), 545–563 (2021)

    Chlap, P., Min, H., Vandenberg, N., Dowling, J., Holloway, L., Haworth, A.: A review of medical image data augmentation techniques for deep learning applications. Journal of Medical Imaging and Radiation Oncology 65(5), 545–563 (2021)

  4. [4]

    IEEE Transactions on Image Processing 20(5), 1249–1261 (2010)

    Deng, G.: A generalized unsharp masking algorithm. IEEE Transactions on Image Processing 20(5), 1249–1261 (2010)

  5. [5]

    Artificial Intelligence Review 56(11), 12561–12605 (2023)

    Goceri, E.: Medical image data augmentation: techniques, comparisons and interpretations. Artificial Intelligence Review 56(11), 12561–12605 (2023)

  6. [6]

    Scientific Data 11(1), 264 (2024)

    van der Graaf, J.W., van Hooff, M.L., Buckens, C.F., Rutten, M., van Susante, J.L., Kroeze, R.J., de Kleuver, M., van Ginneken, B., Lessmann, N.: Lumbar spine segmentation in MR images: a dataset and a public benchmark. Scientific Data 11(1), 264 (2024)

  7. [7]

    In: Medical Imaging with Deep Learning (2024)

    Graf, R., Möller, H., McGinnis, J., Rühling, S., Weihrauch, M., Atad, M., Shit, S., Mühlau, M., Paetzold, J.C., Rueckert, D., et al.: Modeling the acquisition shift between axial and sagittal MRI for diffusion superresolution to enable axial spine segmentation. In: Medical Imaging with Deep Learning (2024)

  8. [8]

    European Radiology Experimental 7(1), 70 (2023)

    Graf, R., Schmitt, J., Schlaeger, S., Möller, H.K., Sideri-Lampretsa, V., Sekuboyina, A., Krieg, S.M., Wiestler, B., Menze, B., Rueckert, D., et al.: Denoising diffusion-based MRI to CT image translation enables automated spinal segmentation. European Radiology Experimental 7(1), 70 (2023)

  9. [9]

    European Radiology Experimental 9(1), 93 (2025)

    Häntze, H., Xu, L., Rattunde, M.N., Donle, L., Dorfner, F.J., Hering, A., Nawabi, J., Adams, L.C., Bressem, K.K.: MRI annotation using an inversion-based preprocessing for CT model adaptation. European Radiology Experimental 9(1), 93 (2025)

  10. [10]

    Nature Methods 18(2), 203–211 (2021)

    Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)

  11. [11]

    Journal of Imaging 9(4), 81 (2023)

    Kebaili, A., Lapuyade-Lahorgue, J., Ruan, S.: Deep learning approaches for data augmentation in medical imaging: a review. Journal of Imaging 9(4), 81 (2023)

  12. [12]

    arXiv preprint arXiv:2312.02608 (2023)

    Kofler, F., Möller, H., Buchner, J.A., de la Rosa, E., Ezhov, I., Rosier, M., Mekki, I., Shit, S., Negwer, M., Al-Maskari, R., et al.: Panoptica – instance-wise evaluation of 3D semantic and instance segmentation maps. arXiv preprint arXiv:2312.02608 (2023)

  13. [13]

    IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2014)

    Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2014)

  14. [14]

    IEEE Transactions on Medical Imaging 42(4), 1095–1106 (2022)

    Ouyang, C., Chen, C., Li, S., Li, Z., Qin, C., Bai, W., Rueckert, D.: Causality-inspired single-source domain generalization for medical image segmentation. IEEE Transactions on Medical Imaging 42(4), 1095–1106 (2022)

  15. [15]

    Informatics in Medicine Unlocked 47, 101504 (2024)

    Rayed, M.E., Islam, S.S., Niha, S.I., Jim, J.R., Kabir, M.M., Mridha, M.: Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Informatics in Medicine Unlocked 47, 101504 (2024)

  16. [16]

    Saydazimov, J., Ergashev, S., Nosirkulov, A.: Research of some image filter algorithms used in object detection. In: Proceedings of the 8th International Conference on Future Networks & Distributed Systems, pp. 781–785. Association for Computing Machinery, New York, NY, USA (2025). https://doi.org/10.1145/3726122.3726236

  17. [17]

    In: 2023 IEEE Intelligent Vehicles Symposium (IV)

    Schwonberg, M., El Bouazati, F., Schmidt, N.M., Gottschalk, H.: Augmentation-based domain generalization for semantic segmentation. In: 2023 IEEE Intelligent Vehicles Symposium (IV), pp. 1–8. IEEE (2023)

  18. [18]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Schwonberg, M., Gottschalk, H.: Domain generalization for semantic segmentation: A survey. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6437–6448 (2025)

  19. [19]

    In: International Workshop on Simulation and Synthesis in Medical Imaging

    Shin, H.C., Tenenholtz, N.A., Rogers, J.K., Schwarz, C.G., Senjem, M.L., Gunter, J.L., Andriole, K.P., Michalski, M.: Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In: International Workshop on Simulation and Synthesis in Medical Imaging, pp. 1–11. Springer (2018)

  20. [20]

    IEEE Transactions on Medical Imaging 29(6), 1310–1320 (2010)

    Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C.: N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging 29(6), 1310–1320 (2010)

  21. [21]

    ResearchGate preprint (2025), https://www.researchgate.net/publication/389881289_TotalSpineSeg_Robust_Spine_Segmentation_with_Landmark-Based_Labeling_in_MRI

    Warszawer, Y., Molinier, N., Valosek, J., Benveniste, P.L., Bédard, S., Shirbint, E., Mohamed, F., Tsagkas, C., Kolind, S., Lynd, L., Oh, J., Prat, A., Tam, R., Traboulsee, A., Patten, S., Lee, L.E., Achiron, A., Cohen-Adad, J.: TotalSpineSeg: Robust spine segmentation with landmark-based labeling in MRI. ResearchGate preprint (2025)

  22. [22]

    Computerized Medical Imaging and Graphics, 102652 (2025)

    Xie, Z., Lin, Z., Sun, E., Ding, F., Qi, J., Zhao, S.: Deep learning for automatic vertebra analysis: A methodological survey of recent advances. Computerized Medical Imaging and Graphics, 102652 (2025)

  23. [23]

    arXiv preprint arXiv:2007.13003 (2020)

    Xu, Z., Liu, D., Yang, J., Raffel, C., Niethammer, M.: Robust and generalizable visual representation learning via random convolutions. arXiv preprint arXiv:2007.13003 (2020)