Label Dropout: Improved Deep Learning Echocardiography Segmentation Using Multiple Datasets With Domain Shift and Partial Labelling

Iman Islam (1), Esther Puyol-Antón (1), Bram Ruijsink (1), Andrew J. Reader (1), Andrew P. King (1); (1) King's College London

arXiv: 2403.07818 · v2 · submitted 2024-03-12 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-24 02:35 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords echocardiography segmentation · partial labelling · label dropout · domain shift · shortcut learning · multi-dataset training · deep learning · cardiac imaging

The pith

Label dropout prevents models from linking label absence to scanner or operator domain when training echo segmentation on partially labelled datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that training segmentation networks on multiple echo datasets with differing label sets causes the model to use the presence or absence of a label as a cue for which domain the image came from, hurting performance on structures that are labelled in only some datasets. To break this association the authors introduce label dropout, which randomly omits labels from the loss even when they are present. Experiments on two cardiac structures demonstrate large Dice gains when the same multi-dataset collection is trained with the new scheme rather than standard partial-label losses.

Core claim

Training naively with adaptations of cross-entropy on partially labelled multi-domain echo data produces shortcut learning, in which the network associates label presence with domain characteristics; a label-dropout regulariser that randomly masks labels during training removes this spurious correlation and raises Dice score by 62% and 25% on two cardiac structures.
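The headline numbers are stated in terms of the Dice coefficient, 2|P ∩ T| / (|P| + |T|) for a predicted region P and a ground-truth region T. The page does not say whether the 62% and 25% gains are relative or absolute changes, so the hypothetical helpers below (`dice`, `relative_gain`) compute the metric for one structure and, assuming the gains are relative, the percentage change between two scores. Masks are flattened lists of integer class labels, and scoring two empty masks as 1.0 is an assumed convention.

```python
def dice(pred, truth, label):
    """Dice coefficient for one structure, given flattened integer label masks."""
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    if denom == 0:
        # Assumed convention: structure absent from both masks counts as agreement.
        return 1.0
    return 2.0 * intersection / denom


def relative_gain(new_score, old_score):
    """Relative change from old_score to new_score (0.25 means +25%)."""
    return (new_score - old_score) / old_score


pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 2, 2, 2, 0]
print(dice(pred, truth, 2))  # 0.8
```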

What carries the argument

Label dropout, a training-time operation that randomly drops available labels from the loss computation so that the model cannot rely on the pattern of which labels are present to infer domain.
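The page describes label dropout only at this level of detail. A minimal sketch follows, assuming that (i) each training sample carries the set of foreground structures annotated for it, (ii) unannotated structures already appear as background in the mask, and (iii) a dropped label is handled the same way. The function name `label_dropout`, the background index 0 and the per-label drop probability `p` are illustrative assumptions, not the authors' code.

```python
import random

BACKGROUND = 0  # assumed index of the background class


def label_dropout(mask, labelled, p, rng):
    """Hypothetical label dropout for one training sample.

    mask:     flattened ground-truth mask of integer class labels.
    labelled: set of foreground classes annotated for this sample.
    p:        probability of dropping each annotated foreground class.
    rng:      random.Random instance, passed in for reproducibility.

    Each dropped class is relabelled as background and removed from the
    labelled set, so the sample looks as if that structure had never been
    annotated. Call this afresh every time the sample is drawn.
    """
    dropped = {c for c in sorted(labelled) if rng.random() < p}
    new_mask = [BACKGROUND if v in dropped else v for v in mask]
    return new_mask, set(labelled) - dropped


rng = random.Random(0)
mask = [0, 1, 1, 2, 2, 3]  # toy sample with three annotated structures
print(label_dropout(mask, {1, 2, 3}, p=0.5, rng=rng))
```

With `p=0` the sample is returned unchanged, consistent with the Figure 4 caption's note that 0% label dropout is equivalent to the adaptive loss without label dropout. Because any structure can now be missing from images of any domain, its absence becomes a weaker cue for where the image came from.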

If this is right

Segmentation models can be trained on larger, more heterogeneous collections of echo data without requiring every structure to be labelled in every scan.
The same training scheme can be applied to any imaging task that combines datasets with overlapping but incomplete label sets.
Robustness to scanner and operator variation increases because the model is forced to learn appearance-based features rather than label-pattern cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on other partially labelled medical imaging problems such as CT or MRI organ segmentation where label sets also differ across sites.
If label dropout is applied at test time as well, it might further reduce sensitivity to missing annotations in clinical deployment.
The method may interact with other domain-adaptation techniques; combining label dropout with style transfer or adversarial alignment remains unexplored in the paper.

Load-bearing premise

The observed performance drop when using standard partial-label losses on diverse datasets is caused by the model learning to associate label presence with domain characteristics.

What would settle it

A controlled experiment in which label presence is made statistically independent of domain while keeping the same data and loss; if performance remains low, the shortcut-learning explanation is falsified.
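One way to make "statistically independent" concrete is the mutual information between an image's domain and whether a given structure is annotated: zero bits means independence. The sketch below uses assumed toy proportions (two equal-sized domains, one always annotated and one never annotated), not data from the paper. It also illustrates that dropping existing labels weakens the dependence but does not by itself remove it, since the never-annotated domain still never carries the label.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information in bits between the two fields of (domain, annotated) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    domain_counts = Counter(d for d, _ in pairs)
    label_counts = Counter(a for _, a in pairs)
    mi = 0.0
    for (d, a), count in joint.items():
        p_joint = count / n
        p_indep = (domain_counts[d] / n) * (label_counts[a] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# Toy setting: domain "A" always annotated, domain "B" never annotated.
confounded = [("A", True)] * 100 + [("B", False)] * 100
# Expected outcome of dropping each existing label with probability 0.5.
after_dropout = [("A", True)] * 50 + [("A", False)] * 50 + [("B", False)] * 100
# Annotation rate made equal across domains, as the proposed control requires.
independent = ([("A", True)] * 50 + [("A", False)] * 50
               + [("B", True)] * 50 + [("B", False)] * 50)

for name, pairs in [("confounded", confounded),
                    ("after_dropout", after_dropout),
                    ("independent", independent)]:
    print(name, round(mutual_information(pairs), 3))
# prints: confounded 1.0 / after_dropout 0.311 / independent 0.0
```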

Figures

Figures reproduced from arXiv: 2403.07818 by Iman Islam (1), Esther Puyol-Antón (1), Bram Ruijsink (1), Andrew J. Reader (1), Andrew P. King (1); (1) King's College London.

**Figure 1.** Experiment 1: Test Dice scores achieved by training and evaluating intradomain and cross-domain LV segmentation models using three different datasets. C = CAMUS [5], UI = Unity Imaging [3], END = EchoNet Dynamic [7]. Experiment 2 - Training using a combination of three diverse partially labelled echo datasets: The purpose of this experiment was to illustrate the problem of using the standard loss model wh…

**Figure 2.** Experiment 2: Example test results from the three datasets. From left to right: image, ground truth segmentation and model predictions using the standard loss model, adaptive loss without augmentation and adaptive loss with augmentation. Experiment 3 - Investigating the adaptive loss in a controlled experiment: The purpose of this experiment was to further investigate our hypothesis that domain shift has led …

**Figure 3.** Experiment 3: Test set results when training using only the CAMUS dataset with 50% of LVM labels removed. Box plots show Dice coefficients for each segmented structure and the overall mean. Green = benchmark, blue = standard, pink = adaptive. This experiment shows the viability of the adaptive loss as a solution to the problem of partial labelling. It also supports our hypothesis that the lack of improveme…

**Figure 4.** Test set results of this experiment. The plot clearly shows the improved performance when using the label dropout technique, which persists across a range of dropout probabilities. Note that the 0% label dropout model is equivalent to the adaptive loss model without label dropout. The benchmark model and the 0% label dropout model achieved Dice scores of 0.83 and 0.71, respectively. The Dice score…

**Figure 5.** Repetition of Experiment 2 with label dropout. Randomly selected test set results when training with all 3 datasets using label dropout (LD). From left to right: image, ground truth segmentation and model predictions using adaptive loss with augmentation and adaptive loss with augmentation and label dropout. images from EchoNet Dynamic test set and some key results are shown in…
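Experiment 3 (Figure 3) removes left-ventricular myocardium (LVM) labels within the single CAMUS dataset, so label presence cannot be inferred from the domain. A minimal sketch of that preprocessing step follows, assuming that "50% of LVM labels removed" means relabelling LVM pixels as background in a fixed, randomly chosen half of the samples; the helper name and the LVM class index are hypothetical. Unlike label dropout, this removal is done once before training rather than resampled at every iteration.

```python
import random


def remove_label_from_fraction(masks, label, fraction, seed=0, background=0):
    """Hypothetical sketch: strip one structure's annotation from a random subset of samples.

    Returns the new masks and, per sample, whether `label` is still annotated.
    The subset is drawn once, so the partial labelling is fixed for all of training.
    """
    rng = random.Random(seed)
    n = len(masks)
    removed = set(rng.sample(range(n), round(fraction * n)))
    new_masks, annotated = [], []
    for i, mask in enumerate(masks):
        if i in removed:
            new_masks.append([background if v == label else v for v in mask])
        else:
            new_masks.append(list(mask))
        annotated.append(i not in removed)
    return new_masks, annotated


LVM = 2  # assumed class index for the left-ventricular myocardium
masks = [[0, 1, 2, 3]] * 10  # ten toy samples from a single domain
new_masks, annotated = remove_label_from_fraction(masks, LVM, 0.5)
print(annotated.count(False))  # 5
```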

Original abstract

Echocardiography (echo) is the first imaging modality used when assessing cardiac function. The measurement of functional biomarkers from echo relies upon the segmentation of cardiac structures and deep learning models have been proposed to automate the segmentation process. However, in order to translate these tools to widespread clinical use it is important that the segmentation models are robust to a wide variety of images (e.g. acquired from different scanners, by operators with different levels of expertise etc.). To achieve this level of robustness it is necessary that the models are trained with multiple diverse datasets. A significant challenge faced when training with multiple diverse datasets is the variation in label presence, i.e. the combined data are often partially-labelled. Adaptations of the cross entropy loss function have been proposed to deal with partially labelled data. In this paper we show that training naively with such a loss function and multiple diverse datasets can lead to a form of shortcut learning, where the model associates label presence with domain characteristics, leading to a drop in performance. To address this problem, we propose a novel label dropout scheme to break the link between domain characteristics and the presence or absence of labels. We demonstrate that label dropout improves echo segmentation Dice score by 62% and 25% on two cardiac structures when training using multiple diverse partially labelled datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Label dropout fixes a domain-label shortcut in multi-dataset partial-label echo segmentation and delivers large reported Dice gains.

The main point is that training on multiple echo datasets with missing labels lets the model learn to treat label absence as a domain cue, hurting segmentation. Their label dropout randomizes which labels are dropped per sample to break that link, and they report 62% and 25% Dice lifts on two structures versus naive partial-label training. The experiments include direct ablations against standard cross-entropy adaptations, which isolate the effect reasonably well. The method itself is simple and does not require new loss terms or architectural changes. The paper is clear on the problem setup and shows that the performance drop happens before the fix is applied. One soft spot is the size of the gains; I would want to confirm that the baselines received equal hyperparameter effort and that the test sets are not inadvertently easier for the dropout model. A second minor point is whether the benefit scales when the number of datasets grows or when domain shifts are more extreme than the ones tested. The derivation is straightforward, with no circular definitions or unfalsifiable claims. Citations follow the usual pattern for cardiac segmentation work. This is aimed at groups that already combine public echo datasets and hit the partial-label wall. A reader who needs a practical knob to turn on multi-source training would find it useful. The work is coherent enough on its own terms to merit peer review rather than a desk reject.

Referee Report

0 major / 3 minor

Summary. The paper claims that naive training of echocardiography segmentation models on multiple diverse partially labelled datasets using adapted cross-entropy losses leads to shortcut learning, where the model associates label presence/absence with domain characteristics and thereby degrades performance. It proposes a label dropout scheme to break this association and reports Dice score gains of 62% and 25% on two cardiac structures.

Significance. If the reported gains are reproducible, the work supplies a practical, low-overhead technique for combining heterogeneous partially labelled echo datasets without performance collapse, directly addressing a barrier to training robust, clinically deployable segmentation models.

minor comments (3)

[Abstract] The abstract states quantitative improvements without naming the exact baselines, datasets, or statistical controls; state this information in the abstract so the claims can be verified immediately.
[Methods] Clarify in the methods whether label dropout is applied only during training or also at inference, and provide the precise probability schedule used.
[Results] Add error bars or statistical significance tests to the Dice comparisons in the results tables/figures to support the magnitude of the reported gains.
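For the error bars requested in the last comment, one common option is a paired bootstrap over test images: resample the per-image Dice differences between two models with replacement and read a confidence interval off the resampled means. The sketch below is a generic illustration with made-up inputs, not an analysis of the paper's results; the function name, the 2000 resamples and the 95% percentile interval are assumptions.

```python
import random
import statistics


def paired_bootstrap_ci(dice_a, dice_b, n_boot=2000, alpha=0.05, seed=0):
    """Mean per-image Dice difference (b - a) with a percentile bootstrap CI.

    dice_a, dice_b: Dice scores for the same test images under two models,
    in the same order (e.g. without and with label dropout).
    """
    if len(dice_a) != len(dice_b) or not dice_a:
        raise ValueError("need non-empty, per-image paired scores")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(dice_a, dice_b)]
    n = len(diffs)
    # Resample images with replacement and record the mean difference each time.
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), (lower, upper)


# Illustrative scores only; these are not results from the paper.
model_a = [0.70, 0.65, 0.72, 0.68, 0.74]
model_b = [0.80, 0.78, 0.79, 0.75, 0.83]
mean_diff, (lo, hi) = paired_bootstrap_ci(model_a, model_b)
print(f"mean difference {mean_diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Pairing by image matters here because the same test images are scored by both models, so image difficulty cancels out of the differences.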

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and for recommending minor revision. The referee's summary accurately captures our contribution regarding label dropout to address shortcut learning when training on multiple diverse partially labelled echocardiography datasets.

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical method (label dropout) to address shortcut learning in partial-label multi-dataset training for echocardiography segmentation. The central claims rest on experimental results comparing Dice scores against baselines using standard partial-label losses, with no mathematical derivation chain, fitted parameters renamed as predictions, or load-bearing self-citations. The abstract and described experiments directly test performance effects without reducing any result to its own inputs by construction or definition. This is a standard empirical ML paper whose validation is external to any internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the work relies on standard machine learning assumptions for segmentation with partial labels but introduces no explicit free parameters or invented entities. Full details on any hyperparameters or modelling choices are unavailable without the manuscript.

axioms (1)

Domain assumption: adaptations of the cross-entropy loss can be used to train on partially labelled data.
Stated in the abstract as the baseline approach that leads to the identified problem.
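The baseline this axiom refers to can be illustrated with a cross entropy in the spirit of the marginal loss of Shi et al. [12]: for pixels marked as background, the predicted probabilities of classes that were not annotated for that image are merged into the background probability, so the model is not penalised for predicting a structure nobody labelled. The exact form of the paper's "adaptive loss" is not given on this page; the sketch below, with hypothetical names and pure-Python per-pixel logits, is one assumed instance.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_cross_entropy(pixel_logits, mask, labelled, background=0, eps=1e-12):
    """Mean per-pixel cross entropy for a partially labelled sample.

    pixel_logits: one list of class scores per pixel.
    mask:         flattened ground-truth mask; unannotated structures appear as background.
    labelled:     set of foreground classes annotated for this sample.
    """
    total = 0.0
    for logits, target in zip(pixel_logits, mask):
        probs = softmax(logits)
        if target == background:
            # Background pixels may belong to any unannotated class, so merge
            # those classes' probabilities with the background probability.
            p = probs[background] + sum(
                probs[c] for c in range(len(probs))
                if c != background and c not in labelled
            )
        else:
            p = probs[target]
        total -= math.log(max(p, eps))
    return total / len(mask)


uniform = [[0.0, 0.0, 0.0]]  # one pixel, three classes, equal scores
print(marginal_cross_entropy(uniform, [0], {1, 2}))  # log(3), about 1.0986
print(marginal_cross_entropy(uniform, [0], {1}))     # log(1.5), about 0.4055
```

A sample whose labels have been dropped during training can be scored with the same function by passing its reduced label set.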

pith-pipeline@v0.9.0 · 5803 in / 1129 out tokens · 28212 ms · 2026-05-24T02:35:47.050932+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 1 internal anchor

[1]

Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M.: Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (May 2021), http://arxiv.org/abs/2105.05537, arXiv:2105.05537 [cs, eess]

[2]

Ghorbani, A., Ouyang, D., Abid, A., He, B., Chen, J.H., Harrington, R.A., Liang, D.H., Ashley, E.A., Zou, J.Y.: Deep learning interpretation of echocardiograms. npj Digital Medicine 3(1), 10 (Jan 2020). https://doi.org/10.1038/s41746-019-0216-8, https://www.nature.com/articles/s41746-019-0216-8

[3]

Huang, Z., Sidhom, M.J., Wessler, B.S., Hughes, M.C.: Fix-A-Step: Semi-supervised Learning from Uncurated Unlabeled Data (May 2023), http://arxiv.org/abs/2208.11870, arXiv:2208.11870 [cs]

[4]

Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (Feb 2021). https://doi.org/10.1038/s41592-020-01008-z, http://www.nature.com/articles/s41592-020-01008-z

[5]

Leclerc, S., Smistad, E., Pedrosa, J., Ostvik, A., Cervenansky, F., Espinosa, F., Espeland, T., Berg, E.A.R., Jodoin, P.M., Grenier, T., Lartizien, C., Dhooge, J., Lovstakken, L., Bernard, O.: Deep Learning for Segmentation Using an Open Large-Scale Dataset in 2D Echocardiography. IEEE Transactions on Medical Imaging 38(9), 2198–2210 (Sep 2019). https://doi.org/10.1109/TMI.2019.2900516, https://ieeexplore.ieee.org/document/8649738/

[6]

Mariscal-Harana, J., Asher, C., Vergani, V., Rizvi, M., Keehn, L., Kim, R.J., Judd, R.M., Petersen, S.E., Razavi, R., King, A.P., Ruijsink, B., Puyol-Antón, E.: An artificial intelligence tool for automated analysis of large-scale unstructured clinical cine cardiac magnetic resonance databases. European Heart Journal - Digital Health 4(5), 370–383 (Oct 2023). https://doi.org/10.1093/ehjdh/ztad044, https://academic.oup.com/ehjdh/article/4/5/370/7223886

[7]

Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C.P., Heidenreich, P.A., Harrington, R.A., Liang, D.H., Ashley, E.A., Zou, J.Y.: Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (Apr 2020). https://doi.org/10.1038/s41586-020-2145-8, https://www.nature.com/articles/s41586-020-2145-8

[8]

Painchaud, N., Duchateau, N., Bernard, O., Jodoin, P.M.: Echocardiography Segmentation With Enforced Temporal Consistency. IEEE Transactions on Medical Imaging 41(10), 2867–2878 (Oct 2022). https://doi.org/10.1109/TMI.2022.3173669, https://ieeexplore.ieee.org/document/9771186/

[9]

Petit, O., Thome, N., Charnoz, A., Hostettler, A., Soler, L.: Handling Missing Annotations for Semantic Segmentation with Deep ConvNets. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, vol. 11045, pp. 20–28. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_3

[10]

Puyol-Antón, E., Ruijsink, B., Sidhu, B.S., Gould, J., Porter, B., Elliott, M.K., Mehta, V., Gu, H., Xochicale, M., Gomez, A., Rinaldi, C.A., Cowie, M., Chowienczyk, P., Razavi, R., King, A.P.: AI-enabled Assessment of Cardiac Systolic and Diastolic Function from Echocardiography (Jul 2022), http://arxiv.org/abs/2203.11726, arXiv:2203.11726 [physics]

[11]

Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation (May 2015), http://arxiv.org/abs/1505.04597, arXiv:1505.04597 [cs]

[12]

Shi, G., Xiao, L., Chen, Y., Zhou, S.K.: Marginal loss and exclusion loss for partially supervised multi-organ segmentation. Medical Image Analysis 70, 101979 (May 2021). https://doi.org/10.1016/j.media.2021.101979, https://linkinghub.elsevier.com/retrieve/pii/S1361841521000256

[13]

Tromp, J., Seekings, P.J., Hung, C.L., Iversen, M.B., Frost, M.J., Ouwerkerk, W., Jiang, Z., Eisenhaber, F., Goh, R.S.M., Zhao, H., Huang, W., Ling, L.H., Sim, D., Cozzone, P., Richards, A.M., Lee, H.K., Solomon, S.D., Lam, C.S.P., Ezekowitz, J.A.: Automated interpretation of systolic and diastolic function on the echocardiogram: a multicohort study. The Lancet Digital Health 4(1), e46–e54 (Jan 2022). https://doi.org/10.1016/s2589-7500(21)00235-1
