Label Dropout: Improved Deep Learning Echocardiography Segmentation Using Multiple Datasets With Domain Shift and Partial Labelling
Pith reviewed 2026-05-24 02:35 UTC · model grok-4.3
The pith
Label dropout prevents models from linking label absence to scanner or operator domain when training echo segmentation on partially labelled datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Naively training on partially labelled, multi-domain echo data with adapted cross-entropy losses produces shortcut learning: the network associates label presence with domain characteristics. A label-dropout regulariser that randomly masks labels during training removes this spurious correlation and raises Dice score by 62% and 25% on two cardiac structures.
What carries the argument
Label dropout: a training-time operation that randomly removes available labels from the loss computation, so that the pattern of which labels are present no longer tracks domain and the model cannot use domain characteristics to predict label presence.
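The operation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `label_dropout`, the per-class drop probability `p_drop`, and the choice to relabel dropped pixels as background are all assumptions made for illustration.

```python
import numpy as np

def label_dropout(mask, present, p_drop=0.5, rng=None):
    """Randomly hide annotated foreground classes from one training sample.

    mask:    (H, W) integer array, 0 = background, 1..C-1 = structures.
    present: set of foreground class ids annotated for this image.
    p_drop:  hypothetical per-class drop probability (not from the paper).
    Returns the modified mask and the reduced set of annotated classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = mask.copy()  # leave the caller's array untouched
    kept = set()
    for c in sorted(present):
        if rng.random() < p_drop:
            # Dropped class: its pixels now look like those of an image from
            # a dataset in which this structure was never annotated.
            mask[mask == c] = 0
        else:
            kept.add(c)
    return mask, kept
```

The returned `kept` set would then be passed to whatever partial-label loss is in use, so that a dropped structure is treated exactly like a structure that was never annotated in that dataset.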
If this is right
- Segmentation models can be trained on larger, more heterogeneous collections of echo data without requiring every structure to be labelled in every scan.
- The same training scheme can be applied to any imaging task that combines datasets with overlapping but incomplete label sets.
- Robustness to scanner and operator variation increases because the model is forced to learn appearance-based features rather than label-pattern cues.
Where Pith is reading between the lines
- The approach could be tested on other partially labelled medical imaging problems such as CT or MRI organ segmentation where label sets also differ across sites.
- If label dropout is applied at test time as well it might further reduce sensitivity to missing annotations in clinical deployment.
- The method may interact with other domain-adaptation techniques; combining label dropout with style transfer or adversarial alignment remains unexplored in the paper.
Load-bearing premise
The observed performance drop when using standard partial-label losses on diverse datasets is caused by the model learning to associate label presence with domain characteristics.
What would settle it
A controlled experiment in which label presence is made statistically independent of domain while keeping the same data and loss; if performance remains low, the shortcut-learning explanation is falsified.
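Before running such an experiment, one would want to verify that label presence really is decoupled from domain in the resampled data. The sketch below is one way to check this, assuming per-image domain identifiers and a boolean presence flag for a given structure; the function names and the use of a plain chi-square statistic are illustrative choices, not part of the paper.

```python
import numpy as np

def presence_by_domain(domains, has_label):
    """Build a contingency table with one row per domain.

    Columns are [label absent, label present] counts for one structure.
    """
    domains = np.asarray(domains)
    has_label = np.asarray(has_label, dtype=bool)
    ids = np.unique(domains)
    table = np.array(
        [[np.sum((domains == d) & ~has_label),
          np.sum((domains == d) & has_label)] for d in ids],
        dtype=float,
    )
    return ids, table

def chi_square_stat(table):
    """Pearson chi-square statistic; 0 means presence is independent of domain."""
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (table - expected) ** 2 / expected, 0.0)
    return float(terms.sum())
```

A statistic near zero indicates that each domain has roughly the same fraction of labelled images, which is the condition the controlled experiment requires.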
Original abstract
Echocardiography (echo) is the first imaging modality used when assessing cardiac function. The measurement of functional biomarkers from echo relies upon the segmentation of cardiac structures and deep learning models have been proposed to automate the segmentation process. However, in order to translate these tools to widespread clinical use it is important that the segmentation models are robust to a wide variety of images (e.g. acquired from different scanners, by operators with different levels of expertise etc.). To achieve this level of robustness it is necessary that the models are trained with multiple diverse datasets. A significant challenge faced when training with multiple diverse datasets is the variation in label presence, i.e. the combined data are often partially-labelled. Adaptations of the cross entropy loss function have been proposed to deal with partially labelled data. In this paper we show that training naively with such a loss function and multiple diverse datasets can lead to a form of shortcut learning, where the model associates label presence with domain characteristics, leading to a drop in performance. To address this problem, we propose a novel label dropout scheme to break the link between domain characteristics and the presence or absence of labels. We demonstrate that label dropout improves echo segmentation Dice score by 62% and 25% on two cardiac structures when training using multiple diverse partially labelled datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that naive training of echocardiography segmentation models on multiple diverse partially labelled datasets using adapted cross-entropy losses leads to shortcut learning, where the model associates label presence/absence with domain characteristics and thereby degrades performance. It proposes a label dropout scheme to break this association and reports Dice score gains of 62% and 25% on two cardiac structures.
Significance. If the reported gains are reproducible, the work supplies a practical, low-overhead technique for combining heterogeneous partially labelled echo datasets without performance collapse, directly addressing a barrier to training robust, clinically deployable segmentation models.
minor comments (3)
- [Abstract] The abstract states quantitative improvements without naming the exact baselines, datasets, or statistical controls; move or expand this information into the abstract for immediate verifiability.
- [Methods] Clarify in the methods whether label dropout is applied only during training or also at inference, and provide the precise probability schedule used.
- [Results] Add error bars or statistical significance tests to the Dice comparisons in the results tables/figures to support the magnitude of the reported gains.
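As an illustration of the kind of reporting the last comment asks for, the following sketch computes a per-image Dice score and a paired bootstrap confidence interval for the difference between two methods. The function names, the number of bootstrap resamples, and the convention that two empty masks score 1.0 are assumptions, not details from the paper.

```python
import numpy as np

def dice(pred, target):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def paired_bootstrap_diff(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Mean per-image difference (b - a) with a percentile bootstrap interval.

    scores_a, scores_b: per-image Dice scores for the same test images.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    # Resample image indices with replacement, keeping the pairing intact.
    idx = rng.integers(0, n, size=(n_boot, n))
    diffs = (b[idx] - a[idx]).mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float((b - a).mean()), float(lo), float(hi)
```

An interval that excludes zero would support the claim that the gain is not an artefact of test-set sampling.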
Simulated Author's Rebuttal
We thank the referee for their review and for recommending minor revision. The referee's summary accurately captures our contribution regarding label dropout to address shortcut learning when training on multiple diverse partially labelled echocardiography datasets.
Circularity Check
No significant circularity
The paper presents an empirical method (label dropout) to address shortcut learning in partial-label multi-dataset training for echocardiography segmentation. The central claims rest on experimental results comparing Dice scores against baselines using standard partial-label losses, with no mathematical derivation chain, fitted parameters renamed as predictions, or load-bearing self-citations. The abstract and described experiments directly test performance effects without reducing any result to its own inputs by construction or definition. This is a standard empirical ML paper whose validation is external to any internal reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Adaptations of the cross-entropy loss can be used to train on partially labelled data.
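One common form such an adaptation can take is to merge the predicted probability of every unannotated structure into the background class before computing cross-entropy, so the model is not penalised for segmenting a structure that simply was not labelled. The sketch below illustrates that idea in NumPy; it is an assumption about the general family of losses, and the paper's exact formulation may differ.

```python
import numpy as np

def partial_label_cross_entropy(logits, mask, present):
    """Cross-entropy in which unannotated classes are merged into background.

    logits:  (C, H, W) raw scores, class 0 = background.
    mask:    (H, W) integer labels; unannotated structures appear as 0.
    present: set of foreground class ids annotated for this image.
    """
    n_classes = logits.shape[0]
    z = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    missing = np.array(
        [c for c in range(1, n_classes) if c not in present], dtype=int
    )
    merged = probs.copy()
    # A background target now means "background or any unannotated structure".
    merged[0] = probs[0] + probs[missing].sum(axis=0)
    # Probability assigned to the target class at each pixel.
    p_target = np.take_along_axis(merged, mask[None], axis=0)[0]
    return float(-np.log(p_target + 1e-12).mean())
```

With uniform predictions over three classes, a background pixel costs `-log(1/3)` when both structures are annotated but only `-log(2/3)` when one is missing, which shows how the loss relaxes the penalty for unannotated classes.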