Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation

Chi-Wing Fu; Lequan Yu; Pheng-Ann Heng; Shujun Wang; Xiaomeng Li

arxiv: 1907.07034 · v1 · pith:HI6JSX2Nnew · submitted 2019-07-16 · 💻 cs.CV

Lequan Yu, Shujun Wang, Xiaomeng Li, Chi-Wing Fu, Pheng-Ann Heng

Pith reviewed 2026-05-24 20:51 UTC · model grok-4.3

classification 💻 cs.CV

keywords: semi-supervised learning, 3D medical segmentation, left atrium, uncertainty estimation, self-ensembling, consistency loss, MR images, student-teacher model

The pith

Uncertainty estimates from a teacher model let the student focus consistency training on reliable targets when using unlabeled 3D MR scans for left atrium segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a student-teacher framework for semi-supervised 3D left atrium segmentation that trains the student to match the teacher's predictions on the same input under different perturbations. It adds an uncertainty-aware filter so the student only learns from the teacher's low-uncertainty outputs on unlabeled data. This setup aims to extract useful signal from unlabeled scans without letting noisy predictions degrade the consistency loss. A sympathetic reader would care because manual labeling of 3D medical volumes is costly, and reliable use of unlabeled data could lower that barrier for cardiac imaging tasks.

Core claim

The framework consists of a student model and a teacher model; the student minimizes a segmentation loss on labeled data plus a consistency loss against the teacher's targets on unlabeled data, with the consistency targets selected or weighted by uncertainty maps produced by the teacher so that only meaningful and reliable predictions guide learning.

What carries the argument

Uncertainty-aware scheme that uses the teacher's uncertainty estimates to identify and emphasize reliable targets inside the consistency loss.
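The filtering idea can be illustrated with a minimal sketch. It assumes each voxel carries a single foreground probability from the student and from the teacher, plus one teacher uncertainty value, all stored in flat lists; the function names, the example numbers, and the choice to return zero when no voxel passes the filter are illustrative assumptions, not the authors' implementation. Plain Python is used so the sketch runs without dependencies; a real pipeline would use tensor operations on 3D volumes.

```python
def masked_consistency_loss(student, teacher, uncertainty, threshold):
    """Mean squared student-teacher difference over voxels whose
    teacher uncertainty is below `threshold` (hypothetical helper)."""
    kept = [
        (s - t) ** 2
        for s, t, u in zip(student, teacher, uncertainty)
        if u < threshold
    ]
    # Assumption: if no voxel is considered reliable, contribute no signal.
    return sum(kept) / len(kept) if kept else 0.0


def total_loss(seg_loss, cons_loss, cons_weight):
    """Supervised segmentation loss plus weighted consistency loss."""
    return seg_loss + cons_weight * cons_loss


# Toy example: four voxels, the last two are too uncertain to be used.
student = [0.9, 0.2, 0.6, 0.4]
teacher = [1.0, 0.0, 0.5, 0.9]
uncertainty = [0.05, 0.10, 0.60, 0.65]
loss = masked_consistency_loss(student, teacher, uncertainty, threshold=0.5)
```

In the toy example only the first two voxels contribute, so the large disagreement on the last voxel (0.4 versus 0.9) does not pull on the student.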

If this is right

Incorporating unlabeled data produces high performance gains over fully supervised baselines.
The method outperforms prior state-of-the-art semi-supervised segmentation approaches on the left atrium task.
The same uncertainty-guided consistency idea can be applied to other semi-supervised medical segmentation problems.
Gradual learning from reliable targets reduces the risk of harmful noise in the consistency objective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same filtering logic could be tested on other organs or modalities where annotation cost is high.
If uncertainty maps are noisy early in training, a ramp-up schedule on the uncertainty threshold might be needed for stability.
The approach may extend naturally to multi-organ or whole-heart segmentation once the single-structure case is validated.
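A threshold schedule of the kind mentioned above might look like the following sketch. The end points (three quarters of the maximum uncertainty, rising to the maximum) follow the passage quoted later on this page; the Gaussian ramp shape, the use of ln 2 as the maximum uncertainty (the entropy ceiling of a two-class prediction), and all function names are assumptions made for illustration.

```python
import math

# Assumption: binary segmentation, so the maximum entropy is ln 2.
U_MAX = math.log(2)


def gaussian_rampup(step, max_step):
    """Rise smoothly from exp(-5) to 1 as step goes from 0 to max_step."""
    if max_step <= 0:
        return 1.0
    clipped = min(max(step, 0), max_step)
    phase = 1.0 - clipped / max_step
    return math.exp(-5.0 * phase * phase)


def uncertainty_threshold(step, max_step, u_max=U_MAX):
    """Threshold H that grows from about 3/4 * u_max toward u_max."""
    return (0.75 + 0.25 * gaussian_rampup(step, max_step)) * u_max


# Threshold at the start, middle, and end of a 100-step schedule.
schedule = [uncertainty_threshold(s, 100) for s in (0, 50, 100)]
```

Early in training the filter is strict and admits only the most confident voxels; by the end every voxel below the maximum uncertainty is admitted.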

Load-bearing premise

Uncertainty estimates from the teacher model correctly mark which of its own predictions are trustworthy enough for the student to learn from.

What would settle it

Running the same student-teacher consistency setup with the uncertainty filter removed or replaced by random weighting and observing no gain or a drop in segmentation accuracy on the test set.
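One way to set up such a control is to keep the number of supervised voxels fixed and only change which voxels are selected. The sketch below builds an uncertainty mask, a random mask with the same number of kept voxels, and a uniform mask; the helper names and the seeded shuffle are hypothetical choices, and the sketch says nothing about what the comparison would show.

```python
import random


def uncertainty_mask(uncertainty, threshold):
    """Keep voxels whose teacher uncertainty is below the threshold."""
    return [u < threshold for u in uncertainty]


def random_mask_like(mask, seed=0):
    """Random control mask keeping exactly as many voxels as `mask`."""
    rng = random.Random(seed)
    shuffled = list(mask)
    rng.shuffle(shuffled)
    return shuffled


def uniform_mask_like(mask):
    """Plain consistency control: every voxel is kept."""
    return [True] * len(mask)


def masked_mse(student, teacher, mask):
    """Mean squared difference over the voxels selected by `mask`."""
    kept = [(s - t) ** 2 for s, t, m in zip(student, teacher, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


uncertainty = [0.05, 0.10, 0.60, 0.65, 0.20, 0.70]
filtered = uncertainty_mask(uncertainty, threshold=0.5)
control = random_mask_like(filtered, seed=1)
```

Training three students that differ only in which mask feeds `masked_mse` would separate the effect of the filter from the effect of self-ensembling alone.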

Figures

Figures reproduced from arXiv: 1907.07034 by Chi-Wing Fu, Lequan Yu, Pheng-Ann Heng, Shujun Wang, Xiaomeng Li.

**Figure 1.** Specifically, we update the teacher’s weights

**Figure 2.** Visualization of the segmentations by different methods and the uncer
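The Figure 1 caption is cut off at the teacher weight update. In mean-teacher style self-ensembling the teacher is usually kept as an exponential moving average (EMA) of the student; the sketch below shows that update under this assumption. The flat weight lists, the function name, and the decay value are illustrative and are not taken from the page.

```python
def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move each teacher weight a small step toward the student weight.

    alpha is the EMA decay; the default here is an assumed, commonly used value.
    """
    return [
        alpha * t + (1.0 - alpha) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


# Toy example with two scalar "weights" and a faster decay for visibility.
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student, alpha=0.9)
```

Because the teacher changes slowly, its predictions act as a smoothed ensemble of recent student states rather than a copy of the current one.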

Original abstract

Training deep convolutional neural networks usually requires a large amount of labeled data. However, it is expensive and time-consuming to annotate data for medical image segmentation tasks. In this paper, we present a novel uncertainty-aware semi-supervised framework for left atrium segmentation from 3D MR images. Our framework can effectively leverage the unlabeled data by encouraging consistent predictions of the same input under different perturbations. Concretely, the framework consists of a student model and a teacher model, and the student model learns from the teacher model by minimizing a segmentation loss and a consistency loss with respect to the targets of the teacher model. We design a novel uncertainty-aware scheme to enable the student model to gradually learn from the meaningful and reliable targets by exploiting the uncertainty information. Experiments show that our method achieves high performance gains by incorporating the unlabeled data. Our method outperforms the state-of-the-art semi-supervised methods, demonstrating the potential of our framework for the challenging semi-supervised problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The uncertainty-aware weighting inside the student-teacher loop is the incremental piece, but the experiments do not isolate whether it actually beats plain mean-teacher consistency on the 3D LA data.

The paper's core move is to modulate the consistency loss with the teacher's uncertainty estimate so the student only pulls from low-uncertainty targets on unlabeled 3D MR volumes. That is a concrete, if incremental, change to the standard mean-teacher setup cited in the abstract. They apply it to left atrium segmentation, a task where labeled data really is scarce, and claim it beats prior semi-supervised baselines.

The description of the framework is clear enough that a reader could re-implement the weighting scheme without much guesswork. That is the part worth noting: a practical way to down-weight noisy pseudo-labels inside an existing self-ensembling loop. The rest of the method follows the usual student-teacher pattern with perturbations and a combined segmentation-plus-consistency objective. No over-claiming of broad novelty appears in the text.

The main gap is the missing ablation. The stress-test concern holds: if the reported Dice and ASD gains disappear when the uncertainty mask is replaced by uniform weighting or random masking, then the central claim rests on the base ensembling rather than the new filter. The abstract gives no numbers, no error bars, and no dataset sizes, so the strength of the outperformance is impossible to judge from the summary. The full paper presumably contains tables, but the absence of even a simple baseline comparison in the method section leaves the novelty claim under-supported.

This work is aimed at the medical-image segmentation crowd that already uses mean-teacher or similar consistency methods. Someone already running 3D U-Nets on cardiac MR would get a usable idea to test on their own unlabeled pool. It is not a foundational result and does not change how we think about semi-supervised learning in general. Still, the technique is well-motivated for the clinical setting and the paper is coherent on its own terms, so it deserves a serious referee who can ask for the missing ablation and the full result tables. I would send it to review rather than desk-reject.

Referee Report

3 major / 1 minor

Summary. The paper proposes an uncertainty-aware self-ensembling framework for semi-supervised 3D left atrium segmentation from MR images. It consists of a student-teacher architecture in which the student is trained with a segmentation loss on labeled data and a consistency loss on unlabeled data, where the consistency targets from the teacher are modulated by an uncertainty estimate so that the student learns preferentially from low-uncertainty predictions. The authors claim that incorporating unlabeled data via this scheme yields high performance gains and outperforms prior semi-supervised methods.

Significance. If the uncertainty modulation can be shown to be the source of the gains, the approach would offer a practical way to improve consistency-based semi-supervised segmentation in medical imaging, where labeled data are scarce. The framework is a direct extension of Mean-Teacher self-ensembling and therefore inherits its reproducibility advantages, but the absence of an ablation isolating the uncertainty term leaves the novelty claim unsupported.

major comments (3)

[Abstract] Abstract: the central claim that the method 'achieves high performance gains' and 'outperforms the state-of-the-art semi-supervised methods' is stated without any numerical results, dataset sizes, error bars, or statistical tests, rendering the claim unverifiable from the provided text.
[Method] Method section (uncertainty-aware scheme): the consistency loss is described as being weighted by teacher uncertainty, yet no ablation is reported that compares the full model against an unweighted Mean-Teacher baseline (or against random masking of the consistency term). Without this comparison the reported Dice/ASD improvements cannot be attributed to the uncertainty component rather than to self-ensembling alone.
[Experiments] Experiments: the weakest assumption—that low-uncertainty teacher predictions are reliably trustworthy—is not tested; no calibration plots, uncertainty-quality correlation, or failure-case analysis of the uncertainty estimator is supplied.

minor comments (1)

[Method] Notation for the uncertainty map and the weighting function should be introduced with an explicit equation rather than described only in prose.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point by point below and will incorporate revisions where they strengthen the work.

Referee: [Abstract] Abstract: the central claim that the method 'achieves high performance gains' and 'outperforms the state-of-the-art semi-supervised methods' is stated without any numerical results, dataset sizes, error bars, or statistical tests, rendering the claim unverifiable from the provided text.

Authors: We agree that the abstract would benefit from quantitative support. In the revised manuscript we will add concise numerical results (e.g., Dice scores and dataset size) while respecting the abstract length limit. revision: yes

Referee: [Method] Method section (uncertainty-aware scheme): the consistency loss is described as being weighted by teacher uncertainty, yet no ablation is reported that compares the full model against an unweighted Mean-Teacher baseline (or against random masking of the consistency term). Without this comparison the reported Dice/ASD improvements cannot be attributed to the uncertainty component rather than to self-ensembling alone.

Authors: This is a valid observation. Although the original manuscript reports comparisons to Mean-Teacher, it does not contain an explicit ablation isolating the uncertainty weighting. We will add this ablation study in the revision to demonstrate the contribution of the uncertainty-aware term. revision: yes

Referee: [Experiments] Experiments: the weakest assumption—that low-uncertainty teacher predictions are reliably trustworthy—is not tested; no calibration plots, uncertainty-quality correlation, or failure-case analysis of the uncertainty estimator is supplied.

Authors: We acknowledge the need to validate the uncertainty estimator. In the revised version we will include additional analysis, such as uncertainty-error correlation on held-out data, to support the assumption. revision: yes
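An uncertainty-error correlation of the kind promised here could be computed roughly as follows. This is a sketch under stated assumptions: per-voxel foreground probabilities, binary ground-truth labels, and uncertainty values in flat lists, with error defined as a thresholded misclassification at 0.5. The helper names and the toy numbers are hypothetical.

```python
import math


def voxel_errors(probs, labels):
    """1.0 where the thresholded prediction disagrees with the label."""
    return [1.0 if (p >= 0.5) != bool(y) else 0.0 for p, y in zip(probs, labels)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy held-out voxels: the two uncertain voxels are also the two wrong ones.
probs = [0.95, 0.05, 0.55, 0.45]
labels = [1, 0, 0, 1]
uncertainty = [0.10, 0.10, 0.68, 0.68]
errors = voxel_errors(probs, labels)
r = pearson(uncertainty, errors)
```

A correlation near zero on real held-out data would suggest the filter is not selecting the trustworthy voxels, which is the premise the referee asks to see tested.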

Circularity Check

0 steps flagged

No circularity: empirical method with independent experimental validation

The paper presents a student-teacher self-ensembling framework augmented by an uncertainty-aware weighting scheme for the consistency loss. All performance claims rest on experimental results (Dice/ASD metrics on the 3D LA dataset) rather than any mathematical derivation that reduces outputs to inputs by construction. No equations are shown that define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing self-citations or imported uniqueness theorems appear in the abstract or method summary. The consistency loss and uncertainty modulation are defined externally to the final evaluation metric, satisfying the criteria for a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim depends on the unstated assumption that the uncertainty map correlates with prediction reliability on unlabeled data; no free parameters or invented entities are named in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/DimensionForcing.lean · 8-tick period (2^D = 8 for D = 3) · tag: echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: "we set T = 8 to balance the uncertainty estimation quality and training efficiency... ramp up the uncertainty threshold H from 3/4 U_max to U_max"

IndisputableMonolith/Cost/FunctionalEquation.lean · J(x) = 1/2 (x + x^{-1}) - 1 uniqueness · tag: unclear

unclear: relation between the paper passage and the cited Recognition theorem.

Paper passage: $\mathcal{L}_c(f', f) = \dfrac{\sum_v \mathbb{I}(u_v < H)\,\lVert f'_v - f_v \rVert^2}{\sum_v \mathbb{I}(u_v < H)}$
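The quoted passages mention T = 8 and a per-voxel uncertainty u_v compared against the threshold H. One common way to obtain such a value, assumed here for illustration, is to run T stochastic forward passes of the teacher (for example with dropout or input noise active) and take the entropy of the averaged prediction. The sketch handles a single voxel of a two-class problem; the function name and sample values are hypothetical.

```python
import math


def predictive_entropy(foreground_probs):
    """Entropy of the mean foreground probability over T stochastic passes."""
    p = sum(foreground_probs) / len(foreground_probs)
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # treat 0 * log(0) as 0
            entropy -= q * math.log(q)
    return entropy


# T = 8 passes for one voxel: consistent passes versus scattered passes.
confident = predictive_entropy([0.98, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98])
unsure = predictive_entropy([0.9, 0.1, 0.7, 0.3, 0.6, 0.4, 0.8, 0.2])
```

Under this definition the largest possible value for a two-class voxel is ln 2, reached when the averaged prediction is exactly 0.5, so a threshold H near that ceiling admits almost every voxel.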

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

[1] Bai, W., Oktay, O., Sinclair, M., et al.: Semi-supervised learning for network-based cardiac MR image segmentation. In: MICCAI. pp. 253–260 (2017)

[2] Baur, C., Albarqouni, S., Navab, N.: Semi-supervised deep learning for fully convolutional networks. In: MICCAI. pp. 311–319 (2017)

[3] Chartsias, A., Joyce, T., Papanastasiou, G., Semple, S., Williams, M., Newby, D., Dharmakumar, R., Tsaftaris, S.A.: Factorised spatial representation learning: application in semi-supervised myocardial segmentation. In: MICCAI. pp. 490–498 (2018)

[4] Chen, C., Bai, W., Rueckert, D.: Multi-task learning for left atrial segmentation on GE-MRI. arXiv preprint arXiv:1810.13205 (2018)

[5] Cui, W., Liu, Y., Li, Y., Guo, M., Li, Y., Li, X., Wang, T., Zeng, X., Ye, C.: Semi-supervised brain lesion segmentation with an adapted mean teacher model. In: IPMI. pp. 554–565 (2019)

[6] Dong, N., Kampffmeyer, M., Liang, X., Wang, Z., Dai, W., Xing, E.: Unsupervised domain adaptation for automatic estimation of cardiothoracic ratio. In: MICCAI. pp. 544–552 (2018)

[7] Ganaye, P.A., Sdika, M., Benoit-Cattin, H.: Semi-supervised learning for segmentation under semantic constraint. In: MICCAI. pp. 595–602 (2018)

[8] Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: NIPS. pp. 5574–5584 (2017)

[9] Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint (2016)

[10] Li, X., Yu, L., Chen, H., Fu, C.W., Heng, P.A.: Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model. In: BMVC (2018)

[11] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3DV. pp. 565–571 (2016)

[12] Nie, D., Gao, Y., Wang, L., Shen, D.: ASDNet: Attention based semi-supervised deep networks for medical image segmentation. In: MICCAI. pp. 370–378 (2018)

[13] Perone, C.S., Cohen-Adad, J.: Deep semi-supervised segmentation with weight-averaged consistency targets. In: DLMIA workshop (2018)

[14] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: NIPS (2017)

[15] Xiong, Z., Fedorov, V.V., Fu, X., Cheng, E., Macleod, R., Zhao, J.: Fully automatic left atrium segmentation from late gadolinium enhanced magnetic resonance imaging using a dual fully convolutional neural network. TMI 38(2), 515–524 (2019)

[16] Yang, X., Bian, C., Yu, L., Ni, D., Heng, P.A.: Hybrid loss guided convolutional networks for whole heart parsing. In: International Workshop on STACOM (2017)

[17] Yu, L., Cheng, J.Z., Dou, Q., Yang, X., Chen, H., Qin, J., Heng, P.A.: Automatic 3D cardiovascular MR segmentation with densely-connected volumetric convnets. In: MICCAI. pp. 287–295. Springer (2017)

[18] Zhang, Y., Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: MICCAI. pp. 408–416 (2017)

[19] Zhou, Y., Wang, Y., Tang, P., Bai, S., Shen, W., Fishman, E.K., Yuille, A.L.: Semi-supervised multi-organ segmentation via multi-planar co-training. arXiv preprint arXiv:1804.02586 (2018)
