Dual Agreement Consistency Learning for Semi-Supervised Fetal Ultrasound Segmentation
Pith reviewed 2026-06-25 20:44 UTC · model grok-4.3
The pith
DACL trains a lightweight CNN and a Transformer together on fetal ultrasound images by enforcing agreement on both pixel probabilities and prediction confidence to improve segmentation when labels are scarce.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DACL jointly trains a deployment-oriented lightweight convolutional network and a Transformer-based network. Labeled data drive a supervised loss, and unlabeled data are used through cross pseudo supervision (CPS). A dual-agreement consistency loss couples pixel-wise probabilistic divergence with entropy-guided confidence alignment, which is intended to suppress unreliable pseudo-labels and stabilize cross-architecture pseudo-label learning. An interpolation-based consistency strategy using mixup is also applied to unlabeled samples. With 5 percent labeled data, the paper reports Dice gains of up to 2.77 percent and HD95 reductions of up to 14.69 mm over the strongest recent semi-supervised methods on fetal head and abdomen datasets.
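For readers unfamiliar with CPS, here is a minimal NumPy sketch of the general cross pseudo supervision idea. It is not the paper's implementation. Each network's hard argmax prediction on an unlabeled image serves as the training target for the other network. The function names, the (C, H, W) logit layout, and the unweighted sum of the two cross-entropy terms are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Softmax over the class axis (axis 0) of a (C, H, W) array."""
    z = logits - logits.max(axis=0, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(logits, labels, eps=1e-8):
    """Mean pixel-wise cross-entropy of (C, H, W) logits against (H, W) integer labels."""
    p = softmax(logits)
    picked = np.take_along_axis(p, labels[None], axis=0)[0]  # probability of the target class
    return float(-np.log(picked + eps).mean())

def cps_loss(logits_a, logits_b):
    """Each network is supervised by the other's hard pseudo-label (assumed unweighted sum)."""
    pseudo_a = logits_a.argmax(axis=0)  # in a real framework these would be detached constants
    pseudo_b = logits_b.argmax(axis=0)
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)

# Two toy "networks" on a 2x2 image with 2 classes.
agree = np.zeros((2, 2, 2)); agree[0] = 5.0        # confidently predicts class 0
disagree = np.zeros((2, 2, 2)); disagree[1] = 5.0  # confidently predicts class 1
loss_same = cps_loss(agree, agree)
loss_diff = cps_loss(agree, disagree)
```

When the two networks agree confidently the loss is near zero, and when they disagree confidently it is large. That disagreement case is where a noisy pseudo-label can destabilize training, which is the problem the dual-agreement term is meant to address.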
What carries the argument
Dual-agreement consistency loss that couples pixel-wise probabilistic divergence with entropy-guided confidence alignment between the CNN and Transformer outputs.
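The source gives no explicit equation for this loss, so the following is only a hypothetical sketch of how such a coupling could be written. It assumes a symmetric KL divergence as the pixel-wise probabilistic divergence, an absolute entropy gap as the confidence alignment term, and an entropy-based weight that down-weights pixels where either network is uncertain. The weighting scheme, the balance parameter `lam`, and all function names are assumptions, not the authors' formulation.

```python
import numpy as np

def softmax(logits):
    """Softmax over the class axis (axis 0) of a (C, H, W) array."""
    z = logits - logits.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def entropy(p, eps=1e-8):
    """Per-pixel Shannon entropy over the class axis."""
    return -(p * np.log(p + eps)).sum(axis=0)

def dual_agreement_loss(logits_cnn, logits_trf, lam=1.0, eps=1e-8):
    """Hypothetical dual-agreement loss for two (C, H, W) logit maps."""
    p, q = softmax(logits_cnn), softmax(logits_trf)
    log_p, log_q = np.log(p + eps), np.log(q + eps)
    # Distribution agreement: symmetric KL divergence per pixel (assumed choice).
    divergence = 0.5 * ((p * (log_p - log_q)).sum(axis=0) + (q * (log_q - log_p)).sum(axis=0))
    # Confidence agreement: gap between the two per-pixel entropies (assumed choice).
    h_p, h_q = entropy(p), entropy(q)
    confidence_gap = np.abs(h_p - h_q)
    # Down-weight pixels where either network is near maximum entropy (assumed choice).
    weight = np.clip(1.0 - np.maximum(h_p, h_q) / np.log(p.shape[0]), 0.0, 1.0)
    return float((weight * divergence + lam * confidence_gap).mean())

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 4, 4))
b = rng.normal(size=(2, 4, 4))
loss_identical = dual_agreement_loss(a, a)
loss_different = dual_agreement_loss(a, b)
```

In this form the loss is zero when both networks output identical distributions and grows with either a distribution mismatch or a confidence mismatch. Whether the paper's loss shares these properties cannot be determined from the text available here.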
If this is right
- Cross-architecture pseudo-labeling becomes more stable when both distribution and uncertainty are explicitly aligned.
- Boundary accuracy in fetal structures improves measurably even with 5 percent labeled scans.
- A 1.47-million-parameter CNN can be trained to competitive performance when guided by a larger Transformer through the dual-agreement mechanism.
- Mixup interpolation on unlabeled samples further increases robustness to variations in ultrasound appearance.
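The mixup-based consistency mentioned in the last point can be sketched in the style of interpolation consistency training: the prediction on a mixed input should match the same mix of the predictions on the original inputs. This is a minimal illustration under assumptions. The mean-squared-error penalty, the Beta(alpha, alpha) mixing coefficient, and the toy models are hypothetical and are not taken from the paper.

```python
import numpy as np

def mixup_consistency(model, x1, x2, lam=None, alpha=0.5, rng=None):
    """Penalize the gap between model(mix(x1, x2)) and mix(model(x1), model(x2))."""
    if lam is None:
        if rng is None:
            rng = np.random.default_rng()
        lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    x_mix = lam * x1 + (1.0 - lam) * x2
    target = lam * model(x1) + (1.0 - lam) * model(x2)  # interpolated predictions
    pred = model(x_mix)                                  # prediction on interpolated input
    return float(np.mean((pred - target) ** 2))

def linear_model(x):
    """Toy model that is exactly linear, so interpolation consistency holds."""
    return 2.0 * x + 1.0

def sigmoid_model(x):
    """Toy nonlinear model, so interpolation consistency is violated."""
    return 1.0 / (1.0 + np.exp(-x))

x1 = np.zeros((4, 4))
x2 = np.full((4, 4), 4.0)
loss_linear = mixup_consistency(linear_model, x1, x2, lam=0.5)
loss_sigmoid = mixup_consistency(sigmoid_model, x1, x2, lam=0.5)
```

The linear toy model incurs essentially no penalty, while the nonlinear one does. Minimizing this penalty therefore encourages smoother behavior between unlabeled samples.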
Where Pith is reading between the lines
- The same dual-agreement idea could be tested on other scarce-label ultrasound tasks such as placental or cardiac segmentation without changing the core loss structure.
- Because one model is kept deliberately small, the approach may allow on-device inference in clinical settings once the larger model is used only during training.
- The reported gains suggest that uncertainty alignment may be more important than pure prediction agreement when architectures differ substantially in capacity.
Load-bearing premise
That explicitly coupling pixel-wise probabilistic divergence with entropy-guided confidence alignment will suppress unreliable pseudo-labels and enable stable cross-architecture pseudo-label learning under extreme annotation scarcity.
What would settle it
An ablation on the fetal head and abdomen test sets that removes the entropy-guided confidence alignment term while keeping all other components fixed and measures whether the reported Dice and HD95 gains disappear.
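To make the two metrics in such an ablation concrete, here is a minimal NumPy sketch of Dice and HD95 for 2D binary masks. HD95 has several conventions in the literature. This version pools the boundary-to-boundary distances from both directions and takes their 95th percentile, which may differ from the paper's evaluation code. The 4-neighbour boundary definition and the `spacing_mm` parameter are assumptions.

```python
import numpy as np

def dice(pred, gt):
    """Dice overlap of two binary masks; defined as 1.0 when both are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)

def boundary_points(mask):
    """Coordinates of foreground pixels that have at least one background 4-neighbour."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hd95(pred, gt, spacing_mm=1.0):
    """95th percentile of pooled symmetric boundary distances (one common convention)."""
    a, b = boundary_points(pred), boundary_points(gt)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1) * spacing_mm
    pooled = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(pooled, 95))

# Toy example: a 4x4 square and the same square shifted right by 2 pixels.
gt = np.zeros((10, 10), dtype=bool); gt[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool); pred[2:6, 4:8] = True
d_score = dice(pred, gt)
h_score = hd95(pred, gt)
```

In the toy example half of each square overlaps, so Dice is 0.5, and no boundary pixel is more than 2 pixels from the other boundary, which bounds HD95 at 2.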
Original abstract
Maternal-fetal US is the primary imaging modality for monitoring fetal development, yet accurate automated segmentation remains challenging due to the scarcity of pixel-level annotations. To address this issue, we propose DACL, a semi-supervised framework for robust fetal US image segmentation. DACL jointly trains a deployment-oriented lightweight convolutional network (1.47 M parameters) and a Transformer-based network, leveraging labeled data for supervised learning and unlabeled data via CPS. To enhance prediction stability, we introduce a dual-agreement consistency loss that couples pixel-wise probabilistic divergence with entropy-guided confidence alignment. Unlike conventional CPS methods that enforce agreement only at the prediction level, DACL explicitly regularizes both distributional alignment and uncertainty, thereby suppressing unreliable pseudo-labels and enabling stable cross-architecture pseudo-label learning under extreme annotation scarcity. Furthermore, an interpolation-based consistency strategy using mixup is applied to unlabeled samples to enhance robustness. Under 5% labeled data, DACL improves Dice by up to 2.77% and reduces HD95 by up to 14.69 mm compared with the strongest recent semi-supervised methods, demonstrating significant improvements in boundary accuracy on both fetal head and abdomen datasets. These results demonstrate the effectiveness of agreement-based consistency learning for annotation-efficient fetal US segmentation. Our code is on GitHub.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes DACL, a semi-supervised segmentation framework for fetal ultrasound images that jointly trains a lightweight CNN (1.47M parameters) and a Transformer via cross pseudo supervision (CPS) on labeled and unlabeled data. It introduces a dual-agreement consistency loss coupling pixel-wise probabilistic divergence with entropy-guided confidence alignment, plus mixup-based interpolation on unlabeled samples, to suppress unreliable pseudo-labels under extreme label scarcity. The central empirical claim is that, at 5% labeled data, DACL yields Dice gains of up to 2.77% and HD95 reductions of up to 14.69 mm versus recent semi-supervised baselines on fetal head and abdomen datasets.
Significance. If the reported gains are reproducible and the dual-agreement mechanism is shown to be causal, the work would offer a practical advance for annotation-efficient medical image segmentation, particularly in cross-architecture settings. The public GitHub code release is a clear strength that supports reproducibility.
major comments (3)
- §4 (Experiments): The manuscript reports numerical improvements but provides no ablation isolating the dual-agreement consistency loss (distributional alignment + entropy term) from the mixup strategy or the cross-architecture CPS baseline. Without this, the causal contribution of the entropy-guided component to pseudo-label suppression cannot be verified, which is load-bearing for the central claim.
- §4 (Experiments): No statistical tests, standard deviations across runs, or multiple random seeds are reported for the Dice and HD95 metrics. This undermines confidence in the claimed gains of +2.77% Dice and -14.69 mm HD95 at 5% labels.
- §3 (Method): The description of the dual-agreement consistency loss does not include an explicit equation or pseudocode showing how the entropy term modulates the divergence to down-weight unreliable pixels; the mechanism remains at the level of prose.
minor comments (2)
- [Abstract] The abstract and introduction should explicitly name the fetal head and abdomen datasets (e.g., sizes, sources, splits) rather than referring to them generically.
- Figure captions and tables should report the exact percentage of labeled data and the number of runs for all compared methods to allow direct comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below and will incorporate revisions to improve clarity and rigor.
- Referee: §4 (Experiments): The manuscript reports numerical improvements but provides no ablation isolating the dual-agreement consistency loss (distributional alignment + entropy term) from the mixup strategy or the cross-architecture CPS baseline. Without this, the causal contribution of the entropy-guided component to pseudo-label suppression cannot be verified, which is load-bearing for the central claim.
  Authors: We agree that an ablation isolating the dual-agreement consistency loss (including the entropy term) from mixup and the CPS baseline is necessary to establish causality. In the revised manuscript, we will add targeted ablation experiments that systematically disable the entropy-guided alignment and mixup components while retaining the cross-architecture CPS setup, reporting their individual contributions to Dice and HD95 under 5% labeled data.
  Revision: yes
- Referee: §4 (Experiments): No statistical tests, standard deviations across runs, or multiple random seeds are reported for the Dice and HD95 metrics. This undermines confidence in the claimed gains of +2.77% Dice and -14.69 mm HD95 at 5% labels.
  Authors: We acknowledge that the current version lacks statistical reporting. We will rerun all experiments across multiple random seeds (at least 3), report mean and standard deviation for Dice and HD95, and include statistical significance tests (e.g., paired t-tests) comparing DACL against baselines to substantiate the reported gains.
  Revision: yes
- Referee: §3 (Method): The description of the dual-agreement consistency loss does not include an explicit equation or pseudocode showing how the entropy term modulates the divergence to down-weight unreliable pixels; the mechanism remains at the level of prose.
  Authors: We will revise Section 3 to include an explicit mathematical equation for the dual-agreement consistency loss, along with pseudocode, that formally defines how the entropy term modulates the pixel-wise probabilistic divergence to down-weight unreliable predictions.
  Revision: yes
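The statistical reporting promised in the second response (mean, standard deviation, and a paired t-test across seeds) can be sketched with the Python standard library alone. The Dice values below are made-up placeholders chosen only to exercise the code. They are not results from the paper, and the function name is hypothetical.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for per-seed scores of two methods."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Placeholder per-seed Dice scores (illustrative only, NOT from the paper).
method_dice = [0.80, 0.82, 0.81]
baseline_dice = [0.78, 0.79, 0.80]

mean_method = statistics.mean(method_dice)
std_method = statistics.stdev(method_dice)
t_stat, dof = paired_t(method_dice, baseline_dice)
```

With these placeholder numbers the method wins on every seed, yet t is about 3.46 with 2 degrees of freedom, below the two-sided 5 percent critical value of roughly 4.30. Three seeds, the minimum the authors commit to, may therefore be too few to establish significance.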
Circularity Check
No circularity; empirical proposal with no derivation chain or self-referential reductions
The paper introduces DACL as a semi-supervised segmentation framework and evaluates it empirically on fetal US datasets under low-label regimes, reporting Dice and HD95 gains versus prior methods. No equations, derivations, or mathematical claims are present in the abstract or description that reduce any result to a fitted input or self-citation by construction. The dual-agreement loss is presented as a proposed mechanism whose value is assessed via experiments, not derived from prior self-work or definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes appear. This matches the default case of a self-contained empirical contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard semi-supervised consistency assumptions, such as the assumption that agreement between models on unlabeled data improves generalization.
invented entities (1)
- Dual-agreement consistency loss: no independent evidence