pith. machine review for the scientific record.

arxiv: 2604.18148 · v1 · submitted 2026-04-20 · 💻 cs.CV · cs.LG

Recognition: unknown

Attention-ResUNet for Automated Fetal Head Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:26 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords fetal head segmentation · ultrasound imaging · attention mechanisms · residual learning · medical image segmentation · prenatal care · deep learning · U-Net variants

The pith

Attention-ResUNet adds attention gates to residual connections in a U-Net to segment fetal heads more accurately in ultrasound scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Attention-ResUNet to address the low contrast, noise, and fuzzy boundaries that limit existing deep learning methods for fetal head outlining. It places attention gates at four decoder stages to emphasize relevant anatomy while residual links keep gradients flowing and reuse features. On the HC18 dataset of 200 test images, this design yields higher overlap with expert outlines than five comparable networks, with saliency maps showing focused activation on head contours rather than background. The result supports more reliable automated biometric measurements during prenatal care, while the model remains computationally light at 14.7 million parameters.

Core claim

Attention-ResUNet integrates attention gates at each of the four decoder levels with residual connections to focus on anatomically relevant regions, suppress ultrasound noise, and maintain gradient flow, producing segmentations that exceed the Dice scores of ResUNet, Attention U-Net, Swin U-Net, standard U-Net, and U-Net++ on the HC18 challenge data.
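The claim is stated in terms of the Dice overlap metric. As a reference point, a minimal Dice computation on binary masks (an illustrative sketch, not the authors' evaluation code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1 = fetal head pixel)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: the prediction misses one of four head pixels.
target = np.array([[0,0,0,0],[0,1,1,0],[0,1,1,0],[0,0,0,0]])
pred   = np.array([[0,0,0,0],[0,1,1,0],[0,1,0,0],[0,0,0,0]])
print(round(dice_score(pred, target), 4))  # 2*3 / (3+4) ≈ 0.8571
```

At the reported 99.30% mean Dice, disagreements of this metric with expert outlines are confined to a thin band of boundary pixels.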

What carries the argument

Attention-ResUNet, a U-Net variant that inserts attention gates at four decoder stages to weight relevant spatial features while residual skip connections enable feature reuse and stable training.
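The gating mechanism carrying the argument follows the additive attention-gate pattern of Attention U-Net (reference [2]): skip features are rescaled by sigmoid coefficients computed from the features and a gating signal. A minimal per-pixel numpy sketch, with illustrative small random weights rather than the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate, per pixel.

    x: skip features (H, W, C); g: gating signal (H, W, C).
    Returns x scaled by attention coefficients alpha in (0, 1).
    """
    q = np.maximum(x @ w_x + g @ w_g, 0.0)      # ReLU(W_x x + W_g g)
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))    # sigmoid -> (H, W, 1)
    return x * alpha, alpha

H, W, C, Ci = 8, 8, 16, 8
x = rng.standard_normal((H, W, C))
g = rng.standard_normal((H, W, C))
# Small random weights keep the sigmoid away from saturation (illustration only).
w_x = 0.1 * rng.standard_normal((C, Ci))
w_g = 0.1 * rng.standard_normal((C, Ci))
psi = 0.1 * rng.standard_normal((Ci, 1))

gated, alpha = attention_gate(x, g, w_x, w_g, psi)
assert gated.shape == x.shape and alpha.shape == (H, W, 1)
```

In the full network these coefficients are learned per decoder level, so background ultrasound speckle is down-weighted before the skip features are concatenated.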

If this is right

  • Fetal head circumference and biparietal diameter measurements can be extracted more consistently from routine scans.
  • Saliency maps provide visual checks that the model attends to expected anatomical locations, aiding clinical trust.
  • The architecture keeps inference cost low enough for deployment on standard clinical workstations.
  • Statistical tests indicate the observed margins over baselines are unlikely to occur by chance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same attention-residual pattern could be tested on other ultrasound tasks such as placenta or limb segmentation where boundary contrast is similarly weak.
  • Real-time attention visualization during scanning might allow sonographers to adjust probe position when the model loses focus.
  • Retraining the identical structure on multi-center data would test whether the gains persist across equipment brands and gestational-age ranges.

Load-bearing premise

The reported accuracy gains arise from the attention-residual combination itself rather than from dataset-specific training choices or selection of the strongest run on the fixed HC18 test set.

What would settle it

Evaluation on an independent ultrasound dataset collected on different scanners or patient populations that shows no Dice improvement over the same baseline architectures would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.18148 by Ammar Bhilwarawala, Mainak Bandyopadhyay.

Figure 1. (a) ROC curve, (b) precision-recall curve, and (c) confusion matrix for the Attention-ResUNet architecture.
Figure 2. Hausdorff distance (left) and ASD (right) distributions.
Figure 3. Hausdorff distance and ASD correlation analysis.
Figure 4. Statistical significance tests against the proposed model.
Figure 5. Saliency map with diffuse activation patterns for ResUNet. Without attention gates, ResUNet relies solely on residual connections for feature propagation; the resulting saliency maps display broad, scattered activations extending beyond the fetal head boundaries into background regions.
Figure 6. Saliency map showing precise focus on the target region via Attention U-Net.
Figure 7. Saliency map showing enhanced spatial precision via Attention-ResUNet.
Original abstract

Automated fetal head segmentation in ultrasound images is critical for accurate biometric measurements in prenatal care. While existing deep learning approaches have achieved a reasonable performance, they struggle with issues like low contrast, noise, and complex anatomical boundaries which are inherent to ultrasound imaging. This paper presents Attention-ResUNet. It is a novel architecture that synergistically combines residual learning with multi-scale attention mechanisms in order to achieve enhanced fetal head segmentation. Our approach integrates attention gates at four decoder levels to focus selectively on anatomically relevant regions while suppressing the background noise, and complemented by residual connections which facilitates gradient flow and feature reuse. Extensive evaluation on the HC18 Challenge dataset where n = 200 demonstrates that Attention ResUNet achieves a superior performance with a mean Dice score of 99.30 +/- 0.14% against similar architectures. It significantly outperforms five baseline architectures including ResUNet (99.26%), Attention U-Net (98.79%), Swin U-Net (98.60%), Standard U-Net (98.58%), and U-Net++ (97.46%). Through statistical analysis we confirm highly significant improvements (p < 0.001) with effect sizes that range from 0.230 to 13.159 (Cohen's d). Using Saliency map analysis, we reveal that our architecture produces highly concentrated, anatomically consistent activation patterns, which demonstrate an enhanced interpretability which is crucial for clinical deployment. The proposed method establishes a new state of the art performance for automated fetal head segmentation whilst maintaining computational efficiency with 14.7M parameters and a 45 GFLOPs inference cost. Code repository: https://github.com/Ammar-ss

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Attention-ResUNet, a U-Net variant that adds residual connections and attention gates at four decoder levels for fetal head segmentation in ultrasound. On the HC18 test set (n=200), it reports a mean Dice score of 99.30% ± 0.14%, statistically outperforming ResUNet (99.26%), Attention U-Net (98.79%), Swin U-Net (98.60%), standard U-Net (98.58%), and U-Net++ (97.46%) with p<0.001 and Cohen's d values from 0.23 to 13.16. Saliency maps are presented to illustrate focused, anatomically plausible activations. The model uses 14.7M parameters and 45 GFLOPs; public code is linked.
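The effect sizes quoted above are paired comparisons over the same 200 test images. A minimal paired Cohen's d (mean of per-image differences over their standard deviation), on synthetic numbers chosen only to mimic the reported scale, not the paper's data:

```python
import numpy as np

def paired_cohens_d(a, b):
    """Cohen's d for paired samples: mean of differences / SD of differences."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return diff.mean() / diff.std(ddof=1)

# Deterministic check: differences [1, 2, 3] have mean 2 and SD 1, so d = 2.
assert abs(paired_cohens_d([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]) - 2.0) < 1e-12

# Synthetic per-image Dice scores for two models on 200 shared test images.
rng = np.random.default_rng(1)
model_a = 0.9930 + 0.0014 * rng.standard_normal(200)
model_b = model_a - 0.0004 + 0.0010 * rng.standard_normal(200)
d = paired_cohens_d(model_a, model_b)
```

This also illustrates the referee's point: when per-image differences are highly correlated across models, a mean gap as small as 0.04% can still yield a large paired effect size.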

Significance. If the 0.04% Dice gain over ResUNet is shown to arise from the attention-residual design rather than training-protocol differences, the work would provide a modest but useful incremental advance for a clinically important task. The public repository is a clear strength for reproducibility. The extremely high absolute scores and low variance on HC18 already place the method near the practical ceiling for this dataset; further gains would require demonstrating robustness on more diverse clinical data.

major comments (1)
  1. [§4] §4 (Experiments / Experimental Setup): The manuscript does not state that the five baseline architectures were re-trained from scratch under identical conditions (same preprocessing, augmentation policy, optimizer, learning-rate schedule, batch size, and epoch count) as Attention-ResUNet. With a mean Dice difference of only 0.04% versus ResUNet and a per-image standard deviation of 0.14%, even modest hyper-parameter mismatches can produce the observed gap; this detail is load-bearing for the central claim that the performance improvement is due to the synergistic combination of attention gates and residual connections.
minor comments (2)
  1. [Abstract] Abstract: the clause 'complemented by residual connections which facilitates gradient flow' has a subject-verb agreement error ('connections' is plural, so 'facilitate').
  2. [Abstract] Abstract: computational cost (14.7 M parameters, 45 GFLOPs) is reported without the corresponding figures for any baseline, preventing direct efficiency comparison.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the thoughtful review and for identifying a key point of clarification in our experimental protocol. We address the major comment below and confirm our willingness to revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [§4] §4 (Experiments / Experimental Setup): The manuscript does not state that the five baseline architectures were re-trained from scratch under identical conditions (same preprocessing, augmentation policy, optimizer, learning-rate schedule, batch size, and epoch count) as Attention-ResUNet. With a mean Dice difference of only 0.04% versus ResUNet and a per-image standard deviation of 0.14%, even modest hyper-parameter mismatches can produce the observed gap; this detail is load-bearing for the central claim that the performance improvement is due to the synergistic combination of attention gates and residual connections.

    Authors: We agree that explicit confirmation of identical training conditions is essential for interpreting the small but statistically significant performance differences. All five baseline architectures were re-trained from scratch using precisely the same preprocessing pipeline, augmentation policy, optimizer (Adam), learning-rate schedule, batch size, and epoch count as Attention-ResUNet. This protocol was followed to isolate the effect of the architectural modifications. We acknowledge that the manuscript did not state this explicitly. We will revise Section 4 (Experiments) to include a dedicated paragraph detailing the shared training setup and confirming that all models were trained under identical conditions. This addition directly addresses the concern and reinforces the validity of our comparative claims. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical architecture comparison

full rationale

The paper proposes Attention-ResUNet, an empirical neural network for fetal head segmentation, and evaluates it on the public HC18 dataset (n=200) via reported Dice scores against baselines. No mathematical derivation chain, equations, fitted parameters presented as predictions, uniqueness theorems, or ansatzes exist. Claims rest on experimental metrics (e.g., 99.30% Dice) rather than any self-referential reduction of outputs to inputs. Self-citations, if present, are not load-bearing for core results. This is a standard empirical ML paper with no circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning assumptions plus several unstated training choices. No new physical or mathematical axioms are introduced.

free parameters (2)
  • attention gate parameters and residual scaling factors
    Learned during training; their specific values are not reported and affect the final Dice score.
  • training hyperparameters (learning rate, batch size, augmentation strength)
    Chosen to maximize validation performance; not disclosed in the abstract.
axioms (2)
  • domain assumption Standard supervised segmentation loss (Dice + cross-entropy) is sufficient to train the model
    Invoked implicitly when reporting Dice as the primary metric.
  • domain assumption The HC18 dataset split used for testing is fixed and representative
    Required for the generalization claim.
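The first axiom names a standard combined segmentation loss. A minimal soft-Dice plus binary cross-entropy on probability maps (an illustrative sketch of the generic loss family, not the paper's training code):

```python
import numpy as np

def dice_bce_loss(p, y, eps=1e-7):
    """Soft Dice loss + binary cross-entropy, averaged over pixels.

    p: predicted foreground probabilities in (0, 1); y: binary ground truth.
    """
    p = np.clip(np.asarray(p, float), eps, 1 - eps)
    y = np.asarray(y, float)
    dice = (2 * (p * y).sum() + eps) / (p.sum() + y.sum() + eps)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()
    return (1 - dice) + bce

y = np.array([[0, 1], [1, 0]])
good = dice_bce_loss([[0.1, 0.9], [0.9, 0.1]], y)  # confident, correct
bad = dice_bce_loss([[0.9, 0.1], [0.1, 0.9]], y)   # confident, wrong
assert good < bad
```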

pith-pipeline@v0.9.0 · 5602 in / 1412 out tokens · 60857 ms · 2026-05-10T05:26:37.053435+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 2 canonical work pages · 1 internal anchor

  1. Zhang, J., et al.: ResUNet: Residual U-Net for improved biomedical segmentation. IEEE Access, vol. 7, pp. 12320–12328 (2019)
  2. Oktay, O., et al.: Attention U-Net: Learning where to look for the pancreas. In: MedIA (2018). arXiv:1804.03999
  3. Liu, Z., et al.: Swin-UNet: Unet-like pure transformer for medical image segmentation. In: ICCV (2021). arXiv:2105.05537
  4. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI, LNCS, vol. 9351, pp. 234–241 (2015)
  5. Zhou, Z., et al.: UNet++: A nested U-Net architecture for medical image segmentation. In: DLMIA, LNCS, vol. 11045, pp. 3–11 (2018)
  6. Isensee, F., et al.: Automated design of deep convolutional neural networks for medical image segmentation. In: MLMI, LNCS, vol. 12436, pp. 162–171 (2020)
  7. Wang, X., et al.: MedicalNet: CNN with lightweight attention for ultrasound image segmentation. Comput. Biol. Med., vol. 154, p. 106548 (2023)
  8. Chen, J., et al.: TransUNet: Transformers make strong encoders for medical image segmentation. In: MIDL (2021)
  9. Hatamizadeh, P., et al.: UNETR: Transformers for 3D medical image segmentation. In: CVPR, pp. 606–615 (2022)
  10. Kumar, P., et al.: Interpretable deep learning for medical image analysis: Channel and spatial attention in residual frameworks. IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 3, pp. 1089–1105 (2024)
  11. Nagabotu, V., et al.: Precise segmentation of fetal head in ultrasound images using hybrid loss and scale attention. IEEE Trans. Med. Imaging, vol. 43, no. 5, pp. 1234–1248 (2024)
  12. Alzubaidi, M., et al.: FetSAM: Advanced segmentation techniques for fetal structures in 3D ultrasound. Comput. Methods Prog. Biomed., vol. 245, p. 107892 (2024)
  13. Selvaraju, R., et al.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618–626 (2017)
  14. Wollek, A., et al.: Attention-based saliency maps improve interpretability of vision transformers for pneumothorax detection. Radiology, vol. 306, no. 2, pp. 123–135 (2023)