AMN: An Adaptive Multi-Scale Fusion Network with Boundary and Uncertainty Modeling for Nuclei Segmentation
Pith reviewed 2026-06-28 17:11 UTC · model grok-4.3
The pith
AMN fuses Swin Transformer and ResNet-50 features through per-channel gating to improve nuclei subtype segmentation over single-encoder baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AMN is a dual-encoder segmentation framework that jointly leverages a Swin Transformer and a ResNet-50 feature pyramid, fused via a learned per-channel gating mechanism that dynamically weighs each encoder's contribution at every scale. AMN is trained with a multi-objective loss combining class-weighted focal loss, boundary-aware loss with positive-pixel emphasis, and a novel uncertainty-modulated classification term that suppresses overconfident erroneous predictions. On the CoNIC benchmark across seven nuclei classes it reaches a mean Dice of 0.82 and mean F1 of 0.68, outperforming U-Net, ResU-Net, DeepLabV3+, SegNet, ViT-Small, HmsU-Net, ConvFormer-UNet, and BEFUnet, and shows strong gene
What carries the argument
The learned per-channel gating that dynamically weights Swin Transformer and ResNet-50 contributions at each scale, combined with the uncertainty-modulated term in the loss.
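The gating idea can be illustrated with a minimal sketch. The paper summary does not give the exact formulation, so everything here is an assumption: the function name `gated_fusion`, the sigmoid gate, and the convex combination `g * swin + (1 - g) * resnet` are hypothetical. The gate logits are passed in as fixed numbers; in a dynamic variant they would presumably be predicted from the features themselves, with a separate gate vector at each scale.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(swin_feat, resnet_feat, gate_logits):
    """Fuse two feature maps of shape [C][H][W] channel by channel.

    Assumed form (not taken from the paper): one scalar logit per channel,
    g = sigmoid(logit) weights the Swin branch and (1 - g) the ResNet branch.
    """
    fused = []
    for c, logit in enumerate(gate_logits):
        g = sigmoid(logit)
        fused.append([
            [g * s + (1.0 - g) * r for s, r in zip(s_row, r_row)]
            for s_row, r_row in zip(swin_feat[c], resnet_feat[c])
        ])
    return fused


# Toy feature maps: 2 channels, 1 x 2 spatial grid (illustrative values only).
swin = [[[1.0, 1.0]], [[1.0, 1.0]]]
resnet = [[[0.0, 0.0]], [[0.0, 0.0]]]

# Channel 0 is split evenly (logit 0), channel 1 leans almost fully on Swin.
fused = gated_fusion(swin, resnet, gate_logits=[0.0, 10.0])
```

Because the weights of the two branches sum to one per channel, the fused map stays on the same scale as its inputs, which is one plausible reason to prefer a gate over plain summation.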
If this is right
- Higher subtype classification accuracy directly improves automated tumor grading and immune infiltrate quantification on whole-slide images.
- Cross-dataset transfer without retraining indicates the representations are robust to staining and scanner variations common in clinical pathology.
- Stronger performance on the lymphocyte class suggests the boundary and uncertainty terms help with small or densely packed nuclei that defeat standard losses.
- Hybrid CNN-transformer designs with scale-specific adaptive fusion can outperform both pure-CNN and pure-transformer segmentation networks on histopathology tasks.
Where Pith is reading between the lines
- The same gating pattern could be tested on other paired encoders such as CNN plus vision transformer for non-medical segmentation problems.
- Uncertainty modulation may reduce the effect of label noise that often occurs when pathologists annotate nuclei subtypes.
- Extending the framework to three or more encoders would test whether the per-channel weighting generalizes beyond two sources.
Load-bearing premise
The reported gains are produced by the per-channel gating and uncertainty-modulated loss rather than by the choice of the two encoders or by ordinary training of the same backbone combination.
What would settle it
An ablation that removes the gating module and the uncertainty term, retrains the identical dual-encoder backbone with only the remaining loss terms, and measures Dice and F1 on CoNIC; if performance drops to baseline levels the claim holds, otherwise the contribution of the new components is not isolated.
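The proposed ablation amounts to a 2 x 2 grid over the two contested components. A minimal sketch of how such a grid could be enumerated follows; the configuration keys and the `"concat"` fallback for the ungated case are hypothetical, since the summary does not say what fusion would replace the gate.

```python
from itertools import product


def ablation_variants():
    """Enumerate the four gating/uncertainty combinations on a fixed backbone."""
    variants = []
    for use_gating, use_uncertainty in product([False, True], repeat=2):
        loss_terms = ["focal", "boundary"]
        if use_uncertainty:
            loss_terms.append("uncertainty")
        variants.append({
            # The encoders stay identical across all runs.
            "encoders": ("swin", "resnet50"),
            # "concat" is an assumed stand-in for fusion without the gate.
            "fusion": "per_channel_gate" if use_gating else "concat",
            "loss_terms": loss_terms,
        })
    return variants


variants = ablation_variants()
for v in variants:
    print(v["fusion"], v["loss_terms"])
```

The first variant (no gate, no uncertainty term) is the control the premise depends on; the last is the full model.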
Original abstract
Accurate classification of nuclei subtypes in histopathology images is critical for downstream tasks including tumor grading, immune infiltrate quantification, and prognosis prediction. Existing approaches rely on either convolutional or transformer-based encoders in isolation, limiting their ability to simultaneously capture fine-grained local texture and long-range spatial context. We present AMN (Adaptive Multi-Scale Nuclei Network), a dual-encoder segmentation framework that jointly leverages a Swin Transformer and a ResNet-50 feature pyramid, fused via a learned per-channel gating mechanism that dynamically weighs each encoder's contribution at every scale. AMN is trained with a multi-objective loss combining class-weighted focal loss, boundary-aware loss with positive-pixel emphasis, and a novel uncertainty-modulated classification term that suppresses overconfident erroneous predictions. Evaluated on the CoNIC benchmark across seven nuclei classes, AMN achieves a mean Dice of 0.82 and mean F1 of 0.68, with an F1 of 0.67 on the diagnostically challenging lymphocyte class. AMN outperforms eight baseline models spanning pure-CNN, pure-transformer, and recent hybrid architectures: U-Net, ResU-Net, DeepLabV3+, SegNet, ViT-Small, HmsU-Net, ConvFormer-UNet, and BEFUnet. Cross-dataset evaluation on MoNuSeg demonstrates strong generalization without retraining, validating the domain robustness of the learned representations.
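Of the three loss terms named in the abstract, only the class-weighted focal loss has a standard published form, so it is the only one sketched here. The per-pixel version below follows the usual definition, -alpha_c * (1 - p_t)^gamma * log(p_t); the function names, the default gamma = 2.0, and the clamp constant are assumptions of this sketch, and the boundary-aware and uncertainty-modulated terms are left out because the abstract does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Focal loss for one pixel with a per-class weight alpha_c.

    The factor (1 - p_t) ** gamma shrinks the loss of pixels that are
    already classified confidently and correctly.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to keep log finite
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Illustrative two-class case with equal class weights.
weights = [1.0, 1.0]
easy = class_weighted_focal_loss([5.0, 0.0], target=0, class_weights=weights)
hard = class_weighted_focal_loss([0.0, 5.0], target=0, class_weights=weights)
```

Setting `gamma=0.0` and all weights to 1 recovers ordinary cross-entropy, which makes it easy to check the implementation against a known baseline.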
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AMN, a dual-encoder nuclei segmentation network that fuses Swin Transformer and ResNet-50 features via a learned per-channel gating mechanism at multiple scales. It is trained with a composite loss (class-weighted focal, boundary-aware with positive-pixel emphasis, and uncertainty-modulated classification) and reports mean Dice of 0.82 and mean F1 of 0.68 on the CoNIC benchmark across seven nuclei classes, outperforming eight baselines (U-Net, ResU-Net, DeepLabV3+, SegNet, ViT-Small, HmsU-Net, ConvFormer-UNet, BEFUnet). Cross-dataset evaluation on MoNuSeg is claimed to show strong generalization without retraining.
Significance. If the performance gains can be rigorously attributed to the proposed gating and uncertainty components, the work would provide a concrete example of adaptive multi-scale fusion for histopathology segmentation, with potential utility for downstream tasks such as tumor grading and immune quantification. The hybrid encoder design and boundary/uncertainty terms address known challenges in nuclei subtype classification.
major comments (2)
- [Experiments section] The manuscript reports superior Dice (0.82) and F1 (0.68) on CoNIC but contains no ablation that trains the identical Swin+ResNet-50 dual-encoder backbone under a standard loss (without per-channel gating or the uncertainty-modulated term). This omission prevents attribution of the gains to the two proposed mechanisms rather than encoder choice, augmentation, or optimization.
- [Results on CoNIC and MoNuSeg] No statistical significance tests, standard deviations, or error bars accompany the reported metrics or the outperformance claims versus the eight baselines; the cross-dataset generalization statement likewise lacks quantitative support in the provided summary.
minor comments (1)
- The abstract states 'strong cross-dataset generalization' on MoNuSeg but does not list the exact quantitative metrics or whether any fine-tuning occurred.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of experimental validation and statistical reporting. We address each major comment point by point below.
- Referee: [Experiments section] The manuscript reports superior Dice (0.82) and F1 (0.68) on CoNIC but contains no ablation that trains the identical Swin+ResNet-50 dual-encoder backbone under a standard loss (without per-channel gating or the uncertainty-modulated term). This omission prevents attribution of the gains to the two proposed mechanisms rather than encoder choice, augmentation, or optimization.
Authors: We agree that the current experiments do not include an ablation isolating the dual-encoder backbone trained under a standard loss without the gating or uncertainty terms. This limits direct attribution of gains to the proposed components. In the revised manuscript we will add this ablation study, training the identical Swin+ResNet-50 backbone with a baseline loss (e.g., weighted cross-entropy plus Dice) and reporting the resulting metrics for comparison against the full AMN model. revision: yes
- Referee: [Results on CoNIC and MoNuSeg] No statistical significance tests, standard deviations, or error bars accompany the reported metrics or the outperformance claims versus the eight baselines; the cross-dataset generalization statement likewise lacks quantitative support in the provided summary.
Authors: We acknowledge that the absence of statistical tests, standard deviations, and error bars weakens the strength of the reported outperformance. We will revise the results section to include these: standard deviations computed over multiple random seeds, error bars on bar plots, and paired statistical significance tests (e.g., Wilcoxon signed-rank) against each baseline on CoNIC. For the MoNuSeg cross-dataset evaluation we will add the corresponding quantitative Dice and F1 scores to support the generalization claim. revision: yes
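The statistical reporting promised above could look roughly like the following sketch. It computes mean and standard deviation over seeds and an exact two-sided Wilcoxon signed-rank p-value by enumerating sign patterns. The function name is hypothetical, and the score lists are made-up placeholder values chosen for illustration; they are not results from the paper.

```python
from itertools import product
from statistics import mean, stdev


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| share an average
    rank. All 2^n sign patterns are enumerated, so keep n small.
    Returns (W_plus, p_value).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2.0
    observed = abs(w_plus - expected)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** n


# Placeholder per-seed Dice scores (NOT from the paper).
placeholder_model = [0.51, 0.52, 0.53, 0.54, 0.55]
placeholder_baseline = [0.50, 0.50, 0.50, 0.50, 0.50]

print(f"model: {mean(placeholder_model):.3f} +/- {stdev(placeholder_model):.3f}")
w_plus, p_value = wilcoxon_signed_rank_exact(placeholder_model, placeholder_baseline)
```

With five pairs that all favor one model, the test gives W+ = 15 and p = 2/32 = 0.0625, the smallest two-sided p-value five pairs can produce. A seed-level comparison therefore needs more than five seeds, or pairing at the image level, to reach the conventional 0.05 threshold.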
Circularity Check
No circularity in empirical architecture and benchmark evaluation
The paper proposes an empirical dual-encoder segmentation network (Swin + ResNet-50 with per-channel gating and uncertainty-modulated loss) and reports measured performance (Dice 0.82, F1 0.68 on CoNIC; generalization on MoNuSeg) against external baselines. No equations, self-definitional loops, fitted-input-as-prediction, or self-citation chains are present that would reduce the reported metrics to quantities defined inside the method itself. The central claims rest on standard train/test splits and independent benchmark datasets rather than any internal reduction or tautology. This is a conventional empirical ML paper whose derivation chain is self-contained against external evaluation.