A Two-Stage Deep Learning Framework for Segmentation of Ten Gastrointestinal Organs from Coronal MR Enterography
Pith reviewed 2026-05-10 06:19 UTC · model grok-4.3
The pith
A two-stage deep learning pipeline first locates broad regions and then refines the segmentation of ten gastrointestinal organs in coronal MR enterography scans, reaching an 88.99 percent mean Dice score.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that separating localization from organ-specific refinement in a coarse-to-fine pipeline overcomes low contrast and class imbalance in MRE images. The first stage produces usable ROIs for all ten structures; the second stage then delivers precise boundaries, lifting DSC by up to 23.62 percent for the cecum and 18.57 percent for the sigmoid. Overall metrics on the 114-patient public set exceed those of competing single-stage networks.
What carries the argument
The two-stage coarse-to-fine pipeline: DenseNet201-UNet++ for initial ROI extraction followed by DenseNet121-SelfONN-UNet for patch-wise refinement with class-specific weighting.
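The review does not say exactly how the stage-1 coarse masks become organ-specific patches. One plausible reading crops the bounding box of each predicted organ label with a fixed margin and later pastes the refined patch back into slice coordinates. The sketch below illustrates that reading only; it is not the authors' method, and the function names `extract_roi` and `paste_back` and the default margin are hypothetical.

```python
import numpy as np

def extract_roi(image, coarse_mask, label, margin=8):
    """Crop a patch around one organ's stage-1 prediction (hypothetical helper).

    image:       2D array (H, W), one coronal slice.
    coarse_mask: 2D integer array (H, W) of stage-1 labels, 0 = background.
    label:       organ class id to crop around.
    margin:      assumed padding in pixels added on every side.
    Returns (patch, (r0, r1, c0, c1)), or None if the organ is absent.
    """
    rows, cols = np.nonzero(coarse_mask == label)
    if rows.size == 0:
        return None  # stage 1 predicted no pixels of this organ on this slice
    h, w = image.shape
    # Bounding box of the predicted organ, padded and clipped to the slice.
    r0 = int(max(rows.min() - margin, 0))
    r1 = int(min(rows.max() + margin + 1, h))
    c0 = int(max(cols.min() - margin, 0))
    c1 = int(min(cols.max() + margin + 1, w))
    return image[r0:r1, c0:c1], (r0, r1, c0, c1)

def paste_back(full_shape, patch_pred, box):
    """Place a refined patch prediction back into full-slice coordinates."""
    out = np.zeros(full_shape, dtype=patch_pred.dtype)
    r0, r1, c0, c1 = box
    out[r0:r1, c0:c1] = patch_pred
    return out
```

Padding the box guards against stage 1 under-segmenting an organ's boundary, at the cost of feeding stage 2 more background; the margin would need tuning per organ size.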
If this is right
- The second stage produces measurable DSC gains for small and low-contrast organs that single-stage models miss.
- Class weighting and patch-based training reduce the impact of severe imbalance, especially for the appendix.
- The framework supplies anatomically detailed masks that could feed into downstream diagnostic or monitoring software for IBD.
- Higher computational cost is accepted in exchange for the observed boundary accuracy.
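The class-specific weights appear in the ledger as a free parameter, but the weighting formula is not given. A common choice, assumed here purely for illustration, is inverse pixel frequency plugged into a weighted cross-entropy, so that rare classes such as the appendix contribute more per pixel to the loss. The helper names are hypothetical; the averaging (dividing by the summed weights of the target pixels) follows the usual convention for weighted cross-entropy.

```python
import numpy as np

def inverse_frequency_weights(label_maps, num_classes, eps=1.0):
    """Assumed scheme: weight_c proportional to 1 / pixel count of class c, mean-normalised to 1."""
    counts = np.bincount(np.concatenate([m.ravel() for m in label_maps]),
                         minlength=num_classes).astype(float)
    weights = 1.0 / (counts + eps)  # eps avoids division by zero for absent classes
    return weights * num_classes / weights.sum()

def weighted_cross_entropy(logits, target, weights):
    """Weighted pixel-wise cross-entropy.

    logits:  (C, H, W) raw class scores.
    target:  (H, W) integer labels.
    weights: (C,) per-class weights.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Log-probability of the true class at every pixel.
    picked = np.take_along_axis(log_probs, target[None], axis=0)[0]
    w = weights[target]
    return float(-(w * picked).sum() / w.sum())
```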
Where Pith is reading between the lines
- The same coarse-to-fine split could be tested on other abdominal MRI sequences or CT data to check transferability.
- Reducing the second-stage model size while preserving the accuracy lift would make the pipeline more suitable for routine clinical workstations.
- Combining the output masks with quantitative measures of organ wall thickness or inflammation could support automated IBD severity scoring.
Load-bearing premise
That the accuracy gains seen on this single 114-patient public dataset will hold for images from new patients, different scanners, or changed imaging protocols.
What would settle it
Running the trained models on an independent set of MRE scans acquired on different MRI machines or from a new patient population and checking whether the mean Dice remains above 85 percent would confirm or refute generalization.
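A minimal sketch of that check, assuming one mean-Dice score per external patient: resample patients (not slices, because slices from one patient are correlated) with a percentile bootstrap and ask whether the lower confidence bound stays above 0.85. The threshold comes from the paragraph above; the bootstrap settings and function names are illustrative assumptions.

```python
import numpy as np

def bootstrap_mean_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-patient scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Each row is one bootstrap resample of patients, drawn with replacement.
    idx = rng.integers(0, scores.size, size=(n_boot, scores.size))
    means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), lo, hi

def generalises(scores, threshold=0.85, **kw):
    """True if the lower CI bound of the mean Dice stays above the threshold."""
    _, lo, _ = bootstrap_mean_ci(scores, **kw)
    return bool(lo > threshold)
```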
Accurate segmentation of gastrointestinal (GI) organs in magnetic resonance enterography (MRE) is critical for diagnosing inflammatory bowel disease (IBD). However, anatomical variability, class imbalance, and low tissue contrast hinder reliable automation. This study proposes a dual-stage deep learning framework for organ-specific segmentation of GI structures from coronal MRE images to address these challenges. A publicly available MRE dataset of 3,195 coronal T2-weighted HASTE slices from 114 IBD patients was used. Initially, a DenseNet201-UNet++ model generated coarse masks for ROI extraction. A DenseNet121-SelfONN-UNet model was then trained on organ-specific patches. Extensive data augmentation, normalization, five-fold cross-validation, and class-specific weighting were applied to mitigate severe class imbalance, particularly for the appendix. The initial stage achieved strong organ localization but underperformed for the appendix; class weighting improved its DSC from 6.76% to 85.76%. The second-stage DenseNet121-SelfONN-UNet significantly enhanced segmentation across all GI structures, with notable DSC gains (cecum +23.62%, sigmoid +18.57%, rectum +17.99%, small intestine +16.06%). Overall, the framework achieved mDSC of 88.99%, mIoU of 84.76%, and mHD95 of 6.94 mm, outperforming all baselines. This framework demonstrates the effectiveness of a coarse-to-fine, organ-aware segmentation strategy for intestinal MRE. Despite higher computational cost, it shows strong potential for clinical translation and enables anatomically informed diagnostic tools in gastroenterology.
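For readers checking the reported numbers, the three metrics can be computed per organ roughly as follows and then averaged over the ten organs to give mDSC, mIoU and mHD95. This is an illustrative 2D sketch, not the paper's evaluation code: boundaries use 4-connectivity, distances are brute force, and HD95 pools both directed surface-distance sets before taking the 95th percentile. Some toolkits instead take the larger of the two directed 95th percentiles, so values can differ slightly between implementations. `spacing` is the assumed pixel size in mm as (row, column).

```python
import numpy as np

def dice_iou(pred, gt):
    """Binary Dice and IoU in [0, 1]; returns (1.0, 1.0) when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    union = np.logical_or(pred, gt).sum()
    if total == 0:
        return 1.0, 1.0
    return 2.0 * inter / total, inter / union

def _surface(mask):
    """Boundary pixels: foreground pixels with at least one background 4-neighbour."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hd95(pred, gt, spacing=(1.0, 1.0)):
    """95th-percentile symmetric surface distance in physical units (brute force)."""
    a = _surface(pred.astype(bool)) * np.asarray(spacing)
    b = _surface(gt.astype(bool)) * np.asarray(spacing)
    if len(a) == 0 or len(b) == 0:
        return float("nan")  # undefined when either mask is empty
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    # Pool pred-to-gt and gt-to-pred nearest-boundary distances.
    return float(np.percentile(np.concatenate([d.min(1), d.min(0)]), 95))
```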
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes a two-stage deep learning framework for the segmentation of ten gastrointestinal organs in coronal MR enterography (MRE) images. The first stage employs a DenseNet201-UNet++ model to produce coarse segmentation masks, which are used to extract organ-specific regions of interest (ROIs). The second stage then applies a DenseNet121-SelfONN-UNet model trained on these patches for refined segmentation. The approach is evaluated on a public dataset comprising 3,195 coronal T2-weighted HASTE slices from 114 IBD patients, utilizing five-fold cross-validation, data augmentation, normalization, and class-specific weighting to address class imbalance. The framework reports an overall mean Dice Similarity Coefficient (mDSC) of 88.99%, mean IoU of 84.76%, and mean HD95 of 6.94 mm, with notable improvements in the second stage for organs such as the cecum (+23.62% DSC), sigmoid (+18.57%), rectum (+17.99%), and small intestine (+16.06%), outperforming baseline models.
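The second-stage network uses Self-ONN layers, whose generative neurons (Kiranyaz et al., 2021) approximate a learned nodal function with a truncated Maclaurin series: the output is a bias plus the sum over q = 1..Q of a convolution of kernel W_q with the q-th power of the input. The single-channel NumPy sketch below illustrates only that idea; it assumes inputs already bounded to [-1, 1] (for example by a preceding tanh) so the powers stay bounded, and it is not the architecture or code used in the paper. With Q = 1 it reduces to an ordinary convolution.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2D cross-correlation without padding, as deep-learning 'conv' layers compute."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def generative_neuron(x, kernels, bias=0.0):
    """Self-ONN-style generative neuron: bias + sum_q conv(W_q, x**q).

    x:       2D input, assumed to lie in [-1, 1].
    kernels: array (Q, kh, kw); kernels[q - 1] multiplies the q-th power of x.
    """
    return bias + sum(conv2d_valid(x ** (q + 1), k) for q, k in enumerate(kernels))
```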
Significance. Should the reported performance metrics prove robust once the evaluation protocol is clarified, this study makes a meaningful contribution to automated MRE analysis for inflammatory bowel disease by showing how a coarse-to-fine, organ-aware strategy can mitigate challenges such as anatomical variability and severe class imbalance (e.g., class weighting raises appendix DSC from 6.76% to 85.76%). The explicit use of five-fold cross-validation, augmentation, and class weighting is a positive aspect that enhances reproducibility. The work has potential for clinical translation in gastroenterology, though external validation on diverse datasets would strengthen its claims of generalizability.
major comments (1)
- [Methods and Experiments (cross-validation procedure)] In the Methods section describing the two-stage pipeline and the Experiments section on five-fold cross-validation, it is not specified whether the first-stage DenseNet201-UNet++ model is retrained independently within each fold (i.e., nested CV) or if a single model trained on the full 3,195-slice dataset is used to generate coarse masks and organ-specific patches for the second stage. The latter case would allow test-set information to leak into the second-stage training via the first-stage predictions, rendering the central performance claims (mDSC 88.99%, mIoU 84.76%, and per-organ DSC gains such as +23.62% for cecum) unreliable.
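To make the requested clarification concrete: a leakage-free protocol refits both stages inside every fold, with folds drawn at the patient level so that no patient's slices appear on both sides of a split. The sketch assumes slice-level data tagged with a patient ID; `train_stage1`, `train_stage2` and `evaluate` are hypothetical placeholders for the real training and scoring routines, and the validation split used for early stopping is omitted for brevity.

```python
import numpy as np

def patient_folds(patient_ids, k=5, seed=0):
    """Split slice indices into k folds so that no patient spans two folds."""
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)
    groups = np.array_split(patients, k)
    return [np.flatnonzero(np.isin(patient_ids, g)) for g in groups]

def run_cv(patient_ids, train_stage1, train_stage2, evaluate, k=5):
    """Refit both stages in every fold; test slices never reach either training step."""
    folds = patient_folds(patient_ids, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        stage1 = train_stage1(train_idx)            # coarse model sees training slices only
        stage2 = train_stage2(train_idx, stage1)    # patches derived from training slices only
        # Test ROIs come from this fold's stage-1 model, then stage 2 refines them.
        scores.append(evaluate(test_idx, stage1, stage2))
    return scores
```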
minor comments (2)
- [Abstract] The abstract states that the framework 'outperforms all baselines' but does not name the specific baseline models (e.g., single-stage U-Net variants or other DenseNet configurations); adding this detail would allow readers to better contextualize the reported gains.
- [Abstract] The abstract refers to segmentation of 'Ten Gastrointestinal Organs' without listing them; explicitly naming the organs (appendix, cecum, sigmoid, rectum, small intestine, etc.) in the abstract would improve immediate clarity.
Simulated Author's Rebuttal
We thank the referee for their careful review and for identifying an important ambiguity in our description of the evaluation protocol. We address the major comment below and will revise the manuscript accordingly to improve clarity and reproducibility.
- Referee: [Methods and Experiments (cross-validation procedure)] In the Methods section describing the two-stage pipeline and the Experiments section on five-fold cross-validation, it is not specified whether the first-stage DenseNet201-UNet++ model is retrained independently within each fold (i.e., nested CV) or if a single model trained on the full 3,195-slice dataset is used to generate coarse masks and organ-specific patches for the second stage. The latter case would allow test-set information to leak into the second-stage training via the first-stage predictions, rendering the central performance claims (mDSC 88.99%, mIoU 84.76%, and per-organ DSC gains such as +23.62% for cecum) unreliable.
Authors: We sincerely thank the referee for highlighting this critical detail. We confirm that a nested cross-validation procedure was used: within each of the five folds, the DenseNet201-UNet++ model was trained exclusively on the training subset of that fold (using the validation subset for early stopping and hyperparameter tuning), and the trained model was applied to the held-out test subset of the same fold to produce coarse masks and extract organ-specific ROIs. The second-stage DenseNet121-SelfONN-UNet was then trained solely on patches derived from that fold's training data and evaluated on the ROIs extracted from its held-out test subset. This keeps test data out of both training stages and prevents any test-set leakage. We apologize for the lack of explicit description in the original manuscript. In the revised version, we will expand the Methods and Experiments sections to detail the nested CV protocol, specify that partitioning was performed at the patient level, and include a schematic of the fold-wise process.
revision: yes
Circularity Check
No significant circularity in empirical two-stage segmentation framework
full rationale
The paper describes a standard empirical deep learning pipeline: a public dataset of 3,195 slices is split via five-fold cross-validation, a first-stage DenseNet201-UNet++ produces coarse masks for patch extraction, and a second-stage DenseNet121-SelfONN-UNet is trained on those patches to report mDSC, mIoU, and mHD95 on held-out folds. No equations, uniqueness theorems, or ansatzes are invoked; performance numbers are direct outputs of training and evaluation rather than quantities that reduce to fitted inputs by construction. The two-stage design is a conventional coarse-to-fine strategy with no self-referential definitions or load-bearing self-citations. While the exact nesting of the first-stage model within each CV fold is not quoted in the provided text, this is a methodological detail rather than a circular reduction of the claimed metrics to the inputs themselves. The derivation chain is therefore self-contained and externally falsifiable on the public dataset.
Axiom & Free-Parameter Ledger
free parameters (1)
- class-specific weights
axioms (1)
- domain assumption: The 3,195-slice dataset from 114 IBD patients is sufficiently representative for clinical generalization