Anatomically Consistent Segmentation of Organs at Risk in MRI with Convolutional Neural Networks
Pith reviewed 2026-05-25 09:30 UTC · model grok-4.3
The pith
A convolutional neural network segments eight brain organs at risk from MRI with mean surface distances of 0.1 to 0.7 mm.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The method segments eye, lens, optic nerve, optic chiasm, pituitary gland, hippocampus, brainstem and brain using a CNN trained end-to-end for multiple classes; an efficient procedure accommodates missing ground-truth labels for subsets of classes while a graph-based post-processing step enforces optic-nerve connectivity, yielding mean distances to ground truth of 0.1-0.7 mm and 96 percent clinical acceptability on held-out data.
What carries the argument
Efficient training algorithm for end-to-end segmentation of multiple non-exclusive classes with incomplete ground truth, plus graph-based post-processing that enforces connectivity between eyes and optic chiasm.
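The connectivity constraint can be illustrated with a minimal-path search. The sketch below is an assumption, not the paper's algorithm. It treats voxels as graph nodes and charges -log p for entering a voxel whose optic-nerve probability is p. It then runs Dijkstra's algorithm from an eye seed to a chiasm seed, so the returned path is connected by construction. For brevity it uses a 2D grid with 4-connectivity. The function name `min_cost_path` and the cost definition are hypothetical.

```python
import heapq
import math


def min_cost_path(prob, start, goal, eps=1e-6):
    """Dijkstra shortest path on a 2D grid of probabilities in [0, 1].

    Entering cell (r, c) costs -log(max(prob[r][c], eps)), so high-probability
    cells are cheap. Returns the list of (row, col) cells from start to goal,
    or None if goal is unreachable.
    """
    rows, cols = len(prob), len(prob[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d - math.log(max(prob[nr][nc], eps))
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in visited:
        return None
    # Walk predecessor links back from goal to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

On a toy grid where only a U-shaped corridor has high probability, the returned path follows the corridor rather than cutting through low-probability cells. A 3D version would only change the neighbour offsets.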
If this is right
- Segmentations can be generated for all eight structures even when training data lack labels for some of them.
- The graph post-processing guarantees that segmented optic nerves remain connected from eye to chiasm.
- Mean distances to ground truth do not exceed 0.7 mm for any of the eight structures.
- 96 percent of outputs on an independent set pass direct clinical review for radiotherapy planning.
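The review does not spell out how the mean distance is computed. One common definition is the symmetric mean surface distance. It extracts the boundary voxels of both masks and takes each boundary point's distance to the nearest boundary point of the other mask. It then averages over both sets. The NumPy sketch below assumes that definition, 6-connected boundaries, and voxel spacing given in mm. `surface_points` and `mean_surface_distance` are hypothetical helpers. The brute-force pairwise distances suit only small masks.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates (mm) of the boundary voxels of a binary 3D mask.

    A voxel is on the boundary if it lies inside the mask and at least one of its
    six face neighbours lies outside (voxels beyond the array edge count as outside).
    """
    padded = np.pad(np.asarray(mask, dtype=bool), 1)  # pad with False on every side
    core = padded[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Rolling the padded array aligns each voxel with one face neighbour;
            # wrap-around only touches the padding, which is sliced away.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    boundary = core & ~interior
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def mean_surface_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean surface distance (mm) between two non-empty binary masks."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must contain at least one voxel")
    # Brute-force pairwise Euclidean distances, shape (len(pa), len(pb)).
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb)))
```

Consider two single-voxel masks three voxels apart along the first axis, with spacing (0.5, 1, 1). Their mean surface distance is 1.5 mm.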
Where Pith is reading between the lines
- The same training procedure could be applied to other imaging modalities, such as CT, where label availability also varies.
- Extending the graph constraint to additional anatomical rules might further reduce implausible segmentations in other regions.
- Integration into clinical software could reduce the manual contouring time currently required for radiotherapy planning.
- Testing the method on larger multi-center datasets would reveal whether performance holds across different scanners and protocols.
Load-bearing premise
The procedure for training networks when ground-truth labels are missing for some classes does not introduce bias into the learned model.
What would settle it
A controlled comparison of radiotherapy dose plans computed from the automatic segmentations versus the manual ground-truth contours on the same patient cohort, checking whether any clinically relevant differences in dose to organs at risk appear.
Original abstract
Planning of radiotherapy involves accurate segmentation of a large number of organs at risk, i.e. organs for which irradiation doses should be minimized to avoid important side effects of the therapy. We propose a deep learning method for segmentation of organs at risk inside the brain region, from Magnetic Resonance (MR) images. Our system performs segmentation of eight structures: eye, lens, optic nerve, optic chiasm, pituitary gland, hippocampus, brainstem and brain. We propose an efficient algorithm to train neural networks for an end-to-end segmentation of multiple and non-exclusive classes, addressing problems related to computational costs and missing ground truth segmentations for a subset of classes. We enforce anatomical consistency of the result in a postprocessing step; in particular, we introduce a graph-based algorithm for segmentation of the optic nerves, enforcing the connectivity between the eyes and the optic chiasm. We report cross-validated quantitative results on a database of 44 contrast-enhanced T1-weighted MRIs with provided segmentations of the considered organs at risk, which were originally used for radiotherapy planning. In addition, the segmentations produced by our model on an independent test set of 50 MRIs are evaluated by an experienced radiotherapist in order to qualitatively assess their accuracy. The mean distances between produced segmentations and the ground truth ranged from 0.1 mm to 0.7 mm across different organs. A vast majority (96%) of the produced segmentations were found acceptable for radiotherapy planning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a CNN-based method for segmenting eight organs at risk (eye, lens, optic nerve, optic chiasm, pituitary gland, hippocampus, brainstem, brain) in contrast-enhanced T1-weighted brain MRIs for radiotherapy planning. It introduces an efficient training procedure for end-to-end multi-class segmentation with partial/missing ground-truth labels and a graph-based post-processing algorithm to enforce anatomical connectivity (especially for optic nerves). Quantitative results (mean surface distances 0.1–0.7 mm) are reported via cross-validation on a 44-image database; qualitative acceptability (96%) is assessed by one radiotherapist on an independent 50-image test set.
Significance. If the performance claims hold under more rigorous evaluation, the work could reduce manual segmentation effort in radiotherapy while adding anatomical consistency via the graph post-processing step. The handling of missing labels during training is a practical contribution for multi-organ tasks. The combination of CNN segmentation with explicit connectivity enforcement is a clear strength.
major comments (2)
- [Abstract / Evaluation] Abstract and § on independent test-set evaluation: the central clinical claim that 96% of segmentations on the 50-image test set are 'acceptable for radiotherapy planning' rests on single-rater qualitative judgment with no reported inter-rater agreement, blinding protocol, or quantitative reference comparison. This is load-bearing for the radiotherapy-utility conclusion and undermines verifiability of the result.
- [Methods (training procedure)] Methods section describing the efficient training algorithm for missing ground-truth labels: the procedure is presented as bias-free, yet no ablation or sensitivity analysis is shown to confirm that the learned model is unaffected by the partial-label scheme; this directly affects the reported cross-validation distances.
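For reference, the inter-rater agreement the referee asks for is usually reported as Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below computes κ for two raters who give categorical verdicts, such as acceptable / not acceptable, on the same segmentations. The function name and the example ratings are hypothetical, since the study used a single rater.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items with categories."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items on which the raters agree.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical category: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)
```

Consider ratings [1, 1, 0, 0] and [1, 0, 0, 0]. They give p_o = 0.75 and p_e = 0.5, so κ = 0.5.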
minor comments (2)
- [Abstract] Abstract: state the number of cross-validation folds, whether the mean surface distances include standard deviations or ranges per organ, and the exact definition of 'acceptability' used by the radiotherapist.
- [Results] Figure captions and results tables: ensure all quantitative metrics are accompanied by the number of samples and any statistical tests performed.
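An interval estimate makes the sample size behind a proportion such as the 96% acceptability rate visible. A minimal sketch of the Wilson score interval follows. The counts in the usage note are hypothetical, because the review does not say whether 96% is computed per image or per structure.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Assume, for illustration, 48 acceptable cases out of 50. Then `wilson_interval(48, 50)` gives roughly 0.87 to 0.99.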
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to the manuscript.
- Referee: [Abstract / Evaluation] Abstract and § on independent test-set evaluation: the central clinical claim that 96% of segmentations on the 50-image test set are 'acceptable for radiotherapy planning' rests on single-rater qualitative judgment with no reported inter-rater agreement, blinding protocol, or quantitative reference comparison. This is load-bearing for the radiotherapy-utility conclusion and undermines verifiability of the result.
Authors: We agree that reliance on a single-rater qualitative assessment without reported inter-rater agreement, blinding details, or quantitative reference comparisons is a limitation that affects the strength of the clinical utility claim. In the revised manuscript we will expand the methods and results sections to fully describe the evaluation protocol (including that it was performed by one experienced radiotherapist), explicitly qualify the 96% figure as a single-rater judgment, and add a dedicated limitations paragraph discussing the absence of multi-rater metrics and blinding. No additional inter-rater or blinded data were collected in the original study, so we cannot supply them; the revision will therefore focus on transparent qualification of the existing result rather than new experiments. revision: yes
- Referee: [Methods (training procedure)] Methods section describing the efficient training algorithm for missing ground-truth labels: the procedure is presented as bias-free, yet no ablation or sensitivity analysis is shown to confirm that the learned model is unaffected by the partial-label scheme; this directly affects the reported cross-validation distances.
Authors: The loss is computed exclusively on voxels with available ground-truth labels for each class, which is intended to prevent the missing-label scheme from biasing the model. We nevertheless recognize that an empirical ablation would provide stronger evidence. In the revised manuscript we will add an ablation study that retrains the model on the subset of images with complete labels for all eight structures and compares the resulting cross-validation surface distances against the original partial-label training; this will directly address whether the reported distances are affected by the training procedure. revision: yes
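The rebuttal describes a loss evaluated only where ground truth exists for a given class. The sketch below assumes one formulation of this and is not the paper's exact loss. Each class gets an independent sigmoid and binary cross-entropy, because the classes are non-exclusive (for example, the lens lies within the eye). A per-image, per-class availability flag removes unannotated classes from the average. The name `masked_multilabel_bce` and the array layout are hypothetical.

```python
import numpy as np


def masked_multilabel_bce(logits, targets, label_available, eps=1e-7):
    """Mean binary cross-entropy over non-exclusive classes, skipping missing labels.

    logits, targets: arrays of shape (batch, classes, *spatial); targets are 0/1.
    label_available: bool array of shape (batch, classes); False means that image
    has no ground truth for that class, so none of its voxels enter the loss.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))          # independent sigmoid per class
    probs = np.clip(probs, eps, 1.0 - eps)         # keep log() finite
    bce = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    # Expand the (batch, classes) flags over the spatial axes of bce.
    flags = np.asarray(label_available, dtype=bool)
    mask = np.broadcast_to(flags.reshape(flags.shape + (1,) * (bce.ndim - 2)), bce.shape)
    if not mask.any():
        return 0.0                                 # nothing annotated in this batch
    return float(bce[mask].mean())
```

The loss excludes unannotated (image, class) pairs rather than treating them as background, so flipping their targets leaves it unchanged. This removes the most obvious source of bias, treating a missing organ as background. It does not by itself show that the learned model is unbiased, which is what the proposed ablation would test.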
Circularity Check
No significant circularity detected
The paper describes an empirical CNN-based segmentation pipeline evaluated via standard cross-validation on 44 images (yielding the surface distances) and an independent qualitative review of 50 images (yielding the 96% acceptability figure). No equations, parameters, or claims reduce by construction to fitted inputs or self-citations; the missing-label training procedure and the graph post-processing step are presented as methodological choices, without self-referential definitions or relabeled known results. The claims are therefore checked against held-out data rather than against quantities derived from the method itself.
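The review does not state how many folds were used on the 44 images. The minimal k-fold sketch below leaves k as a free parameter. It shows the property that makes the reported distances out-of-sample: every image lands in exactly one test fold. The function name, the seed, and the round-robin assignment are illustrative assumptions. If several images came from the same patient, the split should be made over patients instead.

```python
import random


def kfold_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; every item appears in exactly one test fold."""
    if not 2 <= k <= n_items:
        raise ValueError("need 2 <= k <= n_items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)          # reproducible shuffle
    folds = [order[i::k] for i in range(k)]     # round-robin: fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, sorted(test)
```

For example, `kfold_splits(44, 4)` would give four test folds of 11 images each.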
Axiom & Free-Parameter Ledger
free parameters (1)
- CNN weights and hyperparameters
axioms (1)
- Domain assumption: convolutional neural networks can be trained to produce accurate segmentations from labeled medical images, even with partial annotations.