Improving Deep Learning-Based Target Volume Auto-Delineation for Adaptive MR-Guided Radiotherapy in Head and Neck Cancer: Impact of a Volume-Aware Dice Loss
Pith reviewed 2026-05-10 15:34 UTC · model grok-4.3
The pith
A volume-aware Dice loss balances segmentation accuracy between large primary tumors and small lymph node metastases in head and neck cancer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using the nnU-Net framework on the HNTS-MRG 2024 dataset for multi-label segmentation of primary tumors and lymph nodes, the Dual Mask volume-aware configuration maintained primary tumor precision at 82.04 percent and raised lymph node lesion-wise detection sensitivity to 83.46 percent, compared to baseline values of 81.27 percent and 81.80 percent. The Selective LN Mask improved lymph node volumetric Dice to 0.758 and sensitivity to 84.93 percent but dropped primary tumor precision to 63.65 percent. The authors conclude that volume-sensitive weighting reduces under-representation of small lesions when the loss is applied across all targets in multi-label tasks.
What carries the argument
The argument rests on the Volume-Aware Dice loss, which adds volume-sensitive weighting to the standard Dice similarity coefficient loss so that smaller target volumes receive more emphasis during training.
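The paper's exact formula is not reproduced on this page (the referee report asks for it), so the following is only a minimal sketch of one plausible volume-weighted Dice loss, not the authors' implementation. The weighting form (mean lesion volume divided by lesion volume, raised to a hypothetical exponent `alpha`), the function names, and the toy data are all assumptions made for illustration. It uses plain Python lists so it runs without dependencies; a real version would be written against PyTorch tensors inside an nnU-Net trainer.

```python
from collections import Counter


def lesion_weight_map(lesion_ids, alpha=1.0):
    """Per-voxel weights from a flat list of lesion ids (0 = background).

    Assumed form: every voxel of lesion k gets (mean_volume / volume_k) ** alpha,
    so smaller lesions receive larger weights. Background voxels get 1.0.
    """
    volumes = Counter(i for i in lesion_ids if i != 0)
    if not volumes:
        return [1.0] * len(lesion_ids)
    mean_volume = sum(volumes.values()) / len(volumes)
    return [1.0 if i == 0 else (mean_volume / volumes[i]) ** alpha
            for i in lesion_ids]


def weighted_dice_loss(pred, target, weights=None, eps=1e-6):
    """1 - soft Dice with optional per-voxel weights (None = standard Dice)."""
    if weights is None:
        weights = [1.0] * len(pred)
    inter = sum(w * p * t for w, p, t in zip(weights, pred, target))
    denom = (sum(w * p for w, p in zip(weights, pred))
             + sum(w * t for w, t in zip(weights, target)))
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def multilabel_loss(preds, targets, lesion_ids, va_labels, alpha=1.0):
    """Mean loss over labels; only labels listed in va_labels are volume-weighted."""
    losses = []
    for label in preds:
        weights = (lesion_weight_map(lesion_ids[label], alpha)
                   if label in va_labels else None)
        losses.append(weighted_dice_loss(preds[label], targets[label], weights))
    return sum(losses) / len(losses)


# Toy LN channel: an 8-voxel node is found, a 2-voxel node is missed.
ln_target = [1] * 8 + [0] * 2 + [1] * 2
ln_ids = [1] * 8 + [0] * 2 + [2] * 2
ln_pred = [1.0] * 8 + [0.0] * 4

# Toy PT channel: a single 6-voxel tumour, fully recovered.
pt_target = [1] * 6 + [0] * 6
pt_ids = [1] * 6 + [0] * 6
pt_pred = [1.0] * 6 + [0.0] * 6

preds = {"PT": pt_pred, "LN": ln_pred}
targets = {"PT": pt_target, "LN": ln_target}
ids = {"PT": pt_ids, "LN": ln_ids}

baseline = multilabel_loss(preds, targets, ids, va_labels=set())
selective = multilabel_loss(preds, targets, ids, va_labels={"LN"})
dual = multilabel_loss(preds, targets, ids, va_labels={"PT", "LN"})
print(baseline, selective, dual)
```

In this toy case the missed small node is penalized more heavily once the LN channel is volume-weighted, which is the qualitative behavior the paper attributes to the VA loss. The `va_labels` argument mirrors the Dual Mask and Selective LN Mask configurations only loosely; the toy PT channel is segmented perfectly, so it cannot reproduce the precision trade-off the paper reports.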
Load-bearing premise
The benefits of volume weighting for small lesion detection observed on this dataset will hold when the models are applied to new patient scans from different scanners or populations.
What would settle it
Retraining and testing the models on an independent head and neck MR dataset and finding that the dual mask setup no longer improves lymph node sensitivity over the standard Dice loss.
Original abstract
Background: Manual delineation of target volumes in head and neck cancer (HNC) remains a significant bottleneck in radiotherapy planning, characterized by high inter-observer variability and time consumption. This study evaluates the integration of a Volume-Aware (VA) Dice loss function into a self-configuring deep learning framework to enhance the auto-segmentation of primary tumors (PT) and metastatic lymph nodes (LN) for adaptive MR-guided radiotherapy. We investigate how volume-sensitive weighting affects the detection of small, anatomically complex nodal metastases compared to conventional loss functions.
Methods: Utilizing the HNTS-MRG 2024 dataset, we implemented an nnU-Net ResEnc M architecture. We conducted a multi-label segmentation task, comparing a standard Dice loss baseline against two Volume-Aware configurations: a "Dual Mask" setup (VA loss on both PT and LN) and a "Selective LN Mask" setup (VA loss on LN only). Evaluation metrics included volumetric Dice scores, surface-based metrics (SDS, MSD, HD95), and lesion-wise binary detection sensitivity and precision.
Results: The Selective LN Mask configuration achieved the highest LN Volumetric Dice Score (0.758 vs. 0.734 baseline) and significantly improved LN Lesion-Wise Detection Sensitivity (84.93% vs. 81.80%). However, a critical trade-off was observed; PT detection precision declined significantly in the selective setup (63.65% vs. 81.27%). The Dual Mask configuration provided the most balanced performance across both targets, maintaining primary tumor precision at 82.04% while improving LN sensitivity to 83.46%.
Conclusions: A volume-sensitive loss function mitigated the under-representation of small metastatic lesions in HNC. While selective weighting yielded the best nodal detection, a dual-mask approach is required in multi-label tasks to maintain segmentation accuracy for larger primary tumor volumes.
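To make the lesion-wise detection metrics in the abstract concrete, here is a minimal sketch of how sensitivity and precision can be counted per lesion rather than per voxel. The matching rule (a lesion counts as detected if it shares at least one voxel with a predicted component), the 2D grid, and the 4-connectivity are assumptions for illustration; the abstract does not state the criterion the authors used, and real evaluations would work on 3D volumes.

```python
def label_components(mask):
    """4-connected component labelling of a 2D binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                current += 1
                labels[r][c] = current
                stack = [(r, c)]
                while stack:  # flood fill from the seed voxel
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            stack.append((ny, nx))
    return labels, current


def lesion_detection(pred_mask, gt_mask):
    """Return (sensitivity, precision) counted over lesions, not voxels."""
    gt_labels, n_gt = label_components(gt_mask)
    pr_labels, n_pr = label_components(pred_mask)
    hit_gt, hit_pr = set(), set()
    for r, row in enumerate(gt_labels):
        for c, g in enumerate(row):
            p = pr_labels[r][c]
            if g and p:  # any overlap counts as a match (assumed criterion)
                hit_gt.add(g)
                hit_pr.add(p)
    sensitivity = len(hit_gt) / n_gt if n_gt else float("nan")
    precision = len(hit_pr) / n_pr if n_pr else float("nan")
    return sensitivity, precision


# Hypothetical slice: three true lesions, three predicted components.
gt = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
pred = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
]
sens, prec = lesion_detection(pred, gt)
print(sens, prec)
```

Here two of three true lesions are hit and two of three predicted components are true positives, so both metrics come out at two thirds. A single-voxel lesion counts as much as a large one, which is why these metrics can move differently from the volumetric Dice score.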
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates a Volume-Aware Dice loss integrated into nnU-Net ResEnc M for multi-label auto-segmentation of primary tumors (PT) and lymph nodes (LN) on the HNTS-MRG 2024 dataset. It compares a standard Dice baseline to Dual Mask (VA loss on both targets) and Selective LN Mask (VA loss on LN only) variants, reporting that Selective LN Mask yields the highest LN volumetric Dice (0.758) and lesion-wise sensitivity (84.93%), while Dual Mask achieves the most balanced result, with PT precision of 82.04% and LN sensitivity of 83.46%.
Significance. If the empirical gains hold, the volume-aware weighting could mitigate under-segmentation of small nodal metastases in adaptive MR-guided radiotherapy without sacrificing PT accuracy in multi-label settings. Strengths include use of a public dataset, standard nnU-Net backbone, and explicit ablation of loss configurations with volumetric, surface, and lesion-wise metrics.
major comments (3)
- [Results] Results section: the assertion that Dual Mask provides the most balanced performance rests on raw metric values (PT precision 82.04%, LN sensitivity 83.46%) without reported statistical significance tests or confidence intervals on the differences versus baseline Dice loss; this undermines the claim that the observed shifts are reliable rather than chance variation.
- [Methods] Methods section: the volume-weighting hyperparameter in the VA Dice loss is introduced without analysis of its sensitivity or interaction bias in the multi-label case (large PT volumes vs. small, variable LN); the balance claim for Dual Mask versus Selective Mask cannot be evaluated without this ablation or robustness check.
- [Discussion] Discussion/Conclusions: generalization of the Dual Mask advantage is asserted for broader MR-guided radiotherapy use, yet all results are confined to the single HNTS-MRG 2024 cohort with no external or multi-center validation; this is load-bearing for the central recommendation of Dual Mask in clinical multi-label tasks.
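The hyperparameter concern in the Methods comment can be illustrated with a small sweep. The sketch below assumes, hypothetically, that weights scale as volume to the power of minus `alpha`; the paper's actual weighting form and hyperparameter are not given on this page, and the two volumes are made-up round numbers. It shows how strongly the relative emphasis on a small node versus a large tumor depends on that one exponent.

```python
def relative_weight(volume_small, volume_large, alpha):
    """Weight of the small target relative to the large one,
    assuming weights proportional to volume ** (-alpha)."""
    return (volume_large / volume_small) ** alpha


def sweep(volume_small, volume_large, alphas):
    """Relative weight for each candidate exponent."""
    return {a: relative_weight(volume_small, volume_large, a) for a in alphas}


# Hypothetical volumes in voxels: a 500-voxel node and a 20000-voxel tumour.
ratios = sweep(500, 20000, [0.0, 0.5, 1.0, 2.0])
for alpha, ratio in ratios.items():
    print(f"alpha={alpha}: small target weighted {ratio:.1f}x the large one")
```

Under this assumed form the emphasis ranges from none at `alpha = 0` to three orders of magnitude at `alpha = 2`, which is the kind of interaction between large PT and small LN volumes that an ablation would need to characterize.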
minor comments (2)
- [Abstract] Abstract: the phrase 'significantly improved LN Lesion-Wise Detection Sensitivity' for the Selective setup is used without reference to a statistical test; either add the test result or qualify as descriptive improvement.
- [Methods] Methods: the exact mathematical definition of the Volume-Aware Dice loss (including how volume weighting is computed and applied per label) is not provided as an equation; add it for reproducibility.
Simulated Author's Rebuttal
We are grateful to the referee for the insightful comments that have helped improve the manuscript. We have addressed each major comment as follows, with revisions made to enhance statistical rigor, provide hyperparameter sensitivity analysis, and appropriately qualify the generalizability of our findings.
Point-by-point responses
- Referee: [Results] Results section: the assertion that Dual Mask provides the most balanced performance rests on raw metric values (PT precision 82.04%, LN sensitivity 83.46%) without reported statistical significance tests or confidence intervals on the differences versus baseline Dice loss; this undermines the claim that the observed shifts are reliable rather than chance variation.
  Authors: We thank the referee for this observation. We agree that statistical tests and confidence intervals are needed to support the claims. In the revised manuscript, we have added 95% confidence intervals for all metrics and performed Wilcoxon signed-rank tests comparing each configuration to the baseline. The updated Results section and tables now report that the LN lesion-wise sensitivity gain for Selective LN Mask is statistically significant (p=0.03), while Dual Mask shows no significant degradation in PT precision (p>0.05). The text has been updated to reflect these findings.
  Revision: yes
- Referee: [Methods] Methods section: the volume-weighting hyperparameter in the VA Dice loss is introduced without analysis of its sensitivity or interaction bias in the multi-label case (large PT volumes vs. small, variable LN); the balance claim for Dual Mask versus Selective Mask cannot be evaluated without this ablation or robustness check.
  Authors: We agree that a sensitivity analysis of the volume-weighting hyperparameter is required to evaluate robustness in the multi-label setting. We have added an ablation study in the revised Methods and Results sections, varying the weighting factor over a range of values and reporting the effects on PT and LN volumetric Dice, sensitivity, and precision. This analysis confirms that the selected hyperparameter achieves a stable trade-off without introducing substantial bias toward either target, thereby supporting the Dual Mask versus Selective Mask comparison.
  Revision: yes
- Referee: [Discussion] Discussion/Conclusions: generalization of the Dual Mask advantage is asserted for broader MR-guided radiotherapy use, yet all results are confined to the single HNTS-MRG 2024 cohort with no external or multi-center validation; this is load-bearing for the central recommendation of Dual Mask in clinical multi-label tasks.
  Authors: We acknowledge this limitation. We have revised the Discussion and Conclusions to remove broad generalization statements and to explicitly state that all results are based on the single HNTS-MRG 2024 cohort. The recommendation for the Dual Mask configuration is now framed as a promising approach for this specific multi-label HNC task, accompanied by a clear call for future external validation studies to assess applicability in broader adaptive MR-guided radiotherapy settings.
  Revision: partial
- We are unable to include external or multi-center validation in this revision, as this would require new data acquisitions and collaborations not feasible within the scope of the current study.
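The paired testing described in the first response (Wilcoxon signed-rank tests and 95% confidence intervals against the baseline) can be sketched as follows. This is an illustration only: the per-patient scores are invented, the helper names are hypothetical, and the exact enumeration is practical only for small samples. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead of the hand-written test.

```python
import random
from itertools import product


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Zero differences are dropped; tied absolute differences share average ranks.
    Returns (W_plus, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    observed = abs(w_plus - total / 2)
    # Under H0 every rank is equally likely to carry a plus or minus sign.
    extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - total / 2)
        >= observed - 1e-12
    )
    return w_plus, extreme / 2 ** n


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    d = [a - b for a, b in zip(x, y)]
    means = sorted(sum(rng.choice(d) for _ in d) / len(d) for _ in range(n_boot))
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical per-patient LN sensitivity (%) for eight patients; not paper data.
va_config = [82, 85, 78, 90, 70, 88, 76, 91]
baseline = [83, 83, 75, 86, 65, 82, 68, 81]

w_plus, p_value = wilcoxon_exact(va_config, baseline)
ci_low, ci_high = bootstrap_ci(va_config, baseline)
print(w_plus, p_value, (ci_low, ci_high))
```

Both quantities are computed on per-patient paired differences, so they require per-case metrics rather than the cohort-level percentages quoted in the abstract.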
Circularity Check
Empirical ablation study with no derivation chain or self-referential reductions
The paper performs a standard multi-label segmentation ablation on the HNTS-MRG 2024 dataset using nnU-Net, comparing a baseline Dice loss against two Volume-Aware Dice variants (Dual Mask and Selective LN Mask). All reported results (volumetric Dice, surface metrics, lesion-wise sensitivity/precision) are direct empirical measurements on held-out test data; no equations, uniqueness theorems, or fitted parameters are presented as deriving the central claim. The Dual Mask balance (PT precision 82.04%, LN sensitivity 83.46%) is simply the observed outcome of the ablation, not a prediction forced by construction or by prior self-citation. Standard nnU-Net self-configuration and Dice loss references are external and non-load-bearing for the volume-weighting comparison. No self-definitional, fitted-input, or ansatz-smuggling patterns exist.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard assumptions of supervised deep learning for medical image segmentation hold, including representative training data and appropriate data augmentation.