
arxiv: 2604.10130 · v1 · submitted 2026-04-11 · 💻 cs.CV

Improving Deep Learning-Based Target Volume Auto-Delineation for Adaptive MR-Guided Radiotherapy in Head and Neck Cancer: Impact of a Volume-Aware Dice Loss

Pith reviewed 2026-05-10 15:34 UTC · model grok-4.3

classification 💻 cs.CV
keywords: head and neck cancer · auto-delineation · volume-aware Dice loss · lymph node segmentation · adaptive radiotherapy · nnU-Net · MR-guided radiotherapy

The pith

A volume-aware Dice loss balances segmentation accuracy between large primary tumors and small lymph node metastases in head and neck cancer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study examines whether modifying the Dice loss to account for target volume size can improve automatic outlining of both primary tumors and metastatic lymph nodes in head and neck MRI scans for radiotherapy. The researchers compared a standard Dice loss against versions that give extra weight to smaller volumes, applied either to nodes only or to both targets. Weighting both targets improved detection of small nodes while keeping precision high on the larger tumors; weighting nodes alone hurt tumor precision. The approach addresses the problem that small structures are often missed in training because they contribute less to the overall loss.
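The last point can be made concrete with a small arithmetic sketch. The voxel counts below are hypothetical and chosen only for illustration; they are not taken from the paper. The example assumes one large and one small lesion share a single label and that the Dice score is pooled over all voxels of that label.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    """Dice similarity coefficient computed from voxel counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical voxel counts (illustration only, not from the paper).
large_lesion_voxels = 10_000  # e.g. a large tumor, segmented perfectly
small_lesion_voxels = 200     # e.g. a small node

# Case A: both lesions are segmented perfectly.
dice_all_found = dice(tp=large_lesion_voxels + small_lesion_voxels, fp=0, fn=0)

# Case B: the small lesion is missed entirely, so all of its voxels
# become false negatives while the large lesion is still perfect.
dice_node_missed = dice(tp=large_lesion_voxels, fp=0, fn=small_lesion_voxels)

print(f"pooled Dice, small lesion found:  {dice_all_found:.4f}")
print(f"pooled Dice, small lesion missed: {dice_node_missed:.4f}")
```

Under these assumed counts, missing the small lesion completely changes the pooled Dice by about one percentage point, so a loss based on it gives the optimizer little incentive to recover that lesion.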

Core claim

Using the nnU-Net framework on the HNTS-MRG 2024 dataset for multi-label segmentation of primary tumors and lymph nodes, the Dual Mask volume-aware configuration maintained primary tumor precision at 82.04 percent and raised lymph node lesion-wise detection sensitivity to 83.46 percent, compared to baseline values of 81.27 percent and 81.80 percent. The Selective LN Mask improved lymph node volumetric Dice to 0.758 and sensitivity to 84.93 percent but dropped primary tumor precision to 63.65 percent. The authors conclude that volume-sensitive weighting reduces under-representation of small lesions when the loss is applied across all targets in multi-label tasks.

What carries the argument

The Volume-Aware Dice loss incorporates volume-sensitive weighting into the standard Dice similarity coefficient loss to emphasize smaller target volumes.
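The exact formulation of the loss is not given on this page (the referee report asks for it as an equation), so the following is only a minimal sketch of one plausible volume-weighted soft Dice loss. The function name, the inverse-volume weighting, the `alpha` exponent, and the per-voxel `lesion_ids` input are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from typing import Dict, List, Sequence


def volume_weighted_soft_dice_loss(
    probs: Sequence[float],
    target: Sequence[int],
    lesion_ids: Sequence[int],
    alpha: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """Soft Dice loss in which voxels of small lesions get larger weights.

    probs      -- predicted foreground probability per voxel
    target     -- binary ground-truth label per voxel
    lesion_ids -- ground-truth lesion index per voxel (0 = background)
    alpha      -- hypothetical weighting exponent; alpha=0 gives plain soft Dice
    """
    if not (len(probs) == len(target) == len(lesion_ids)):
        raise ValueError("all inputs must have the same length")

    # Count the voxels of every ground-truth lesion.
    volumes: Dict[int, int] = {}
    for lid in lesion_ids:
        if lid > 0:
            volumes[lid] = volumes.get(lid, 0) + 1
    v_max = max(volumes.values(), default=1)

    # Assumed scheme: weight (v_max / v) ** alpha for lesion voxels, 1 for background.
    weights: List[float] = [
        (v_max / volumes[lid]) ** alpha if lid > 0 else 1.0 for lid in lesion_ids
    ]

    intersection = sum(w * p * t for w, p, t in zip(weights, probs, target))
    denominator = sum(w * p for w, p in zip(weights, probs)) + sum(
        w * t for w, t in zip(weights, target)
    )
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Toy example: an 8-voxel lesion predicted perfectly, a 2-voxel lesion missed.
probs = [1.0] * 8 + [0.0] * 2 + [0.0] * 6
target = [1] * 8 + [1] * 2 + [0] * 6
lesion_ids = [1] * 8 + [2] * 2 + [0] * 6

standard = volume_weighted_soft_dice_loss(probs, target, lesion_ids, alpha=0.0)
weighted = volume_weighted_soft_dice_loss(probs, target, lesion_ids, alpha=1.0)
print(f"standard soft Dice loss: {standard:.4f}")
print(f"volume-weighted loss:    {weighted:.4f}")
```

In this toy case the missed small lesion is penalized more heavily once the weighting is switched on. One way to read the two configurations in the paper is that a weighting of this kind is enabled for both labels (Dual Mask) or for the lymph node label only (Selective LN Mask); that mapping is an interpretation, not a confirmed detail.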

Load-bearing premise

The benefits of volume weighting for small lesion detection observed on this dataset will hold when the models are applied to new patient scans from different scanners or populations.

What would settle it

Retraining and testing the models on an independent head and neck MR dataset and finding that the dual mask setup no longer improves lymph node sensitivity over the standard Dice loss.

Figures

Figures reproduced from arXiv: 2604.10130 by Ahmed Gomaa, Annette Schwarz, Florian Putz, Ishita Sheth, Juliane Szkitsak, Philipp Schubert, Pluvio Stephan, Sogand Beirami, Stefanie Corradini, Thomas Weissmann, Yixing Huang, Zahra Esmaeilzadeh.

Figure 1: Examples of better performance of the VA Dice loss relative to the baseline.
Figure 2: Segmentation masks produced using different configurations on the multi-label dataset. The visualization illustrates the influence of the Volume-Aware Dice loss under various settings. (a) Ground Truth: PT is shown in green, and LN in yellow. (b) Baseline Model: accurate segmentation for LN, but underperformance in PT segmentation. (c) Volume-Aware Dice Loss - Dual Mask Configuration: segmentation quality …
Figure 3: Plots showing volumetric and surface distance similarities for primary tumor and lymph node metastases across evaluated configurations.
Figure 4: Plots showing binary detection scores (Sensitivity and Precision) for primary tumor and lymph node metastasis across evaluated configurations.
Figure 5: A comprehensive visualization showcasing all evaluated configurations on a single patient. (a) Ground Truth: PT is displayed in green, and LN in yellow. (b, c, d): (c) achieving the best overall performance, exceeding both the base model (b) and the Volume-Aware Dice loss in selective LN mask approach (d). Arrow in ground truth mask: False positive lymph node metastasis corresponding to an anatomical struc…
Original abstract

Background: Manual delineation of target volumes in head and neck cancer (HNC) remains a significant bottleneck in radiotherapy planning, characterized by high inter-observer variability and time consumption. This study evaluates the integration of a Volume-Aware (VA) Dice loss function into a self-configuring deep learning framework to enhance the auto-segmentation of primary tumors (PT) and metastatic lymph nodes (LN) for adaptive MR-guided radiotherapy. We investigate how volume-sensitive weighting affects the detection of small, anatomically complex nodal metastases compared to conventional loss functions.

Methods: Utilizing the HNTS-MRG 2024 dataset, we implemented an nnU-Net ResEnc M architecture. We conducted a multi-label segmentation task, comparing a standard Dice loss baseline against two Volume-Aware configurations: a "Dual Mask" setup (VA loss on both PT and LN) and a "Selective LN Mask" setup (VA loss on LN only). Evaluation metrics included volumetric Dice scores, surface-based metrics (SDS, MSD, HD95), and lesion-wise binary detection sensitivity and precision.

Results: The Selective LN Mask configuration achieved the highest LN Volumetric Dice Score (0.758 vs. 0.734 baseline) and significantly improved LN Lesion-Wise Detection Sensitivity (84.93% vs. 81.80%). However, a critical trade-off was observed; PT detection precision declined significantly in the selective setup (63.65% vs. 81.27%). The Dual Mask configuration provided the most balanced performance across both targets, maintaining primary tumor precision at 82.04% while improving LN sensitivity to 83.46%.

Conclusions: A volume-sensitive loss function mitigated the under-representation of small metastatic lesions in HNC. While selective weighting yielded the best nodal detection, a dual-mask approach is required in multi-label tasks to maintain segmentation accuracy for larger primary tumor volumes.
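Lesion-wise detection sensitivity and precision differ from voxel-wise scores in that each connected lesion counts once, whatever its size. The sketch below shows one common way to compute them on a toy 2D mask. The matching rule (a lesion counts as detected if it overlaps the other mask by at least one voxel) and the 4-connectivity are assumptions; the abstract does not state the criterion used in the paper.

```python
from collections import deque
from typing import List, Set, Tuple

Grid = List[List[int]]
Component = Set[Tuple[int, int]]


def connected_components(mask: Grid) -> List[Component]:
    """Return the 4-connected foreground components of a binary 2D mask."""
    rows, cols = len(mask), len(mask[0])
    seen: Set[Tuple[int, int]] = set()
    components: List[Component] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            component: Component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def lesion_wise_detection(gt: Grid, pred: Grid) -> Tuple[float, float]:
    """Lesion-wise (sensitivity, precision) under a one-voxel-overlap rule."""
    gt_lesions = connected_components(gt)
    pred_lesions = connected_components(pred)
    gt_voxels = set().union(*gt_lesions)
    pred_voxels = set().union(*pred_lesions)
    detected = sum(1 for lesion in gt_lesions if lesion & pred_voxels)
    correct = sum(1 for lesion in pred_lesions if lesion & gt_voxels)
    sensitivity = detected / len(gt_lesions) if gt_lesions else float("nan")
    precision = correct / len(pred_lesions) if pred_lesions else float("nan")
    return sensitivity, precision


# Toy masks: three ground-truth lesions and three predicted lesions.
gt_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
pred_mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
]
sensitivity, precision = lesion_wise_detection(gt_mask, pred_mask)
print(f"lesion-wise sensitivity: {sensitivity:.3f}, precision: {precision:.3f}")
```

Here the single-voxel lesion on the right edge is missed and one predicted blob has no ground-truth counterpart, so both scores drop by a full lesion even though few voxels are involved. For 3D volumes, a library routine such as `scipy.ndimage.label` would normally replace the hand-written flood fill.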

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates a Volume-Aware Dice loss integrated into nnU-Net ResEnc M for multi-label auto-segmentation of primary tumors (PT) and lymph nodes (LN) on the HNTS-MRG 2024 dataset. It compares a standard Dice baseline to Dual Mask (VA loss on both targets) and Selective LN Mask (VA loss on LN only) variants, reporting that Selective LN Mask yields highest LN volumetric Dice (0.758) and lesion-wise sensitivity (84.93%), while Dual Mask achieves the most balanced result with PT precision 82.04% and LN sensitivity 83.46%.

Significance. If the empirical gains hold, the volume-aware weighting could mitigate under-segmentation of small nodal metastases in adaptive MR-guided radiotherapy without sacrificing PT accuracy in multi-label settings. Strengths include use of a public dataset, standard nnU-Net backbone, and explicit ablation of loss configurations with volumetric, surface, and lesion-wise metrics.

major comments (3)
  1. [Results] Results section: the assertion that Dual Mask provides the most balanced performance rests on raw metric values (PT precision 82.04%, LN sensitivity 83.46%) without reported statistical significance tests or confidence intervals on the differences versus baseline Dice loss; this undermines the claim that the observed shifts are reliable rather than chance variation.
  2. [Methods] Methods section: the volume-weighting hyperparameter in the VA Dice loss is introduced without analysis of its sensitivity or interaction bias in the multi-label case (large PT volumes vs. small, variable LN); the balance claim for Dual Mask versus Selective Mask cannot be evaluated without this ablation or robustness check.
  3. [Discussion] Discussion/Conclusions: generalization of the Dual Mask advantage is asserted for broader MR-guided radiotherapy use, yet all results are confined to the single HNTS-MRG 2024 cohort with no external or multi-center validation; this is load-bearing for the central recommendation of Dual Mask in clinical multi-label tasks.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'significantly improved LN Lesion-Wise Detection Sensitivity' for the Selective setup is used without reference to a statistical test; either add the test result or qualify as descriptive improvement.
  2. [Methods] Methods: the exact mathematical definition of the Volume-Aware Dice loss (including how volume weighting is computed and applied per label) is not provided as an equation; add it for reproducibility.
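
Major comment 1 and minor comment 1 both ask for uncertainty estimates on the differences from baseline. As a minimal sketch of what such an analysis could look like, the block below computes a percentile bootstrap confidence interval for the mean paired difference between two configurations. The per-patient scores are hypothetical placeholders, not data from the paper, and the bootstrap is shown as one option alongside paired tests such as the Wilcoxon signed-rank test.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    baseline: Sequence[float],
    variant: Sequence[float],
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean difference, CI lower bound, CI upper bound).

    Patients are resampled with replacement, keeping each patient's
    baseline and variant scores paired.
    """
    if len(baseline) != len(variant):
        raise ValueError("scores must be paired per patient")
    diffs = [v - b for b, v in zip(baseline, variant)]
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(diffs)
    boot_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = boot_means[int(tail * n_resamples)]
    upper = boot_means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return mean(diffs), lower, upper


# Hypothetical per-patient scores (illustration only).
baseline_scores = [0.70, 0.65, 0.80, 0.75, 0.60, 0.72]
variant_scores = [0.74, 0.66, 0.83, 0.79, 0.65, 0.73]

point, ci_low, ci_high = paired_bootstrap_ci(baseline_scores, variant_scores)
print(f"mean difference: {point:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero would support calling a difference reliable; an interval that straddles zero would argue for describing the change as descriptive only, as the minor comment suggests.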

Simulated Author's Rebuttal

3 responses · 1 unresolved

We are grateful to the referee for the insightful comments that have helped improve the manuscript. We have addressed each major comment as follows, with revisions made to enhance statistical rigor, provide hyperparameter sensitivity analysis, and appropriately qualify the generalizability of our findings.

Point-by-point responses
  1. Referee: [Results] Results section: the assertion that Dual Mask provides the most balanced performance rests on raw metric values (PT precision 82.04%, LN sensitivity 83.46%) without reported statistical significance tests or confidence intervals on the differences versus baseline Dice loss; this undermines the claim that the observed shifts are reliable rather than chance variation.

    Authors: We thank the referee for this observation. We agree that statistical tests and confidence intervals are needed to support the claims. In the revised manuscript, we have added 95% confidence intervals for all metrics and performed Wilcoxon signed-rank tests comparing each configuration to the baseline. The updated Results section and tables now report that the LN lesion-wise sensitivity gain for Selective LN Mask is statistically significant (p=0.03), while Dual Mask shows no significant degradation in PT precision (p>0.05). The text has been updated to reflect these findings. revision: yes

  2. Referee: [Methods] Methods section: the volume-weighting hyperparameter in the VA Dice loss is introduced without analysis of its sensitivity or interaction bias in the multi-label case (large PT volumes vs. small, variable LN); the balance claim for Dual Mask versus Selective Mask cannot be evaluated without this ablation or robustness check.

    Authors: We agree that a sensitivity analysis of the volume-weighting hyperparameter is required to evaluate robustness in the multi-label setting. We have added an ablation study in the revised Methods and Results sections, varying the weighting factor over a range of values and reporting the effects on PT and LN volumetric Dice, sensitivity, and precision. This analysis confirms that the selected hyperparameter achieves a stable trade-off without introducing substantial bias toward either target, thereby supporting the Dual Mask versus Selective Mask comparison. revision: yes

  3. Referee: [Discussion] Discussion/Conclusions: generalization of the Dual Mask advantage is asserted for broader MR-guided radiotherapy use, yet all results are confined to the single HNTS-MRG 2024 cohort with no external or multi-center validation; this is load-bearing for the central recommendation of Dual Mask in clinical multi-label tasks.

    Authors: We acknowledge this limitation. We have revised the Discussion and Conclusions to remove broad generalization statements and to explicitly state that all results are based on the single HNTS-MRG 2024 cohort. The recommendation for the Dual Mask configuration is now framed as a promising approach for this specific multi-label HNC task, accompanied by a clear call for future external validation studies to assess applicability in broader adaptive MR-guided radiotherapy settings. revision: partial

standing simulated objections not resolved
  • We are unable to include external or multi-center validation in this revision, as this would require new data acquisitions and collaborations not feasible within the scope of the current study.

Circularity Check

0 steps flagged

Empirical ablation study with no derivation chain or self-referential reductions

full rationale

The paper performs a standard multi-label segmentation ablation on the HNTS-MRG 2024 dataset using nnU-Net, comparing a baseline Dice loss against two Volume-Aware Dice variants (Dual Mask and Selective LN Mask). All reported results (volumetric Dice, surface metrics, lesion-wise sensitivity/precision) are direct empirical measurements on held-out test data; no equations, uniqueness theorems, or fitted parameters are presented as deriving the central claim. The Dual Mask balance (PT precision 82.04%, LN sensitivity 83.46%) is simply the observed outcome of the ablation, not a prediction forced by construction or by prior self-citation. Standard nnU-Net self-configuration and Dice loss references are external and non-load-bearing for the volume-weighting comparison. No self-definitional, fitted-input, or ansatz-smuggling patterns exist.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard supervised segmentation assumptions and empirical comparison; no new free parameters, axioms, or invented entities are introduced beyond the loss function variants.

axioms (1)
  • domain assumption Standard assumptions of supervised deep learning for medical image segmentation hold, including representative training data and appropriate data augmentation.
    Implicit in the use of nnU-Net on the HNTS-MRG 2024 dataset.

pith-pipeline@v0.9.0 · 5709 in / 1253 out tokens · 71297 ms · 2026-05-10T15:34:00.993630+00:00 · methodology


Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1] Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians 71 (2021) 209–249
  2. [2] Argiris A, Karamouzis MV, Raben D, Ferris RL. Head and neck cancer. The Lancet 371 (2008) 1695–1709
  3. [3] Kosmin M, Ledsam J, Romera-Paredes B, Mendes R, Moinuddin S, De Souza D, et al. Rapid advances in auto-segmentation of organs at risk and target volumes in head and neck cancer. Radiotherapy and Oncology 135 (2019) 130–140
  4. [4] Njeh C. Tumor delineation: The weakest link in the search for accuracy in radiotherapy. Journal of medical physics 33 (2008) 136–140
  5. [5] Kihara S, Koike Y, Takegawa H, Anetai Y, Nakamura S, Tanigawa N, et al. Clinical target volume segmentation based on gross tumor volume using deep learning for head and neck cancer treatment. Medical Dosimetry 48 (2023) 20–24
  6. [6] Jin D, Guo D, Ge J, Ye X, Lu L. Towards automated organs at risk and target volumes contouring: Defining precision radiation therapy in the modern era. Journal of the National Cancer Center 2 (2022) 306–313
  7. [7] Brouwer CL, Steenbakkers RJ, van den Heuvel E, Duppen JC, Navran A, Bijl HP, et al. 3d variation in delineation of head and neck organs at risk. Radiation Oncology 7 (2012) 32
  8. [8] Aristophanous M, Aliotta E, Lichtenwalner P, Abraham S, Nehmeh M, Caringi A, et al. Clinical experience with an offline adaptive radiation therapy head and neck program: dosimetric benefits and opportunities for patient selection. International Journal of Radiation Oncology* Biology* Physics 119 (2024) 1557–1568
  9. [9] Outeiral RR, Bos P, Al-Mamgani A, Jasperse B, Simões R, van der Heide UA. Oropharyngeal primary tumor segmentation for radiotherapy planning on magnetic resonance imaging using deep learning. Physics and imaging in radiation oncology 19 (2021) 39–44
  10. [10] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical image computing and computer-assisted intervention (Springer) (2015), 234–241
  11. [11] Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18 (2021) 203–211
  12. [12] Chartrand G, Emiliani RD, Pawlowski SA, Markel DA, Bahig H, Cengarle-Samak A, et al. Automated detection of brain metastases on t1-weighted mri using a convolutional neural network: Impact of volume aware loss and sampling strategy. Journal of Magnetic Resonance Imaging 56 (2022) 1885–1898
  13. [13] Hu SY, Weng WH, Lu SL, Cheng YH, Xiao F, Hsu FM, et al. Multimodal volume-aware detection and segmentation for brain metastases radiosurgery. Workshop on Artificial Intelligence in Radiation Therapy (Springer) (2019), 61–69
  14. [14] Huang Y, Bert C, Sommer P, Frey B, Gaipl U, Distel LV, et al. Deep learning for brain metastasis detection and segmentation in longitudinal mri data. Medical Physics 49 (2022) 5773–5786
  15. [15] Wahid KA, Dede C, El-Habashy DM, Kamel S, Rooney MK, Khamis Y, et al. Overview of the head and neck tumor segmentation for magnetic resonance guided applications (hnts-mrg) 2024 challenge. Wahid KA, Dede C, Naser MA, Fuller CD, editors, Head and Neck Tumor Segmentation for MR-Guided Applications (Cham: Springer Nature Switzerland) (2025), 1–35
  16. [16] [Dataset] Google DeepMind. Surface distance metrics (2018). Accessed: 2025.