Deep convolution neural network model for automatic risk assessment of patients with non-metastatic nasopharyngeal carcinoma
Pith reviewed 2026-05-24 15:00 UTC · model grok-4.3
The pith
A deep convolutional neural network predicts 3-year disease progression in non-metastatic nasopharyngeal carcinoma patients from MRI scans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors' modified network outperforms a baseline VNet with T1C-only input and a variant without T-stage and overall-stage classification. It reaches an AUC of 0.828 for 3-year disease progression classification on the validation set, with automatic tumour segmentation as its first step.
What carries the argument
A modified VNet architecture that processes T1C and T2 MRI scans together for tumor segmentation and incorporates T-stage and overall stage classification to support the final progression risk prediction.
If this is right
- The model eliminates the need for manual region of interest segmentation by clinicians.
- Combining segmentation and classification in one network improves results compared with using only T1C input or omitting stage information.
- Automatic risk assessment could help in stratifying patients for different follow-up intensities after radiotherapy.
- Deep learning approaches may provide pretreatment prognosis that is more relevant than current staging systems for NPC.
Where Pith is reading between the lines
- Expanding the training data to include more centers could improve performance on external datasets.
- The framework might be tested on predicting other clinical outcomes such as distant metastasis or overall survival.
- Similar dual-modality input strategies could be explored for risk assessment in other types of cancer using MRI.
Load-bearing premise
The assumption that performance on data from the training centers will hold when the model is applied to data from a completely different center.
What would settle it
Evaluating the trained model on a large independent cohort from a new center and obtaining an AUC near or below 0.65 would show that the method does not generalize well enough for reliable use.
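To make the 0.65 threshold concrete, the AUC can be read as the probability that a randomly chosen patient who progressed receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = progressed within 3 years, 0 = did not (hypothetical coding).
    scores: model risk scores, higher meaning higher predicted risk.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy example: 3 progressors and 3 non-progressors.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 7 of 9 pairs ranked correctly
```

The pairwise form is quadratic in cohort size, which is acceptable for a few hundred patients; a rank-based formulation gives the same value more efficiently on larger cohorts.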
Original abstract
Nasopharyngeal Carcinoma (NPC) is endemic cancer in the south-east Asia. With the advent of intensity-modulated radiotherapy excellent locoregional control are being achieved. Consequently, this had led to pretreatment clinical staging classification to be less prognostic of outcomes such as recurrence after treatment. Alternative pretreatment strategies for prognosis of NPC after treatment are needed to provide better risk stratification for NPC. In this study we proposed a deep convolution neural network model based on contrast-enhanced T1 (T1C) and T2 weighted (T2) MRI scan to predict 3-year disease progression of NPC patient after primary treatment. We retrospective obtained 596 non-metastatic NPC patients from four independent centres in Hong Kong and China. Our model first performs a segmentation of the primary NPC tumour to localise the tumour, and then uses the segmentation mask as prior knowledge along with the T1C and T2 scan to classify 3-year disease progression. For segmentation, we adapted and modified a VNet to encode both T1C and T2 scan and also encoding to classify T and overall stage classification. Our modified network performed better than baseline VNet with T1C and network with no T and overall classification. The classification result for 3-year disease progression achieved an AUC of 0.828 in the validation set but did not generalised well for the test set which consist of 146 patients from a different centre to the training data (AUC = 0.69). Our preliminary results show that deep learning may offer prognostication of disease progression of NPC patients after treatment. One advantage of our model is that it does not require manual segmentation of the region of interest, hence reducing clinician's burden. Further development in generalising multicentre data set are needed before clinical application of deep learning models in assessment of NPC.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a modified VNet architecture that performs joint segmentation of primary NPC tumors from multi-modal T1C+T2 MRI and incorporates T-stage/overall-stage encoding to classify 3-year disease progression. Using a retrospective cohort of 596 patients from four centers, the model outperforms baseline VNet variants internally (validation AUC 0.828) but shows degraded performance on a test set of 146 patients from a center not represented in the training data (AUC 0.69). The authors position the work as preliminary evidence that deep learning can provide automatic prognostication without manual ROI delineation.
Significance. A robust multi-center model for NPC risk stratification without manual segmentation would be clinically relevant given the limitations of traditional staging. The multi-center data collection is a strength, but the sharp external-test degradation indicates that the reported internal result does not yet establish practical utility or generalizability. The work therefore has modest significance in its current form.
major comments (2)
- [Abstract / Results] Abstract and Results: The AUC drop from 0.828 (internal validation) to 0.69 (external test set drawn from a different centre) directly undermines the central claim that the model can deliver automatic risk assessment suitable for clinical use. Because the manuscript itself collected multi-centre data, this degradation is load-bearing evidence against robustness; the authors correctly flag the need for further generalisation work, but the current numbers do not support the title's assertion of an 'automatic risk assessment' tool.
- [Methods / Results] Methods / Results: No error bars, confidence intervals, or statistical comparison against the reported baseline VNet variants are provided for either the segmentation or classification tasks. Without these, it is impossible to judge whether the claimed superiority of the modified network (T1C+T2 plus stage encoding) is reliable or merely numerical.
minor comments (2)
- [Abstract] Abstract: Minor grammatical issues ('excellent locoregional control are being achieved', 'Further development in generalising multicentre data set are needed') should be corrected for clarity.
- [Methods] The manuscript would benefit from explicit reporting of the train/validation/test split sizes and any class imbalance handling for the 3-year progression endpoint.
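The reporting requested in the last minor comment can be sketched as follows: hold out one center as the test set, then state the size and event rate of each partition and, if class weighting is used, the weights themselves. The record fields `center` and `progressed`, the center names, and the inverse-frequency weighting rule are assumptions for illustration; the manuscript does not state how, or whether, class imbalance was handled.

```python
from collections import Counter


def split_by_center(records, test_center):
    """Hold out every patient from one center as the external test set."""
    train = [r for r in records if r["center"] != test_center]
    test = [r for r in records if r["center"] == test_center]
    return train, test


def summarize(records):
    """Report partition size and prevalence of the progression endpoint."""
    n = len(records)
    events = sum(r["progressed"] for r in records)
    return {"n": n, "events": events, "event_rate": events / n if n else float("nan")}


def inverse_frequency_weights(records):
    """Common heuristic (assumed here): weight = n / (n_classes * class_count)."""
    counts = Counter(r["progressed"] for r in records)
    n, k = len(records), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Synthetic toy cohort with hypothetical center names A-D.
toy = [{"center": c, "progressed": y} for c, y in [
    ("A", 0), ("A", 0), ("A", 1), ("B", 0), ("B", 0),
    ("C", 0), ("C", 1), ("D", 0), ("D", 1), ("D", 0),
]]
train, test = split_by_center(toy, test_center="D")
train_summary = summarize(train)
class_weights = inverse_frequency_weights(train)
```

Computing the weights on the training partition only keeps the held-out center from influencing any training decision.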
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions we will make.
- Referee: [Abstract / Results] Abstract and Results: The AUC drop from 0.828 (internal validation) to 0.69 (external test set drawn from a different centre) directly undermines the central claim that the model can deliver automatic risk assessment suitable for clinical use. Because the manuscript itself collected multi-centre data, this degradation is load-bearing evidence against robustness; the authors correctly flag the need for further generalisation work, but the current numbers do not support the title's assertion of an 'automatic risk assessment' tool.
  Authors: We agree that the AUC drop on the external test set from a different center demonstrates limited generalizability and that the current results do not support positioning the model as suitable for clinical use. This is already noted in our discussion, but we accept that the abstract and title overstate the immediate applicability. We will revise the abstract to emphasize the preliminary nature of the findings and change the title to reflect an exploratory study rather than a clinical tool.
  revision: yes
- Referee: [Methods / Results] Methods / Results: No error bars, confidence intervals, or statistical comparison against the reported baseline VNet variants are provided for either the segmentation or classification tasks. Without these, it is impossible to judge whether the claimed superiority of the modified network (T1C+T2 plus stage encoding) is reliable or merely numerical.
  Authors: We acknowledge this omission. In the revised manuscript we will report 95% confidence intervals for all AUC and Dice scores. We will also add statistical comparisons (DeLong test for AUCs and appropriate tests for segmentation metrics) between the proposed model and the baseline VNet variants.
  revision: yes
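One way the promised 95% confidence intervals could be produced is a percentile bootstrap over patients, sketched below together with the Dice overlap used for segmentation. This is a stand-in technique and not the DeLong test named in the response, which compares correlated AUCs analytically. All function names, the voxel-set representation of masks, and the toy data are assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """AUC by pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dice(pred_voxels, true_voxels):
    """Dice overlap between two masks given as collections of voxel indices."""
    pred, true = set(pred_voxels), set(true_voxels)
    if not pred and not true:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(pred & true) / (len(pred) + len(true))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample contains one class only
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    if not stats:
        raise ValueError("no valid bootstrap resample")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Synthetic toy data: perfectly separated scores, so every resample has AUC 1.
sep_labels = [0] * 10 + [1] * 10
sep_scores = [i / 20 for i in range(20)]
sep_ci = bootstrap_auc_ci(sep_labels, sep_scores, n_boot=200)
```

The same resampling loop applies to per-patient Dice scores by replacing the AUC with the mean Dice of each resample. Fixing `seed` keeps a reported interval reproducible.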
Circularity Check
No circularity; standard empirical ML pipeline with explicit data splits
The paper describes a supervised CNN (modified VNet) trained on retrospective multi-center MRI data to predict 3-year progression. It reports segmentation and classification performance on internal validation (AUC 0.828) and external test (AUC 0.69) sets with clear train/validation/test partitioning. No equations, parameters, or claims reduce to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The architecture is adapted from prior VNet work (external citation), and the performance drop on the held-out center is explicitly reported rather than hidden. The derivation chain is therefore the standard data-driven training/evaluation loop and contains no circular reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- VNet architecture hyperparameters
axioms (1)
- domain assumption: Contrast-enhanced T1 and T2 MRI scans contain prognostic information for NPC progression