Deep convolution neural network model for automatic risk assessment of patients with non-metastatic nasopharyngeal carcinoma

Ann D. King; Lujun Han; Peng Cao; Qiyong Ai; Richard Du; Varut Vardhanabhuti

arxiv: 1907.11861 · v1 · pith:GM2PRDJSnew · submitted 2019-07-27 · 📡 eess.IV

Richard Du, Peng Cao, Lujun Han, Qiyong Ai, Ann D. King, Varut Vardhanabhuti

Pith reviewed 2026-05-24 15:00 UTC · model grok-4.3

keywords: nasopharyngeal carcinoma, deep learning, MRI, disease progression, automatic segmentation, risk stratification

The pith

A deep convolutional neural network predicts 3-year disease progression in non-metastatic nasopharyngeal carcinoma patients from MRI scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a modified deep convolutional neural network based on VNet to automatically segment the primary tumor in contrast-enhanced T1 and T2 weighted MRI scans and then classify the risk of 3-year disease progression after treatment. The model encodes both MRI types and uses T and overall stage information to improve performance over baseline versions. It achieves an AUC of 0.828 on the validation set from the training centers but drops to 0.69 on an external test set from a different center. This work aims to offer an automatic method for better risk stratification beyond traditional clinical staging, without requiring manual tumor outlining by clinicians. The results suggest deep learning has potential for prognostication in NPC but requires further work on generalization across centers.

Core claim

The authors' modified network outperforms baseline VNet with only T1C input and networks without T and overall stage classification, reaching an AUC of 0.828 for 3-year disease progression classification on the validation set while performing automatic segmentation as a first step.

What carries the argument

A modified VNet architecture that processes T1C and T2 MRI scans together for tumor segmentation and incorporates T-stage and overall stage classification to support the final progression risk prediction.

If this is right

The model eliminates the need for manual region of interest segmentation by clinicians.
Combining segmentation and classification in one network improves results compared to using only T1C or omitting stage info.
Automatic risk assessment could help in stratifying patients for different follow-up intensities after radiotherapy.
Deep learning approaches may provide pretreatment prognosis that is more relevant than current staging systems for NPC.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Expanding the training data to include more centers could improve performance on external datasets.
The framework might be tested on predicting other clinical outcomes such as distant metastasis or overall survival.
Similar dual-modality input strategies could be explored for risk assessment in other types of cancer using MRI.

Load-bearing premise

The assumption that performance on data from the training centers will hold when the model is applied to data from a completely different center.

What would settle it

Evaluating the trained model on a large independent cohort from a new center and obtaining an AUC near or below 0.65 would show that the method does not generalize well enough for reliable use.
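
The criterion above is stated in terms of AUC. As a minimal sketch of how that number is computed, assuming binary labels (1 = progression within 3 years) and one continuous risk score per patient, the function below uses the pairwise-ranking definition of AUC. The function name, the label coding, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: sequence of 0/1 (assumed coding: 1 = progressed within 3 years)
    scores: sequence of model risk scores, higher meaning higher risk
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 10 of 12 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking, which is why a value near 0.65 on a new center would leave little margin for reliable use.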

Original abstract

Nasopharyngeal Carcinoma (NPC) is endemic cancer in the south-east Asia. With the advent of intensity-modulated radiotherapy excellent locoregional control are being achieved. Consequently, this had led to pretreatment clinical staging classification to be less prognostic of outcomes such as recurrence after treatment. Alternative pretreatment strategies for prognosis of NPC after treatment are needed to provide better risk stratification for NPC. In this study we proposed a deep convolution neural network model based on contrast-enhanced T1 (T1C) and T2 weighted (T2) MRI scan to predict 3-year disease progression of NPC patient after primary treatment. We retrospective obtained 596 non-metastatic NPC patients from four independent centres in Hong Kong and China. Our model first performs a segmentation of the primary NPC tumour to localise the tumour, and then uses the segmentation mask as prior knowledge along with the T1C and T2 scan to classify 3-year disease progression. For segmentation, we adapted and modified a VNet to encode both T1C and T2 scan and also encoding to classify T and overall stage classification. Our modified network performed better than baseline VNet with T1C and network with no T and overall classification. The classification result for 3-year disease progression achieved an AUC of 0.828 in the validation set but did not generalised well for the test set which consist of 146 patients from a different centre to the training data (AUC = 0.69). Our preliminary results show that deep learning may offer prognostication of disease progression of NPC patients after treatment. One advantage of our model is that it does not require manual segmentation of the region of interest, hence reducing clinician's burden. Further development in generalising multicentre data set are needed before clinical application of deep learning models in assessment of NPC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a modified VNet architecture that performs joint segmentation of primary NPC tumors from multi-modal T1C+T2 MRI and incorporates T-stage/overall-stage encoding to classify 3-year disease progression. Using a retrospective cohort of 596 patients from four centers, the model outperforms baseline VNet variants internally (validation AUC 0.828) but shows degraded performance on an external test set of 146 patients from a different center (AUC 0.69). The authors position the work as preliminary evidence that deep learning can provide automatic prognostication without manual ROI delineation.

Significance. A robust multi-center model for NPC risk stratification without manual segmentation would be clinically relevant given the limitations of traditional staging. The multi-center data collection is a strength, but the sharp external-test degradation indicates that the reported internal result does not yet establish practical utility or generalizability. The work therefore has modest significance in its current form.

major comments (2)

[Abstract / Results] The AUC drop from 0.828 (internal validation) to 0.69 (external test set drawn from a different centre) directly undermines the central claim that the model can deliver automatic risk assessment suitable for clinical use. Because the manuscript itself collected multi-centre data, this degradation is load-bearing evidence against robustness; the authors correctly flag the need for further generalisation work, but the current numbers do not support the title's assertion of an 'automatic risk assessment' tool.
[Methods / Results] No error bars, confidence intervals, or statistical comparison against the reported baseline VNet variants are provided for either the segmentation or classification tasks. Without these, it is impossible to judge whether the claimed superiority of the modified network (T1C+T2 plus stage encoding) is reliable or merely numerical.

minor comments (2)

[Abstract] Minor grammatical issues ('excellent locoregional control are being achieved', 'Further development in generalising multicentre data set are needed') should be corrected for clarity.
[Methods] The manuscript would benefit from explicit reporting of the train/validation/test split sizes and any class imbalance handling for the 3-year progression endpoint.
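
To make the request about class imbalance concrete, here is a small sketch of what such reporting could look like: count the events, compute prevalence, and derive inverse-frequency class weights (n_samples / (n_classes * class_count), the same heuristic scikit-learn uses for its "balanced" class weights). The paper does not report its event rate or how it handled imbalance, so the label counts below are hypothetical and the function name is an assumption.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, prevalence, and inverse-frequency weights."""
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    prevalence = {c: counts[c] / n for c in counts}
    # Rarer classes receive proportionally larger weights.
    weights = {c: n / (k * counts[c]) for c in counts}
    return counts, prevalence, weights


# Hypothetical cohort: 2 progressions among 10 patients (not from the paper).
demo_labels = [1] * 2 + [0] * 8
demo_counts, demo_prev, demo_weights = class_balance(demo_labels)
```

Reporting the prevalence alongside the AUC also helps readers judge how many progression events the external test set actually contained.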

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions we will make.

Referee: [Abstract / Results] The AUC drop from 0.828 (internal validation) to 0.69 (external test set drawn from a different centre) directly undermines the central claim that the model can deliver automatic risk assessment suitable for clinical use. Because the manuscript itself collected multi-centre data, this degradation is load-bearing evidence against robustness; the authors correctly flag the need for further generalisation work, but the current numbers do not support the title's assertion of an 'automatic risk assessment' tool.

Authors: We agree that the AUC drop on the external test set from a different center demonstrates limited generalizability and that the current results do not support positioning the model as suitable for clinical use. This is already noted in our discussion, but we accept that the abstract and title overstate the immediate applicability. We will revise the abstract to emphasize the preliminary nature of the findings and change the title to reflect an exploratory study rather than a clinical tool. revision: yes

Referee: [Methods / Results] No error bars, confidence intervals, or statistical comparison against the reported baseline VNet variants are provided for either the segmentation or classification tasks. Without these, it is impossible to judge whether the claimed superiority of the modified network (T1C+T2 plus stage encoding) is reliable or merely numerical.

Authors: We acknowledge this omission. In the revised manuscript we will report 95% confidence intervals for all AUC and Dice scores. We will also add statistical comparisons (DeLong test for AUCs and appropriate tests for segmentation metrics) between the proposed model and the baseline VNet variants. revision: yes
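
One common way to attach a 95% confidence interval to an AUC is a percentile bootstrap over patients. The sketch below illustrates that idea only: it is not the authors' procedure, it does not implement the DeLong test mentioned in the response, and the function names, resample count, seed, and toy data are all assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """Pairwise-ranking AUC; ties between classes count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients with replacement."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap an AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if sum(ys) == 0 or sum(ys) == n:
            continue  # single-class resample: AUC undefined, so draw again
        stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data invented for illustration only (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
```

With a cohort as small as the toy example the interval is very wide, which is the practical point of the referee's request: without an interval, a difference between two AUCs cannot be distinguished from sampling noise.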

Circularity Check

0 steps flagged

No circularity; standard empirical ML pipeline with explicit data splits

The paper describes a supervised CNN (modified VNet) trained on retrospective multi-center MRI data to predict 3-year progression. It reports segmentation and classification performance on internal validation (AUC 0.828) and external test (AUC 0.69) sets with clear train/validation/test partitioning. No equations, parameters, or claims reduce to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The architecture is adapted from prior VNet work (external citation), and the performance drop on the held-out center is explicitly reported rather than hidden. The derivation chain is therefore the standard data-driven training/evaluation loop and contains no circular reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The paper relies on standard assumptions in medical imaging AI and the availability of retrospective multi-center data; no new entities invented.

free parameters (1)

VNet architecture hyperparameters
The network modifications and training parameters are chosen to fit the data but not explicitly listed.

axioms (1)

domain assumption: Contrast-enhanced T1 and T2 MRI scans contain prognostic information for NPC progression
Invoked when using these scans as input for classification.

pith-pipeline@v0.9.0 · 5885 in / 1273 out tokens · 52188 ms · 2026-05-24T15:00:08.750468+00:00 · methodology
