JDCNet: Confidence-Gated Privileged-Modality Distillation for Cost-Preserving X-ray Inference
Pith reviewed 2026-05-21 10:55 UTC · model grok-4.3
The pith
JDCNet shows that gating CT distillation by teacher confidence improves X-ray model performance at no extra inference cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the 510-patient BIMCV cohort with patient-level 5-fold cross-validation, confidence-gated 3-slice soft-KL supervision from CT improves balanced accuracy by 0.035, and confidence-gated mid-slice hard supervision improves it by 0.033, over the supervised ResNet-18 baseline. Ungated logit distillation and several other transfer techniques do not clear the same fixed transfer gate.
What carries the argument
A confidence threshold that filters which training samples receive auxiliary targets derived from the CT teacher model.
Load-bearing premise
The gains from confidence-gated distillation will replicate on other independent paired CT-X-ray datasets beyond the 510-patient BIMCV cohort.
What would settle it
Failure to observe similar balanced accuracy improvements in a new external paired cohort with the same cross-validation protocol would indicate the method does not transfer.
Original abstract
We study a systems-level visual inference problem: using an expensive privileged modality during training while preserving a fixed-cost, single-modality deployment path. We present JDCNet, a confidence-gated CT-to-X-ray distillation framework in which the CT teacher supplies an auxiliary hard or temperature-scaled target only on training samples whose teacher confidence exceeds a threshold; at deployment the student takes X-ray input alone and matches the parameter, MAC, and latency profile of the supervised X-ray baseline. On a 510-patient same-patient paired BIMCV cohort with patient-level 5-fold cross-validation, two JDCNet configurations clear a fixed transfer gate against the supervised ResNet-18 baseline: 3-slice soft-KL supervision yields $\Delta\mathrm{BA}{=}{+}0.035$ ($95\%$ CI $[{+}0.011,{+}0.057]$) and mid-slice hard supervision yields $+0.033$ ($[{+}0.007,{+}0.058]$). Under the same splits and gate, logit distillation, gated logit distillation, contrastive alignment, attention transfer, feature hints, BiomedCLIP fine-tuning, and a module-augmented variant do not pass. Confidence-gated auxiliary targets are therefore a more transferable channel than uniformly softened CT logits; the evidence is bounded to one paired cohort, so external paired-cohort replication is required before any deployment claim.
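The gating described in the abstract can be sketched per sample as follows. This is a minimal illustration, not the authors' implementation: the function names, the threshold `tau`, the temperature, the auxiliary weight, and the use of the teacher's top-class probability as "confidence" are all assumptions, and the T² factor is the conventional scaling from the knowledge-distillation literature rather than something the abstract states.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, student_logits):
    """Cross-entropy between a target distribution and softmax(student_logits)."""
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gated_distillation_loss(student_logits, teacher_logits, label,
                            tau=0.9, temperature=2.0, weight=0.5, mode="soft"):
    """Supervised loss plus a teacher term that is active only when the
    teacher's top-class probability exceeds tau (all defaults hypothetical).
    Returns (loss, gate_open)."""
    num_classes = len(student_logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    loss = cross_entropy(one_hot, student_logits)

    teacher_probs = softmax(teacher_logits)
    confidence = max(teacher_probs)
    if confidence <= tau:
        return loss, False  # gate closed: plain supervised loss only

    if mode == "soft":
        # Temperature-scaled KL between teacher and student distributions.
        t_soft = softmax(teacher_logits, temperature)
        s_soft = softmax(student_logits, temperature)
        aux = temperature ** 2 * kl_divergence(t_soft, s_soft)
    else:
        # Hard target: the teacher's argmax class used as a pseudo-label.
        pseudo = teacher_probs.index(confidence)
        hard = [1.0 if k == pseudo else 0.0 for k in range(num_classes)]
        aux = cross_entropy(hard, student_logits)
    return loss + weight * aux, True
```

Because the teacher term only enters the training loss, the student's deployment path is untouched, which is consistent with the abstract's claim that parameters, MACs, and latency match the supervised baseline.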
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents JDCNet, a confidence-gated distillation framework that uses CT as a privileged teacher modality during training to improve X-ray-based inference at deployment. The student network receives auxiliary hard or temperature-scaled targets from the CT teacher only on samples where teacher confidence exceeds a fixed threshold; at test time the model operates on X-ray input alone and matches the parameter count, MACs, and latency of a standard supervised ResNet-18 baseline. On a 510-patient same-patient paired BIMCV cohort evaluated with patient-level 5-fold cross-validation, two JDCNet variants (3-slice soft-KL supervision and mid-slice hard supervision) produce balanced-accuracy gains of +0.035 (95% CI [+0.011, +0.057]) and +0.033 ([+0.007, +0.058]) over the supervised baseline, while logit distillation, contrastive alignment, attention transfer, and several other baselines do not exceed the same fixed transfer gate. The authors explicitly bound the result to this single cohort and call for external paired-cohort replication.
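For readers unfamiliar with patient-level cross-validation, the sketch below shows the property that matters: every sample from a given patient lands in exactly one fold. The input format, the function name, and the round-robin assignment are assumptions made for illustration; the manuscript's actual splitting code (and whether it stratifies by class) is not described in this review.

```python
import random

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Assign every sample to a fold by patient, so no patient spans two folds.

    patient_ids: one patient identifier per sample (hypothetical input format).
    Returns a list of n_folds lists of sample indices.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Round-robin over shuffled patients; no class stratification is attempted.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, pid in enumerate(patient_ids):
        folds[fold_of[pid]].append(idx)
    return folds
```

With paired CT and X-ray data from the same patient, splitting by sample instead of by patient would let the same anatomy appear on both sides of a split, which is the leakage this protocol is meant to prevent.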
Significance. If the reported gains are reproducible, the work supplies a concrete, deployment-cost-preserving route for exploiting richer but expensive modalities (CT) during training of cheaper single-modality (X-ray) models. The key technical contribution is the demonstration that confidence gating yields more transferable auxiliary targets than uniform softening or feature-level alignment methods under identical splits and gate. The use of patient-level 5-fold CV together with 95% confidence intervals that exclude zero provides a transparent empirical foundation; the explicit caveat that external replication is required is appropriately cautious.
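The review does not say how the 95% intervals on the balanced-accuracy difference were computed. As one plausible reading, the sketch below uses a paired percentile bootstrap over patients; the function names, the resample count, and the assumption of one prediction per patient are all illustrative choices, not the authors' procedure.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

def paired_bootstrap_delta_ba(y_true, pred_base, pred_new,
                              n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for BA(pred_new) - BA(pred_base).

    Each resample draws the same patients for both models, so the
    comparison stays paired. One prediction per patient is assumed.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(y_true)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        d = (balanced_accuracy(yt, [pred_new[i] for i in idx])
             - balanced_accuracy(yt, [pred_base[i] for i in idx]))
        deltas.append(d)
    deltas.sort()
    lower = deltas[int((alpha / 2) * n_boot)]
    upper = deltas[int((1 - alpha / 2) * n_boot) - 1]
    point = (balanced_accuracy(y_true, pred_new)
             - balanced_accuracy(y_true, pred_base))
    return point, lower, upper
```

An interval whose lower bound is above zero, as reported for both passing configurations, is what "excludes zero" means here.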
major comments (1)
- [§4.2 and Table 2] The fixed transfer gate (confidence threshold) is applied uniformly across all methods, yet the manuscript does not report whether the threshold value itself was chosen on a held-out validation fold or on the full training set; if the latter, the reported deltas for the two passing configurations may be optimistically biased relative to the non-passing baselines.
minor comments (3)
- [Abstract and §3.1] The precise numerical value of the confidence threshold used for the fixed transfer gate is not stated; providing it would allow exact reproduction of the gating condition.
- [§4.1] Hyperparameter details for temperature scaling, learning-rate schedules, and the exact ResNet-18 backbone variant are referenced only by citation; a short table or appendix listing the final values used would improve reproducibility.
- [Figure 3] The caption does not indicate whether the displayed confidence histograms are computed on training or validation folds; clarifying this would help readers interpret the gating behavior.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for greater transparency regarding threshold selection. We address the comment below.
- Referee: [§4.2 and Table 2] The fixed transfer gate (confidence threshold) is applied uniformly across all methods, yet the manuscript does not report whether the threshold value itself was chosen on a held-out validation fold or on the full training set; if the latter, the reported deltas for the two passing configurations may be optimistically biased relative to the non-passing baselines.
Authors: We agree that the manuscript should explicitly document how the fixed confidence threshold was determined. In the original experiments the threshold was selected via a small grid search performed on a held-out validation portion of the training data within each patient-level fold (approximately 10% of the training patients per fold), with the final value then frozen and applied uniformly to all methods and to the test fold. This procedure avoids test-set leakage because it uses only training data. To make the process fully transparent we will add a paragraph in §4.2 describing the validation-based selection and will revise the caption of Table 2 to state that “the threshold was tuned on an internal validation split of the training folds and then held fixed across all compared methods.” These changes remove any ambiguity about optimistic bias.
Revision: yes
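The selection procedure described in the rebuttal can be sketched as below. This is an illustration under stated assumptions: `fit_and_score` is a hypothetical callable standing in for training a student with a given gate and scoring it on the inner validation patients, and the grid values in the usage note are invented for the example, since neither the grid nor the chosen threshold is reported.

```python
import random

def split_patients(train_patients, val_fraction=0.10, seed=0):
    """Hold out a fraction of training patients for threshold selection."""
    patients = sorted(train_patients)
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(val_fraction * len(patients)))
    return patients[n_val:], patients[:n_val]  # (inner_train, inner_val)

def select_threshold(train_patients, grid, fit_and_score, seed=0):
    """Pick the gate threshold using only training-fold patients.

    fit_and_score(inner_train, inner_val, tau) is a hypothetical callable
    that trains a student with gate tau and returns validation balanced
    accuracy; test-fold patients are never passed in.
    Returns (best_tau, scores_by_tau).
    """
    inner_train, inner_val = split_patients(train_patients, seed=seed)
    scores = {tau: fit_and_score(inner_train, inner_val, tau) for tau in grid}
    best = max(grid, key=lambda tau: scores[tau])
    return best, scores
```

A call such as `select_threshold(fold_train_patients, [0.6, 0.7, 0.8, 0.9], fit_and_score)` would be run once per outer fold, with the returned value then frozen before the test fold is scored.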
Circularity Check
No circularity: purely empirical comparison on held-out folds
The manuscript reports balanced-accuracy deltas from patient-level 5-fold cross-validation on a single 510-patient paired BIMCV cohort, with explicit bounds on generalizability and a call for external replication. No equations, first-principles derivations, or predictions are presented that reduce to fitted parameters or self-citations by construction. All reported gains are direct statistical comparisons against multiple baselines under identical splits and a fixed transfer gate; the protocol is externally falsifiable and does not rely on any load-bearing self-citation or ansatz smuggling. This is the most common honest non-finding for an empirical systems paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- confidence threshold
- temperature scaling factor