
arxiv: 1906.09587 · v1 · pith:NQQGFS6Inew · submitted 2019-06-23 · 💻 cs.CV · cs.AI

Semi-Supervised Learning for Cancer Detection of Lymph Node Metastases

Pith reviewed 2026-05-25 17:45 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords semi-supervised learning · pseudo labels · PatchCamelyon · lymph node metastases · convolutional neural network · AUC metric · histopathology · cancer detection

The pith

A CNN model trained with pseudo labels on the PCam dataset achieves higher AUC than a strong supervised baseline for detecting lymph node metastases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep convolutional neural network to identify metastasized cancer cells in lymph node pathology scans from the PatchCamelyon benchmark. It applies semi-supervised training that generates pseudo labels for the unlabeled images and includes them during model fitting. This produces better results on the AUC metric than a strong CNN trained only on labeled data. The approach addresses the practical difficulty pathologists face when reviewing large numbers of scans for signs of metastasis.

Core claim

The paper claims that a deep convolutional neural network trained with a semi-supervised approach, using pseudo labels "on PCam-level", achieves a significantly higher AUC than a strong CNN baseline.

What carries the argument

The semi-supervised training procedure that generates pseudo labels from the unlabeled portion of the PCam dataset and adds them to the supervised training set.
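The procedure can be illustrated with a minimal sketch. The abstract does not give the initial model, confidence threshold, or iteration schedule, so everything here is an assumption: a one-dimensional nearest-centroid classifier stands in for the CNN, and the names `fit_centroids`, `predict_proba`, `pseudo_label_rounds`, the threshold of 0.8, and the toy data are all hypothetical.

```python
def fit_centroids(xs, ys):
    """Mean feature value per class; a toy stand-in for training the CNN."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    return sum(zeros) / len(zeros), sum(ones) / len(ones)


def predict_proba(x, centroids):
    """Score for class 1: near 1 by the positive centroid, near 0 by the negative one."""
    d0 = abs(x - centroids[0])
    d1 = abs(x - centroids[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)


def pseudo_label_rounds(xs, ys, unlabeled, threshold=0.8, rounds=3):
    """Refit each round, then move confidently scored unlabeled points into the training set."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)
        remaining = []
        for x in pool:
            p = predict_proba(x, centroids)
            if p >= threshold:          # confident positive: assumed pseudo label 1
                xs.append(x)
                ys.append(1)
            elif p <= 1 - threshold:    # confident negative: assumed pseudo label 0
                xs.append(x)
                ys.append(0)
            else:                       # too uncertain, keep it unlabeled
                remaining.append(x)
        if len(remaining) == len(pool):
            break                       # nothing new was labeled, so stop early
        pool = remaining
    return xs, ys, pool


# Hypothetical one-dimensional "features" in place of image patches.
labeled_x = [0.0, 1.0, 9.0, 10.0]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [0.8, 2.0, 5.0, 8.5, 9.7]
train_x, train_y, still_unlabeled = pseudo_label_rounds(labeled_x, labeled_y, unlabeled_x)
```

In this toy run the ambiguous point 5.0 never clears the threshold and stays unlabeled, while the other four points join the training set. Whether such added labels help or hurt depends on how often the confident predictions are correct.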

If this is right

  • The semi-supervised model records a higher AUC score than the baseline CNN on the PCam benchmark.
  • Pseudo labels can be used to leverage additional unlabeled histopathology images during training.
  • The method reduces reliance on fully labeled data while maintaining or improving detection performance.
  • The trained model can be applied directly to new PCam-style scans for metastasis classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pseudo-labeling step could be tested on other limited-label medical imaging tasks to measure transfer.
  • Performance gains may depend on how well the initial supervised model performs before generating the pseudo labels.
  • Combining the approach with other consistency-based semi-supervised methods might produce additional improvements on the same dataset.

Load-bearing premise

The pseudo labels generated for the unlabeled PCam images are accurate enough that adding them improves the model's generalization rather than introducing label noise.

What would settle it

A controlled experiment that trains both models on the same data split and reports an AUC for the pseudo-label version that is equal to or lower than the supervised baseline would falsify the claim.
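As a sketch of the comparison this test relies on, AUC can be computed as the fraction of positive/negative pairs in which the positive patch receives the higher score, with ties counted as one half. The function name `roc_auc` and the six scores per model are hypothetical illustrations, not values from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels are 0/1; ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two models on the same six held-out patches.
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
pseudo_label_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]

baseline_auc = roc_auc(labels, baseline_scores)    # 8 of 9 pairs ranked correctly
pseudo_auc = roc_auc(labels, pseudo_label_scores)  # all 9 pairs ranked correctly
claim_survives = pseudo_auc > baseline_auc
```

The pairwise loop is quadratic in the number of patches, which is fine for illustration; a rank-based implementation would be preferable at PCam scale.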

Figures

Figures reproduced from arXiv: 1906.09587 by Amit Kumar Jaiswal, Dimitrij Shulkin, Ivan Panshin, Nagender Aneja, Samuel Abramov.

Figure 1
Figure 1: DenseNet201 Block Architecture. Surrounding text (Section 2.3, One Cycle Policy): In this work, we use one cycle policy approach. It was first introduced for SGD [26]. One cycle policy is a slight modification of cyclical learning rate policy (CLR) where a minimum and maximum learning rate limits with a step size was specified [24]. This policy allows the loss to plateau before the training ends. It combines the advantages of curricu…
Figure 2
Figure 2: One Cyclic Policy - Learning Rate. Surrounding text: Momentum and learning rate are closely related. The optimal learning rate depends on the momentum and the momentum depends on the learning rate [25]. Also, they found in their experiments that cyclical momentum led to better results. In practice, they recommend choosing two values such as 0.85 and 0.95 and reducing them from the higher to the lower value when the learning …
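The text accompanying Figure 2 mentions momentum values of 0.85 and 0.95 moving from the higher to the lower value. A minimal sketch of such a schedule follows, assuming linear ramps, a symmetric cycle, and momentum moving opposite to the learning rate; the function name `one_cycle` and the learning-rate bounds are hypothetical, and any final low-learning-rate phase is omitted.

```python
def one_cycle(step, total_steps, lr_min=1e-4, lr_max=1e-3,
              mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for one step of a symmetric cycle.

    First half: learning rate rises linearly from lr_min to lr_max while
    momentum falls from mom_max to mom_min. Second half: both reverse.
    The learning-rate bounds are illustrative assumptions.
    """
    if total_steps < 2:
        raise ValueError("need at least two steps")
    half = (total_steps - 1) / 2
    # Fraction of the way to the peak: 0 at both ends, 1 at the midpoint.
    frac = 1.0 - abs(step - half) / half
    lr = lr_min + frac * (lr_max - lr_min)
    mom = mom_max - frac * (mom_max - mom_min)
    return lr, mom


# Five-step toy schedule: start, quarter, peak, three-quarter, end.
schedule = [one_cycle(s, 5) for s in range(5)]
```

Here the learning rate peaks at the middle step, where momentum reaches its minimum, and both return to their starting values at the last step.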
Figure 3
Figure 3: Images as Outliers in the Train Set. Surrounding text: Finally, we resize the images from 96 x 96 to 224 x 224 pixel as the pre-trained models were originally trained on this size. After each semi-supervised learning run, more and more pseudo labels could be predicted, thus the training corpus could be increased where we perform random split to train and validation set. Moreover, we apply a set of 10 online data augmentatio…
Figure 4
Figure 4: Area under the ROC Curve
Original abstract

Pathologists find tedious to examine the status of the sentinel lymph node on a large number of pathological scans. The examination process of such lymph node which encompasses metastasized cancer cells is histopathologically organized. However, the task of finding metastatic tissues is gradual which is often challenging. In this work, we present our deep convolutional neural network based model validated on PatchCamelyon (PCam) benchmark dataset for fundamental machine learning research in histopathology diagnosis. We find that our proposed model trained with a semi-supervised learning approach by using pseudo labels on PCam-level significantly leads to better performances to strong CNN baseline on the AUC metric.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a deep CNN model for detecting lymph node metastases on the PatchCamelyon (PCam) benchmark. It claims that training this model with a semi-supervised approach that generates and uses pseudo labels on the PCam data yields significantly higher AUC than a strong supervised CNN baseline.

Significance. If the central claim were substantiated with the missing experimental details, the result would indicate that pseudo-label-based semi-supervised learning can improve generalization in histopathology classification tasks where labeled data are expensive to obtain. This would be of practical interest for medical imaging applications that face similar annotation bottlenecks.

major comments (3)
  1. [Abstract] Abstract: The central claim of a significant AUC improvement is asserted without any reported numerical AUC values for the baseline or proposed model, without the exact pseudo-label generation procedure (initial model, thresholds, iteration schedule), without train/validation splits, and without statistical tests. These omissions make the claim impossible to evaluate from the manuscript.
  2. [Abstract] Abstract: No evidence is supplied that the reported gain is attributable to the semi-supervised procedure rather than additional training epochs, hyper-parameter tuning, or other uncontrolled factors on the same data. This circularity risk is load-bearing because the improvement is presented as resulting from the use of pseudo labels.
  3. [Abstract] Abstract: The manuscript supplies no description of how pseudo-label accuracy was verified on the unlabeled PCam patches or any ablation isolating the contribution of the pseudo labels versus the base CNN architecture. Without this, it is impossible to determine whether the pseudo labels reduce generalization error or inject harmful noise.
minor comments (2)
  1. [Abstract] Grammatical and phrasing issues: 'Pathologists find tedious to examine' should read 'Pathologists find it tedious to examine'; 'the task of finding metastatic tissues is gradual which is often challenging' is unclear and should be rephrased for precision.
  2. [Abstract] The phrase 'on PCam-level' is used without definition or explanation of what it denotes in the context of the dataset or method.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below and will revise the abstract and related sections to improve clarity and substantiation of our claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of a significant AUC improvement is asserted without any reported numerical AUC values for the baseline or proposed model, without the exact pseudo-label generation procedure (initial model, thresholds, iteration schedule), without train/validation splits, and without statistical tests. These omissions make the claim impossible to evaluate from the manuscript.

    Authors: We agree that the abstract should report these details for self-containment. In the revision we will add the specific AUC values for the supervised baseline and the semi-supervised model, a concise description of the pseudo-label procedure (including initial model, threshold, and iterations), the train/validation splits employed, and the results of statistical tests comparing the two approaches. revision: yes

  2. Referee: [Abstract] Abstract: No evidence is supplied that the reported gain is attributable to the semi-supervised procedure rather than additional training epochs, hyper-parameter tuning, or other uncontrolled factors on the same data. This circularity risk is load-bearing because the improvement is presented as resulting from the use of pseudo labels.

    Authors: We acknowledge this concern. To isolate the contribution of pseudo-labeling, the revised manuscript will include a controlled comparison in which the baseline CNN is trained for an identical number of epochs and with the same hyper-parameters but without pseudo labels. This will provide direct evidence that the observed AUC gain stems from the semi-supervised procedure rather than extraneous training factors. revision: yes

  3. Referee: [Abstract] Abstract: The manuscript supplies no description of how pseudo-label accuracy was verified on the unlabeled PCam patches or any ablation isolating the contribution of the pseudo labels versus the base CNN architecture. Without this, it is impossible to determine whether the pseudo labels reduce generalization error or inject harmful noise.

    Authors: We agree that an explicit verification step and ablation are necessary. In the revision we will describe how pseudo-label quality was assessed (e.g., via a held-out labeled subset) and add an ablation experiment that trains the identical CNN architecture with and without the pseudo-label component, thereby isolating the effect of the pseudo labels on generalization. revision: yes
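The statistical comparison promised in response 1 is not specified. One possible form is a paired bootstrap over the held-out patches, sketched below under that assumption; the function names, the 95% percentile interval, and the six example scores are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Pairwise AUC; ties between a positive and a negative count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_gain(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """95% percentile interval for AUC(b) - AUC(a) under paired resampling."""
    rng = random.Random(seed)
    n = len(labels)
    gains = []
    while len(gains) < n_boot:
        # Resample patch indices with replacement; both models share the draw.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip draws that lack one of the two classes
            auc_a = roc_auc(ys, [scores_a[i] for i in idx])
            auc_b = roc_auc(ys, [scores_b[i] for i in idx])
            gains.append(auc_b - auc_a)
    gains.sort()
    low = gains[(25 * n_boot) // 1000]
    high = gains[(975 * n_boot) // 1000 - 1]
    return low, high


# Hypothetical scores from a baseline (a) and a pseudo-label model (b).
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]
scores_b = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]
low, high = bootstrap_auc_gain(labels, scores_a, scores_b)
```

An interval that excludes zero would support a real gain on that test set. It would not by itself rule out the confounds raised in comment 2, which need the matched-epoch control described in response 2.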

Circularity Check

0 steps flagged

No significant circularity; empirical claim with no self-referential derivation.

full rationale

The paper reports an experimental outcome: a CNN trained via pseudo-label semi-supervised learning on PCam yields higher AUC than a supervised baseline. No equations, derivations, or load-bearing self-citations appear in the abstract or described text. The performance claim is an observed result rather than a quantity defined in terms of itself or a fitted parameter renamed as a prediction. No uniqueness theorems, ansatzes smuggled via citation, or renamings of known results are present. The result stands as a self-contained empirical finding on the given dataset and metric.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, no explicit assumptions, and no new entities. All modeling choices remain implicit.



Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 8 internal anchors

  1. [1]

    Simulated annealing and Boltzmann machines

    Emile Aarts and Jan Korst. Simulated annealing and Boltzmann machines. 1988

  2. [2]

    Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer

    Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22):2199–2210, 2017

  3. [3]

    Curriculum learning

    Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48. ACM, 2009

  4. [4]

    Deep neural network ensembles for time series classification

    H Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and P Muller. Deep neural network ensembles for time series classification. arXiv preprint arXiv:1903.06602, 2019

  5. [5]

    Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artificial intelligence be seen

    Jeffrey Alan Golden. Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artificial intelligence be seen. JAMA, 318(22):2184–2186, 2017

  6. [6]

    Prostate histopathology: Learning tissue component histograms for cancer detection and classification

    Lena Gorelick, Olga Veksler, Mena Gaed, José A Gómez, Madeleine Moussa, Glenn Bauman, Aaron Fenster, and Aaron D Ward. Prostate histopathology: Learning tissue component histograms for cancer detection and classification. IEEE transactions on medical imaging, 32(10):1804–1818, 2013

  7. [7]

    Semi-supervised learning by entropy minimization

    Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems, pages 529–536, 2005

  8. [8]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  9. [9]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017

  10. [10]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

  11. [11]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015

  12. [12]

    The relative performance of ensemble methods with deep convolutional neural networks for image classification

    Cheng Ju, Aurélien Bibaut, and Mark van der Laan. The relative performance of ensemble methods with deep convolutional neural networks for image classification. Journal of Applied Statistics, 45(15):2800–2818, 2018

  13. [13]

    Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks

    Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013

  14. [14]

    Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

    Xiang Li, Shuo Chen, Xiaolin Hu, and Jian Yang. Understanding the disharmony between dropout and batch normalization by variance shift. arXiv preprint arXiv:1801.05134, 2018

  15. [15]

    Network In Network

    Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013

  16. [16]

    Focal loss for dense object detection

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017

  17. [17]

    A survey on deep learning in medical image analysis

    Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017

  18. [18]

    Make (Nearly) Every Neural Network Better: Generating Neural Network Ensembles by Weight Parameter Resampling

    Jiayi Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. Make (nearly) every neural network better: Generating neural network ensembles by weight parameter resampling. arXiv preprint arXiv:1807.00847, 2018

  19. [19]

    Detecting cancer metastases on gigapixel pathology images

    Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q Nelson, Greg S Corrado, et al. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442, 2017

  20. [20]

    Improving Neural Architecture Search Image Classifiers via Ensemble Learning

    Vladimir Macko, Charles Weill, Hanna Mazzawi, and Javier Gonzalvo. Improving neural architecture search image classifiers via ensemble learning. arXiv preprint arXiv:1903.06236, 2019

  21. [21]

    Histopathological breast cancer image classification by deep neural network techniques guided by local clustering

    Abdullah-Al Nahid, Mohamad Ali Mehrabi, and Yinan Kong. Histopathological breast cancer image classification by deep neural network techniques guided by local clustering. BioMed Research International, 2018

  22. [22]

    H. Pang, W. Lin, C. Wang, and C. Zhao. Using transfer learning to detect breast cancer without network training. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), pages 381–385, Nov 2018

  23. [23]

    On the difficulty of training recurrent neural networks

    Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013

  24. [24]

    Cyclical learning rates for training neural networks

    Leslie N Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017

  25. [25]

    A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

    Leslie N Smith. A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820, 2018

  26. [26]

    Super-convergence: Very fast training of residual networks using large learning rates

    Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of residual networks using large learning rates. 2018

  27. [27]

    A dataset for breast cancer histopathological image classification

    Fabio A Spanhol, Luiz S Oliveira, Caroline Petitjean, and Laurent Heutte. A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering, 63(7):1455–1462, 2016

  28. [28]

    Dropout: a simple way to prevent neural networks from overfitting

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014

  29. [29]

    Training very deep networks

    Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pages 2377–2385, 2015

  30. [30]

    Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer

    David F. Steiner, Robert MacDonald, Yun Liu, Peter Truszkowski, Jason D. Hipp, Christopher Gammage, Florence Thng, Lily Peng, and Martin C. Stumpe. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer, 2018

  31. [31]

    Going deeper with convolutions

    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015

  32. [32]

    High-performance medicine: the convergence of human and artificial intelligence

    Eric J Topol. High-performance medicine: the convergence of human and artificial intelligence. Nature medicine, 25(1):44, 2019

  33. [33]

    Rotation equivariant cnns for digital pathology

    Bastiaan S Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant cnns for digital pathology. In International Conference on Medical image computing and computer-assisted intervention, pages 210–

  34. [35]

    Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, and Andrew H. Beck. Deep Learning for Identifying Metastatic Breast Cancer. arXiv e-prints, page arXiv:1606.05718, Jun 2016

  35. [36]

    Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation

    Guotai Wang, Wenqi Li, Sébastien Ourselin, and Tom Vercauteren. Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In International MICCAI Brainlesion Workshop, pages 61–72. Springer, 2018

  36. [37]

    Computer aided lung cancer diagnosis with deep learning algorithms

    Wenqing Sun, Bin Zheng, and Wei Qian. Computer aided lung cancer diagnosis with deep learning algorithms, 2016

  37. [38]

    Shufflenet: An extremely efficient convolutional neural network for mobile devices

    Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.