Semi-Supervised Learning for Cancer Detection of Lymph Node Metastases

Amit Kumar Jaiswal; Dimitrij Shulkin; Ivan Panshin; Nagender Aneja; Samuel Abramov

arxiv: 1906.09587 · v1 · pith:NQQGFS6Inew · submitted 2019-06-23 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-25 17:45 UTC · model grok-4.3

keywords: semi-supervised learning, pseudo labels, PatchCamelyon, lymph node metastases, convolutional neural network, AUC metric, histopathology, cancer detection

The pith

A CNN model trained with pseudo labels on the PCam dataset achieves higher AUC than a strong supervised baseline for detecting lymph node metastases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep convolutional neural network to identify metastasized cancer cells in lymph node pathology scans from the PatchCamelyon benchmark. It applies semi-supervised training that generates pseudo labels for the unlabeled images and includes them during model fitting. This produces better results on the AUC metric than a strong CNN trained only on labeled data. The approach addresses the practical difficulty pathologists face when reviewing large numbers of scans for signs of metastasis.

Core claim

The paper's core claim is that a deep convolutional neural network trained with a semi-supervised approach, using pseudo labels at the PCam level, achieves a significantly better AUC than a strong CNN baseline.

What carries the argument

The semi-supervised training procedure that generates pseudo labels from the unlabeled portion of the PCam dataset and adds them to the supervised training set.
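The procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say how pseudo labels are selected, so the confidence threshold of 0.9 and the helper names `select_pseudo_labels` and `extend_training_set` are assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Pick confidently predicted unlabeled samples (hypothetical helper).

    probs[i] is the model's predicted probability that unlabeled patch i
    contains metastatic tissue. The 0.9 threshold is an assumption, not a
    value taken from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, 1))  # confident positive
        elif p <= 1.0 - threshold:
            selected.append((i, 0))  # confident negative
        # Uncertain predictions are left out of the training set.
    return selected


def extend_training_set(labeled, unlabeled, probs, threshold=0.9):
    """Append pseudo-labeled samples to the labeled (sample, label) pairs."""
    pseudo = [
        (unlabeled[i], label)
        for i, label in select_pseudo_labels(probs, threshold)
    ]
    return list(labeled) + pseudo
```

In an iterative setup, the model would be retrained on the extended set and the selection repeated on the patches that remain unlabeled. Whether the paper follows that loop exactly is not stated in the abstract.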

If this is right

The semi-supervised model records a higher AUC score than the baseline CNN on the PCam benchmark.
Pseudo labels can be used to leverage additional unlabeled histopathology images during training.
The method reduces reliance on fully labeled data while maintaining or improving detection performance.
The trained model can be applied directly to new PCam-style scans for metastasis classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pseudo-labeling step could be tested on other limited-label medical imaging tasks to measure transfer.
Performance gains may depend on how well the initial supervised model performs before generating the pseudo labels.
Combining the approach with other consistency-based semi-supervised methods might produce additional improvements on the same dataset.

Load-bearing premise

The pseudo labels generated for the unlabeled PCam images are accurate enough that adding them improves the model's generalization rather than introducing label noise.

What would settle it

A controlled experiment that trains both models on the same data split and reports an AUC for the pseudo-label version that is equal to or lower than the supervised baseline would falsify the claim.
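A minimal sketch of that check, under stated assumptions: `roc_auc` computes AUC as the fraction of positive-negative pairs that the scores rank correctly (ties count half), and `pseudo_label_claim_survives` is a hypothetical helper name. It compares two point estimates on one shared test split and does not replace a statistical significance test.

```python
def roc_auc(labels, scores):
    """AUC via pairwise ranking: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def pseudo_label_claim_survives(labels, baseline_scores, ssl_scores):
    """True only if the pseudo-label model beats the baseline on the same split.

    An equal or lower AUC for the pseudo-label model counts against the claim.
    """
    return roc_auc(labels, ssl_scores) > roc_auc(labels, baseline_scores)
```

The quadratic pairwise loop is chosen for readability. On a test set the size of PCam, a rank-based implementation from an established library would be the practical choice.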

Figures

Figures reproduced from arXiv: 1906.09587 by Amit Kumar Jaiswal, Dimitrij Shulkin, Ivan Panshin, Nagender Aneja, Samuel Abramov.

**Figure 1.** DenseNet201 Block Architecture. Adjacent text from the paper (Section 2.3, One Cycle Policy): In this work, we use one cycle policy approach. It was first introduced for SGD [26]. One cycle policy is a slight modification of cyclical learning rate policy (CLR) where a minimum and maximum learning rate limits with a step size was specified [24]. This policy allows the loss to plateau before the training ends. It combines the advantages of curricu…

**Figure 2.** One Cyclic Policy - Learning Rate. Adjacent text from the paper: Momentum and learning rate are closely related. The optimal learning rate depends on the momentum and the momentum depends on the learning rate [25]. Also, they found in their experiments that cyclical momentum led to better results. In practice, they recommend choosing two values such as 0.85 and 0.95 and reducing them from the higher to the lower value when the learning …
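The one cycle policy excerpts in the two captions above can be illustrated with a small schedule function. This is a sketch under assumptions, not the paper's configuration: the linear ramp shape, the learning-rate limits `lr_min` and `lr_max`, and the function name `one_cycle` are illustrative choices. Only the momentum values 0.85 and 0.95 come from the excerpt. The sketch also assumes the usual pairing in which momentum falls while the learning rate rises.

```python
def one_cycle(step, total_steps, lr_min=1e-4, lr_max=1e-3,
              mom_high=0.95, mom_low=0.85):
    """Return (learning_rate, momentum) for one training step.

    The learning rate rises linearly from lr_min to lr_max over the first
    half of training and falls back over the second half. Momentum moves
    in the opposite direction, from mom_high down to mom_low and back.
    lr_min and lr_max are placeholder values, not taken from the paper.
    """
    if total_steps < 2 or not 0 <= step < total_steps:
        raise ValueError("need total_steps >= 2 and 0 <= step < total_steps")
    half = (total_steps - 1) / 2.0
    # frac is 0 at the first and last step and 1 at the midpoint.
    frac = 1.0 - abs(step - half) / half
    lr = lr_min + frac * (lr_max - lr_min)
    momentum = mom_high - frac * (mom_high - mom_low)
    return lr, momentum
```

Calling `one_cycle(step, total_steps)` once per optimizer step and writing the two values into the optimizer's settings reproduces the triangular shape that Figure 2's caption refers to. Variants with a cosine ramp or a final low-learning-rate phase exist and may be closer to what the authors used.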

**Figure 3.** Images as Outliers in the Train Set. Adjacent text from the paper: Finally, we resize the images from 96 x 96 to 224 x 224 pixel as the pre-trained models were originally trained on this size. After each semi-supervised learning run, more and more pseudo labels could be predicted, thus the training corpus could be increased where we perform random split to train and validation set. Moreover, we apply a set of 10 online data augmentatio…

**Figure 4.** Area under the ROC Curve

Original abstract

Pathologists find tedious to examine the status of the sentinel lymph node on a large number of pathological scans. The examination process of such lymph node which encompasses metastasized cancer cells is histopathologically organized. However, the task of finding metastatic tissues is gradual which is often challenging. In this work, we present our deep convolutional neural network based model validated on PatchCamelyon (PCam) benchmark dataset for fundamental machine learning research in histopathology diagnosis. We find that our proposed model trained with a semi-supervised learning approach by using pseudo labels on PCam-level significantly leads to better performances to strong CNN baseline on the AUC metric.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies pseudo-labeling to PCam and claims an AUC gain, but gives no numbers, no label-generation details, and no way to check the result.

The main takeaway is that this work trains a CNN on the PCam dataset using pseudo labels for the unlabeled patches and reports better AUC than a supervised baseline. That is the entire claim in plain terms. The motivation about helping pathologists with lymph node scans is straightforward and the choice of a public benchmark is sensible.

Beyond the application itself there is nothing new in the method or the setup. It is a standard semi-supervised recipe on an already-used dataset. The paper does not introduce any new algorithm or derivation. What it does well is keep the focus on a real clinical task and stick to reproducible data. That is the extent of the credit.

The soft spots are large and central. The abstract supplies no AUC values at all, no description of how the pseudo labels were produced, no confidence threshold, no iteration count, and no train-validation split information. Without those elements the reported gain cannot be evaluated. The stress-test point holds: if the initial model that generates the labels is weak, the added labels are just noise and the improvement could be an artifact of extra training time or tuning rather than semi-supervised benefit. The paper does not address label quality or run any check against that possibility. The evidence is too thin to support the central claim.

This is the sort of short application note that might interest a small group already working on SSL for digital pathology, mainly to discuss what details are missing. A broader reader gets little usable information. The work does not show enough grounding or evidence to deserve referee time. I would not send it for peer review until the methods and results sections are expanded with the missing numbers and procedures.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a deep CNN model for detecting lymph node metastases on the PatchCamelyon (PCam) benchmark. It claims that training this model with a semi-supervised approach that generates and uses pseudo labels on the PCam data yields significantly higher AUC than a strong supervised CNN baseline.

Significance. If the central claim were substantiated with the missing experimental details, the result would indicate that pseudo-label-based semi-supervised learning can improve generalization in histopathology classification tasks where labeled data are expensive to obtain. This would be of practical interest for medical imaging applications that face similar annotation bottlenecks.

major comments (3)

[Abstract] The central claim of a significant AUC improvement is asserted without any reported numerical AUC values for the baseline or proposed model, without the exact pseudo-label generation procedure (initial model, thresholds, iteration schedule), without train/validation splits, and without statistical tests. These omissions make the claim impossible to evaluate from the manuscript.
[Abstract] No evidence is supplied that the reported gain is attributable to the semi-supervised procedure rather than additional training epochs, hyper-parameter tuning, or other uncontrolled factors on the same data. This circularity risk is load-bearing because the improvement is presented as resulting from the use of pseudo labels.
[Abstract] The manuscript supplies no description of how pseudo-label accuracy was verified on the unlabeled PCam patches or any ablation isolating the contribution of the pseudo labels versus the base CNN architecture. Without this, it is impossible to determine whether the pseudo labels reduce generalization error or inject harmful noise.

minor comments (2)

[Abstract] Grammatical and phrasing issues: 'Pathologists find tedious to examine' should read 'Pathologists find it tedious to examine'; 'the task of finding metastatic tissues is gradual which is often challenging' is unclear and should be rephrased for precision.
[Abstract] The phrase 'on PCam-level' is used without definition or explanation of what it denotes in the context of the dataset or method.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below and will revise the abstract and related sections to improve clarity and substantiation of our claims.

Referee: [Abstract] The central claim of a significant AUC improvement is asserted without any reported numerical AUC values for the baseline or proposed model, without the exact pseudo-label generation procedure (initial model, thresholds, iteration schedule), without train/validation splits, and without statistical tests. These omissions make the claim impossible to evaluate from the manuscript.

Authors: We agree that the abstract should report these details for self-containment. In the revision we will add the specific AUC values for the supervised baseline and the semi-supervised model, a concise description of the pseudo-label procedure (including initial model, threshold, and iterations), the train/validation splits employed, and the results of statistical tests comparing the two approaches.

Revision: yes

Referee: [Abstract] No evidence is supplied that the reported gain is attributable to the semi-supervised procedure rather than additional training epochs, hyper-parameter tuning, or other uncontrolled factors on the same data. This circularity risk is load-bearing because the improvement is presented as resulting from the use of pseudo labels.

Authors: We acknowledge this concern. To isolate the contribution of pseudo-labeling, the revised manuscript will include a controlled comparison in which the baseline CNN is trained for an identical number of epochs and with the same hyper-parameters but without pseudo labels. This will provide direct evidence that the observed AUC gain stems from the semi-supervised procedure rather than extraneous training factors.

Revision: yes

Referee: [Abstract] The manuscript supplies no description of how pseudo-label accuracy was verified on the unlabeled PCam patches or any ablation isolating the contribution of the pseudo labels versus the base CNN architecture. Without this, it is impossible to determine whether the pseudo labels reduce generalization error or inject harmful noise.

Authors: We agree that an explicit verification step and ablation are necessary. In the revision we will describe how pseudo-label quality was assessed (e.g., via a held-out labeled subset) and add an ablation experiment that trains the identical CNN architecture with and without the pseudo-label component, thereby isolating the effect of the pseudo labels on generalization.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claim with no self-referential derivation.

The paper reports an experimental outcome: a CNN trained via pseudo-label semi-supervised learning on PCam yields higher AUC than a supervised baseline. No equations, derivations, or load-bearing self-citations appear in the abstract or described text. The performance claim is an observed result rather than a quantity defined in terms of itself or a fitted parameter renamed as a prediction. No uniqueness theorems, ansatzes smuggled via citation, or renamings of known results are present. The result stands as a self-contained empirical finding on the given dataset and metric.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, no explicit assumptions, and no new entities. All modeling choices remain implicit.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 8 internal anchors

[1] Emile Aarts and Jan Korst. Simulated annealing and Boltzmann machines. 1988.
[2] Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA, 318(22):2199–2210, 2017.
[3] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48. ACM, 2009.
[4] H Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and P Muller. Deep neural network ensembles for time series classification. arXiv preprint arXiv:1903.06602, 2019.
[5] Jeffrey Alan Golden. Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artificial intelligence be seen. JAMA, 318(22):2184–2186, 2017.
[6] Lena Gorelick, Olga Veksler, Mena Gaed, José A Gómez, Madeleine Moussa, Glenn Bauman, Aaron Fenster, and Aaron D Ward. Prostate histopathology: Learning tissue component histograms for cancer detection and classification. IEEE transactions on medical imaging, 32(10):1804–1818, 2013.
[7] Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems, pages 529–536, 2005.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[9] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[10] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
[11] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[12] Cheng Ju, Aurélien Bibaut, and Mark van der Laan. The relative performance of ensemble methods with deep convolutional neural networks for image classification. Journal of Applied Statistics, 45(15):2800–2818, 2018.
[13] Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013.
[14] Xiang Li, Shuo Chen, Xiaolin Hu, and Jian Yang. Understanding the disharmony between dropout and batch normalization by variance shift. arXiv preprint arXiv:1801.05134, 2018.
[15] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[16] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017.
[17] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.
[18] Jiayi Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. Make (nearly) every neural network better: Generating neural network ensembles by weight parameter resampling. arXiv preprint arXiv:1807.00847, 2018.
[19] Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q Nelson, Greg S Corrado, et al. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442, 2017.
[20] Vladimir Macko, Charles Weill, Hanna Mazzawi, and Javier Gonzalvo. Improving neural architecture search image classifiers via ensemble learning. arXiv preprint arXiv:1903.06236, 2019.
[21] Abdullah-Al Nahid, Mohamad Ali Mehrabi, and Yinan Kong. Histopathological breast cancer image classification by deep neural network techniques guided by local clustering. BioMed Research International, 2018.
[22] H. Pang, W. Lin, C. Wang, and C. Zhao. Using transfer learning to detect breast cancer without network training. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), pages 381–385, Nov 2018.
[23] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013.
[24] Leslie N Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017.
[25] Leslie N Smith. A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820, 2018.
[26] Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of residual networks using large learning rates. 2018.
[27] Fabio A Spanhol, Luiz S Oliveira, Caroline Petitjean, and Laurent Heutte. A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering, 63(7):1455–1462, 2016.
[28] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[29] Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pages 2377–2385, 2015.
[30] David F. Steiner, Robert MacDonald, Yun Liu, Peter Truszkowski, Jason D. Hipp, Christopher Gammage, Florence Thng, Lily Peng, and Martin C. Stumpe. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer, 2018.
[31] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
[32] Eric J Topol. High-performance medicine: the convergence of human and artificial intelligence. Nature medicine, 25(1):44, 2019.
[33] Bastiaan S Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant CNNs for digital pathology. In International Conference on Medical image computing and computer-assisted intervention, pages 210–
[35] Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, and Andrew H. Beck. Deep Learning for Identifying Metastatic Breast Cancer. arXiv e-prints, page arXiv:1606.05718, Jun 2016.
[36] Guotai Wang, Wenqi Li, Sébastien Ourselin, and Tom Vercauteren. Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In International MICCAI Brainlesion Workshop, pages 61–72. Springer, 2018.
[37] Wei Qian Wenqing Sun, Bin Zheng. Computer aided lung cancer diagnosis with deep learning algorithms, 2016.
[38] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018.

[1] [1]

Simulated annealing and boltz- mann machines

Emile Aarts and Jan Korst. Simulated annealing and boltz- mann machines. 1988

work page 1988

[2] [2]

Diagnos- tic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer

Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnos- tic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Jama, 318(22):2199–2210, 2017

work page 2017

[3] [3]

Curriculum learning

Yoshua Bengio, J ´erˆome Louradour, Ronan Collobert, and Ja- son Weston. Curriculum learning. InProceedings of the 26th annual international conference on machine learning, pages 41–48. ACM, 2009

work page 2009

[4] [4]

Deep neural network ensembles for time series classiﬁcation

H Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhas- sane Idoumghar, and P Muller. Deep neural network ensembles for time series classiﬁcation. arXiv preprint arXiv:1903.06602, 2019

work page arXiv 1903

[5] [5]

Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artiﬁ- cial intelligence be seen

Jeffrey Alan Golden. Deep learning algorithms for detection of lymph node metastases from breast cancer: helping artiﬁ- cial intelligence be seen. Jama, 318(22):2184–2186, 2017

work page 2017

[6] [6]

Prostate histopathology: Learning tissue component histograms for cancer detection and classiﬁca- tion

Lena Gorelick, Olga Veksler, Mena Gaed, Jos ´e A G ´omez, Madeleine Moussa, Glenn Bauman, Aaron Fenster, and Aaron D Ward. Prostate histopathology: Learning tissue component histograms for cancer detection and classiﬁca- tion. IEEE transactions on medical imaging , 32(10):1804– 1818, 2013

work page 2013

[7] [7]

Semi-supervised learning by entropy minimization

Yves Grandvalet and Yoshua Bengio. Semi-supervised learning by entropy minimization. In Advances in neural information processing systems, pages 529–536, 2005

work page 2005

[8] [8]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceed- ings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016

[9] [9]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco An- dreetto, and Hartwig Adam. Mobilenets: Efﬁcient convolu- tional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Densely connected convolutional net- works

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kil- ian Q Weinberger. Densely connected convolutional net- works. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

work page 2017

[11] [11]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal co- variate shift. arXiv preprint arXiv:1502.03167, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[12] [12]

The relative performance of ensemble methods with deep convo- lutional neural networks for image classiﬁcation

Cheng Ju, Aur ´elien Bibaut, and Mark van der Laan. The relative performance of ensemble methods with deep convo- lutional neural networks for image classiﬁcation. Journal of Applied Statistics, 45(15):2800–2818, 2018

work page 2018

[13] [13]

Pseudo-label: The simple and efﬁcient semi-supervised learning method for deep neural networks

Dong-Hyun Lee. Pseudo-label: The simple and efﬁcient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, page 2, 2013

work page 2013

[14] [14]

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

Xiang Li, Shuo Chen, Xiaolin Hu, and Jian Yang. Under- standing the disharmony between dropout and batch normal- ization by variance shift. arXiv preprint arXiv:1801.05134, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Network In Network

Min Lin, Qiang Chen, and Shuicheng Yan. Network in net- work. arXiv preprint arXiv:1312.4400, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[16] [16]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. In Pro- ceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017

work page 2017

[17] [17]

A survey on deep learning in medical image analysis

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Ar- naud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Gin- neken, and Clara I S ´anchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017

work page 2017

[18] [18]

Make (Nearly) Every Neural Network Better: Generating Neural Network Ensembles by Weight Parameter Resampling

Jiayi Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. Make (nearly) every neural network better: Generating neural network ensembles by weight parameter resampling. arXiv preprint arXiv:1807.00847, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[19] [19]

Detecting cancer metastases on gigapixel pathol- ogy images

Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venu- gopalan, Aleksei Timofeev, Philip Q Nelson, Greg S Cor- rado, et al. Detecting cancer metastases on gigapixel pathol- ogy images. arXiv preprint arXiv:1703.02442, 2017

work page arXiv 2017

[20] [20]

Improving Neural Architecture Search Image Classifiers via Ensemble Learning

Vladimir Macko, Charles Weill, Hanna Mazzawi, and Javier Gonzalvo. Improving neural architecture search image classiﬁers via ensemble learning. arXiv preprint arXiv:1903.06236, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903

[21] [21]

Histopathological breast cancer image classiﬁcation by deep neural network techniques guided by local cluster- ing

Abdullah-Al Nahid, Mohamad Ali Mehrabi, and Yinan Kong. Histopathological breast cancer image classiﬁcation by deep neural network techniques guided by local cluster- ing. BioMed Research International, 2018

work page 2018

[22] [22]

H. Pang, W. Lin, C. Wang, and C. Zhao. Using transfer learn- ing to detect breast cancer without network training. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), pages 381–385, Nov 2018


[23] [23]

On the difficulty of training recurrent neural networks

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International conference on machine learning, pages 1310–1318, 2013


[24] [24]

Cyclical learning rates for training neural networks

Leslie N Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017


[25] [25]

A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay

Leslie N Smith. A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay. arXiv preprint arXiv:1803.09820, 2018


[26] [26]

Super-convergence: Very fast training of residual networks using large learning rates

Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of residual networks using large learning rates. 2018


[27] [27]

A dataset for breast cancer histopathological image classification

Fabio A Spanhol, Luiz S Oliveira, Caroline Petitjean, and Laurent Heutte. A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering, 63(7):1455–1462, 2016


[28] [28]

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014


[29] [29]

Training very deep networks

Rupesh K Srivastava, Klaus Greff, and Jürgen Schmidhuber. Training very deep networks. In Advances in neural information processing systems, pages 2377–2385, 2015


[30] [30]

Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer

David F. Steiner, Robert MacDonald, Yun Liu, Peter Truszkowski, Jason D. Hipp, Christopher Gammage, Florence Thng, Lily Peng, and Martin C. Stumpe. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer, 2018


[31] [31]

Going deeper with convolutions

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015


[32] [32]

High-performance medicine: the convergence of human and artificial intelligence

Eric J Topol. High-performance medicine: the convergence of human and artificial intelligence. Nature medicine, 25(1):44, 2019


[33] [33]

Rotation equivariant cnns for digital pathology

Bastiaan S Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant cnns for digital pathology. In International Conference on Medical image computing and computer-assisted intervention, pages 210–


[34] [35]

Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, and Andrew H. Beck. Deep Learning for Identifying Metastatic Breast Cancer. arXiv e-prints, page arXiv:1606.05718, Jun 2016


[35] [36]

Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation

Guotai Wang, Wenqi Li, Sébastien Ourselin, and Tom Vercauteren. Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In International MICCAI Brainlesion Workshop, pages 61–72. Springer, 2018


[36] [37]

Computer aided lung cancer diagnosis with deep learning algorithms, 2016

Wenqing Sun, Bin Zheng, and Wei Qian. Computer aided lung cancer diagnosis with deep learning algorithms, 2016


[37] [38]

Shufflenet: An extremely efficient convolutional neural network for mobile devices

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6848–6856, 2018
