Learning from Thresholds: Fully Automated Classification of Tumor Infiltrating Lymphocytes for Multiple Cancer Types

Chao Chen; Dimitris Samaras; Joel Saltz; Le Hou; Rajarsi Gupta; Rebecca Batiste; Shahira Abousamra; Shroyer Kenneth; Tahsin Kurc; Tianhao Zhao

arxiv: 1907.03960 · v1 · pith:ZYZHXNCLnew · submitted 2019-07-09 · 📡 eess.IV · cs.CV

Shahira Abousamra, Le Hou, Rajarsi Gupta, Chao Chen, Dimitris Samaras, Tahsin Kurc, Rebecca Batiste, Tianhao Zhao, Shroyer Kenneth, Joel Saltz

Pith reviewed 2026-05-25 00:18 UTC · model grok-4.3

keywords: tumor infiltrating lymphocytes · digital pathology · semi-automated annotation · deep learning · whole slide images · H&E images · multi-cancer classification

The pith

Semi-automated use of prior thresholds trains one network that beats human-adjusted TIL predictions across 12 cancer types

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes reusing thresholded outputs from earlier single-cancer TIL classifiers as large-scale semi-automatic annotations. These labels are combined with existing manual annotations to train deep networks for classifying tumor infiltrating lymphocytes in H&E whole slide images. The resulting models apply automatically to 12 cancer types without requiring human threshold adjustments for each type. A reader would care because manual annotation of pathology data is time-consuming and expensive, and this method scales training data to capture visual differences across cancers.

Core claim

By treating thresholded results from prior per-cancer TIL classifiers as semi-automatic annotations, the authors train deep networks that, combined with manual annotations, automatically produce better TIL prediction results in 12 cancer types than the original human-in-the-loop threshold adjustment approach.

What carries the argument

The semi-automated annotation method that converts thresholded prior outputs into training labels for a unified multi-cancer deep network.
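The labeling step described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `semi_auto_labels` and `build_training_set`, the threshold values, and the rule that a manual label overrides a thresholded one on the same patch are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical per-cancer thresholds, standing in for the values that were
# tuned by hand in the earlier per-cancer pipelines.
THRESHOLDS: Dict[str, float] = {"LUAD": 0.40, "BRCA": 0.55}

Patch = Tuple[str, str]  # (cancer_type, patch_id)


def semi_auto_labels(probs: Dict[Patch, float],
                     thresholds: Dict[str, float]) -> Dict[Patch, int]:
    """Binarize prior-model probabilities with each cancer type's threshold."""
    return {patch: int(p >= thresholds[patch[0]]) for patch, p in probs.items()}


def build_training_set(probs: Dict[Patch, float],
                       thresholds: Dict[str, float],
                       manual: Dict[Patch, int]) -> List[Tuple[Patch, int, str]]:
    """Merge semi-automatic and manual labels; manual wins on overlap (assumption)."""
    merged = {patch: (label, "semi-auto")
              for patch, label in semi_auto_labels(probs, thresholds).items()}
    for patch, label in manual.items():
        merged[patch] = (label, "manual")
    return sorted((patch, label, source)
                  for patch, (label, source) in merged.items())


prior_probs = {
    ("LUAD", "p1"): 0.45,  # at or above the LUAD threshold: TIL positive
    ("LUAD", "p2"): 0.10,
    ("BRCA", "p3"): 0.45,  # same score, but below the BRCA threshold
}
manual = {("BRCA", "p3"): 1, ("BRCA", "p4"): 0}

training_set = build_training_set(prior_probs, THRESHOLDS, manual)
```

The same score of 0.45 yields different labels in the two cancer types, which is the sense in which the per-cancer thresholds carry human judgment into the pooled training set.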

Load-bearing premise

The thresholded results from prior per-cancer approaches supply sufficiently accurate and representative annotations that capture visual variability across cancer types without introducing systematic bias into the new multi-cancer model.

What would settle it

A side-by-side test on held-out data from the 12 cancer types in which the new multi-cancer network fails to exceed the accuracy of the original human-in-the-loop methods would falsify the claim.
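A side-by-side test of this kind could be organized as in the sketch below. It assumes plain accuracy as the metric and a hypothetical record layout of `(cancer_type, truth, new_pred, baseline_pred)`; the paper may use different metrics, so treat the names and the toy data as illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (cancer_type, truth, new_pred, baseline_pred)
Record = Tuple[str, int, int, int]


def accuracy_by_cancer(records: List[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {cancer_type: (new_model_accuracy, baseline_accuracy)}."""
    counts: Dict[str, List[int]] = {}
    for cancer, truth, new_pred, base_pred in records:
        tally = counts.setdefault(cancer, [0, 0, 0])  # n, new correct, baseline correct
        tally[0] += 1
        tally[1] += int(new_pred == truth)
        tally[2] += int(base_pred == truth)
    return {c: (new / n, base / n) for c, (n, new, base) in counts.items()}


def types_where_new_model_fails(records: List[Record]) -> List[str]:
    """Cancer types where the new model does not exceed the baseline."""
    return sorted(c for c, (new, base) in accuracy_by_cancer(records).items()
                  if new <= base)


held_out = [
    ("LUAD", 1, 1, 1), ("LUAD", 0, 0, 1), ("LUAD", 1, 1, 0), ("LUAD", 0, 0, 0),
    ("BRCA", 1, 0, 1), ("BRCA", 0, 0, 0),
]

failing = types_where_new_model_fails(held_out)
```

Reporting the comparison per cancer type, rather than pooled, keeps a strong result on one type from masking a regression on another.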

Figures

Figures reproduced from arXiv: 1907.03960 by Chao Chen, Dimitris Samaras, Joel Saltz, Le Hou, Rajarsi Gupta, Rebecca Batiste, Shahira Abousamra, Shroyer Kenneth, Tahsin Kurc, Tianhao Zhao.

**Figure 1.** The problem of identifying Tumor Infiltrating Lymphocyte (TIL) regions in gigapixel pathology WSIs of 12 cancer types. (A) H&E stained WSI of lung adenocarcinoma. (B) Example of a region of tissue. (C) Example of a thresholded TIL map overlaid on the region of tissue. (D) Examples of TIL positive (framed in red) and negative (framed in green) patches. A lymphocyte is typically dark, round to ovoid, an…

**Figure 2.** Detailed patch classification results on the remaining 10 cancer types (results on LUAD and BRCA are in …

**Figure 3.** Performance of various models on identifying regions with low to high TILs. x-axis: ground truth labels of low/medium/high TILs; y-axis: TIL prediction results.

Excerpt from Section 3.2, "Identifying regions with low/medium/high TILs": We also evaluated the performance of these models in terms of identifying the amount of lymphocytes in tissue regions using a binary TIL classifier. For this purpose, three pathologists labeled larg…

Original abstract

Deep learning classifiers for characterization of whole slide tissue morphology require large volumes of annotated data to learn variations across different tissue and cancer types. As is well known, manual generation of digital pathology training data is time consuming and expensive. In this paper, we propose a semi-automated method for annotating a group of similar instances at once, instead of collecting only per-instance manual annotations. This allows for a much larger training set, that reflects visual variability across multiple cancer types and thus training of a single network which can be automatically applied to each cancer type without human adjustment. We apply our method to the important task of classifying Tumor Infiltrating Lymphocytes (TILs) in H&E images. Prior approaches were trained for individual cancer types, with smaller training sets and human-in-the-loop threshold adjustment. We utilize these thresholded results as large scale "semi-automatic" annotations. Combined with existing manual annotations, our trained deep networks are able to automatically produce better TIL prediction results in 12 cancer types, compared to the human-in-the-loop approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a practical semi-auto labeling shortcut for multi-cancer TIL scoring by pooling thresholded outputs from earlier single-cancer models, but the gain over human-in-the-loop baselines is not yet shown to be free of inherited bias.

The core move here is to treat thresholded predictions from prior per-cancer TIL models as cheap bulk labels, mix them with the existing manual annotations, and train one network that works across 12 cancer types without per-type threshold tuning. That is a straightforward engineering step that directly attacks the annotation cost problem in digital pathology. It is new in the sense that it demonstrates the multi-cancer application at scale rather than leaving the method as a per-cancer human-in-the-loop exercise. The abstract is clear about the motivation and the intended workflow. Credit is due for keeping the claim modest: they are not claiming a new architecture or a theoretical advance, just a labeling shortcut that lets them train on more data.

The soft spot is exactly the one the stress-test flags. The thresholded outputs carry whatever systematic errors the earlier models had: staining artifacts, morphology differences, or threshold choices that over- or under-call TILs in particular cancers. Pooling them does not automatically cancel those errors, and adding a smaller set of manual labels does not guarantee the new model escapes the same decision boundaries. Without explicit checks (independent pathologist review on held-out slides, error analysis stratified by cancer type, or comparison against a fully manual multi-cancer baseline) it is difficult to know whether the reported improvement is real generalization or just reproduction of the old labeling artifacts. The abstract itself gives no numbers, so the full paper will need to supply those controls.

This is the kind of methods paper that belongs in a computational pathology venue. A serious editor should send it to review; the idea is usable and the limitation is fixable with the right validation experiments. Readers working on weak supervision or digital pathology annotation pipelines will find it worth reading even if the final performance numbers need tightening.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a semi-automated annotation approach for Tumor Infiltrating Lymphocytes (TILs) in H&E whole-slide images. It re-uses thresholded outputs from prior per-cancer human-in-the-loop classifiers as large-scale training labels, augments them with existing manual annotations, and trains a single deep network claimed to generalize across 12 cancer types and outperform the original per-cancer human-in-the-loop pipelines.

Significance. If the central claim holds after rigorous validation of label quality, the work would demonstrate a practical route to scaling TIL classifiers with reduced manual effort while improving cross-cancer generalization. The approach directly targets the annotation bottleneck in computational pathology and could be extended to other morphology tasks.

major comments (2)

[Abstract] Abstract: the central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.
[Abstract] Abstract (method description): the training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.
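One way to look for the systematic over- or under-calling raised in the second comment is to compare positive-call rates on patches that carry both a thresholded and a manual label. The sketch below is an assumption-laden illustration: the function name `calling_bias`, the input layout, and the toy data are hypothetical, and a signed rate difference is only one of several possible bias measures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def calling_bias(pairs: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Per cancer type: positive rate of thresholded labels minus that of manual labels.

    Each item is (cancer_type, thresholded_label, manual_label) for a patch
    that has both labels. Positive values suggest over-calling, negative
    values suggest under-calling.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # n, thresholded positives, manual positives
    for cancer, thresholded, manual in pairs:
        tally = totals[cancer]
        tally[0] += 1
        tally[1] += thresholded
        tally[2] += manual
    return {c: (pos_t - pos_m) / n for c, (n, pos_t, pos_m) in totals.items()}


# Toy overlap set (fabricated for illustration, not data from the paper).
overlap = [
    ("LUAD", 1, 1), ("LUAD", 1, 0), ("LUAD", 0, 0), ("LUAD", 1, 0),
    ("BRCA", 0, 1), ("BRCA", 0, 0),
]

bias = calling_bias(overlap)
```

In this toy data the thresholded labels over-call in LUAD and under-call in BRCA, the pattern that pooling alone would not cancel.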

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our semi-automated TIL annotation approach. The comments highlight the need for clearer quantitative support in the abstract and explicit validation of the generated labels. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.

Authors: The full manuscript reports quantitative comparisons in the Results section, including AUC and F1 scores on held-out test sets across the 12 cancer types, with the single multi-cancer model outperforming the original per-cancer human-in-the-loop pipelines. The evaluation uses the same test data as the prior works for direct baseline comparison. We agree the abstract is too concise and will revise it to include key metrics, a brief description of the cross-validation protocol, and reference to the error analysis already present in the body of the paper.

Revision: yes

Referee: [Abstract] Abstract (method description): the training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.

Authors: The threshold values and per-cancer models originate from previously published and independently validated TIL classifiers. We combined these with existing manual annotations to form the training set. We acknowledge that the current manuscript does not include a dedicated quantitative check for systematic bias in the thresholded labels across all cancer types. We will add a new analysis subsection that measures agreement (e.g., Dice or pixel-level concordance) between the thresholded outputs and available manual annotations on overlapping slides to address this concern directly.

Revision: yes
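The Dice agreement mentioned in the second response can be computed as in the following sketch. It assumes the thresholded and manual labels are aligned binary sequences over the same patches; the function name `dice` and the convention that two all-negative maps score 1.0 are choices made for this example, not details taken from the paper.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) for two binary label sequences."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Convention (an assumption): two all-negative maps agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy labels for six patches on one slide (illustrative only).
thresholded = [1, 1, 0, 0, 1, 0]
manual = [1, 0, 0, 0, 1, 1]

score = dice(thresholded, manual)  # 2 * 2 / (3 + 3)
```

Dice ignores agreement on negatives, so on slides with very few TIL-positive patches it is worth reporting alongside a plain concordance rate.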

Circularity Check

0 steps flagged

No circularity: training on external thresholded labels does not reduce the multi-cancer model output to its inputs by construction

The paper trains a deep network on a combination of manual annotations and thresholded outputs from prior per-cancer human-in-the-loop methods, then claims the resulting model yields better TIL predictions across 12 cancer types than the original per-cancer approaches. This is a standard semi-supervised training setup whose output (the trained network) is not definitionally identical to the input labels, nor is any performance metric shown to be a direct algebraic rearrangement of the training labels. No self-definitional equations, fitted-input-renamed-as-prediction, load-bearing self-citations, or ansatz smuggling appear in the abstract or described method. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described in the provided text.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Angell, H., Galon, J.: From the immune contexture to the immunoscore: the role of prognostic and predictive immune markers in cancer. Current opinion in immunology 25(2), 261–267 (2013)

[2] Barnes, M., Sarkar, A., Redman, R., et al.: Development of a histology-based digital pathology image analysis algorithm for assessment of tumor infiltrating lymphocytes in her2+ breast cancer. In: Cancer Research. vol. 78 (2018)

[3] Corredor, G., Wang, X., Lu, C., et al.: A watershed and feature-based approach for automated detection of lymphocytes on lung cancer images. In: Medical Imaging 2018: Digital Pathology. vol. 10581, p. 105810R. International Society for Optics and Photonics (2018)

[4] E. Garcia, R. Hermoza, C.B.C., et al.: Automatic lymphocyte detection on gastric cancer ihc images using deep learning. In: 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS). pp. 200–204 (June 2017)

[5] Hendry, S., Salgado, R., Gevaert, T., et al.: Assessing tumor-infiltrating lymphocytes in solid tumors: A practical review for pathologists and proposal for a standardized method from the international immuno-oncology biomarkers working group: Part 2. Advances in anatomic pathology 24(6), 311–335 (2017)

[6] Hou, L., Nguyen, V., Kanevsky, A.B., et al.: Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern recognition 86, 188–200 (2019)

[7] John, M., Salgado, R., Gevaert, T., et al.: Assessing tumor-infiltrating lymphocytes in solid tumors. Advances in Anatomic Pathology 24(6), 311–335 (2016)

[8] Linder, N., Taylor, J.C., Colling, R., et al.: Deep learning for detecting tumour-infiltrating lymphocytes in testicular germ cell tumours. Journal of Clinical Pathology 72(2), 157–164 (2019)

[9] Mahmoud, S.M., Paish, E.C., Powe, D.G., et al.: Tumor-infiltrating cd8+ lymphocytes predict clinical outcome in breast cancer. Journal of clinical oncology 29(15), 1949–1955 (2011)

[10] Mlecnik, B., Bindea, G., Pagès, F., Galon, J.: Tumor immunosurveillance in human cancers. Cancer and Metastasis Reviews 30(1), 5–12 (2011)

[11] Mlecnik, B., Tosolini, M., Kirilovsky, A., et al.: Histopathologic-based prognostic factors of colorectal cancers are associated with the state of the local immune reaction. Journal of clinical oncology 29(6), 610–618 (2011)

[12] Salgado, R., Denkert, C., Demaria, S., et al.: The evaluation of tumor-infiltrating lymphocytes (tils) in breast cancer: recommendations by an international tils working group 2014. Annals of oncology 26(2), 259–271 (2014)

[13] Saltz, J., Gupta, R., Hou, L., et al.: Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports 23(1), 181–193.e7 (2018)

[14] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

[15] Steele, K.E., Tan, T.H., Korn, R., et al.: Measuring multiple parameters of cd8+ tumor-infiltrating lymphocytes in human cancers by image analysis. Journal for immunotherapy of cancer 6(1), 20 (2018)

[16] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)

[17] TCGA development team: The Cancer Genome Atlas. https://tcga-data.nci.nih.gov/docs/publications/tcga/

[18] Thorsson, V., Gibbs, D.L., Brown, S.D., et al.: The immune landscape of cancer. Immunity 48(4), 812–830 (2018)

[19] Youden, W.J.: Index for rating diagnostic tests. Cancer 3(1), 32–35 (1950)
