arxiv: 1907.03960 · v1 · pith:ZYZHXNCLnew · submitted 2019-07-09 · 📡 eess.IV · cs.CV

Learning from Thresholds: Fully Automated Classification of Tumor Infiltrating Lymphocytes for Multiple Cancer Types

Pith reviewed 2026-05-25 00:18 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords tumor infiltrating lymphocytes · digital pathology · semi-automated annotation · deep learning · whole slide images · H&E images · multi-cancer classification

The pith

Semi-automated use of prior thresholds trains one network that beats human-adjusted TIL predictions across 12 cancer types

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes reusing thresholded outputs from earlier single-cancer TIL classifiers as large-scale semi-automatic annotations. These labels are combined with existing manual annotations to train deep networks for classifying tumor infiltrating lymphocytes in H&E whole slide images. The resulting models apply automatically to 12 cancer types without requiring human threshold adjustments for each type. A reader would care because manual annotation of pathology data is time-consuming and expensive, and this method scales training data to capture visual differences across cancers.

Core claim

By treating thresholded results from prior per-cancer TIL classifiers as semi-automatic annotations, the authors train deep networks that, combined with manual annotations, automatically produce better TIL prediction results in 12 cancer types than the original human-in-the-loop threshold adjustment approach.

What carries the argument

The semi-automated annotation method that converts thresholded prior outputs into training labels for a unified multi-cancer deep network.
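
The label-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Patch` type, the threshold values, the `>=` comparison, and the rule that a manual annotation overrides a thresholded label for the same patch are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-cancer thresholds. The values are illustrative only
# and are not taken from the paper.
THRESHOLDS: Dict[str, float] = {"LUAD": 0.40, "BRCA": 0.55}


@dataclass(frozen=True)
class Patch:
    patch_id: str
    cancer_type: str
    prior_prob: float  # TIL probability from an earlier per-cancer classifier


def semi_automatic_labels(patches: List[Patch],
                          thresholds: Dict[str, float]) -> Dict[str, int]:
    """Binarize prior-model probabilities with each cancer type's threshold."""
    return {
        p.patch_id: int(p.prior_prob >= thresholds[p.cancer_type])
        for p in patches
    }


def build_training_set(patches: List[Patch],
                       thresholds: Dict[str, float],
                       manual_labels: Dict[str, int]) -> Dict[str, int]:
    """Merge thresholded labels with manual annotations.

    Assumption: where both exist, the manual label wins.
    """
    labels = semi_automatic_labels(patches, thresholds)
    labels.update(manual_labels)
    return labels


patches = [
    Patch("a", "LUAD", 0.45),
    Patch("b", "LUAD", 0.30),
    Patch("c", "BRCA", 0.50),
    Patch("d", "BRCA", 0.90),
]
manual = {"c": 1, "e": 0}  # "c" overrides its thresholded label; "e" is manual-only
training = build_training_set(patches, THRESHOLDS, manual)
print(training)
```

The merged dictionary would then serve as the label source for a single network trained on patches from all cancer types.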

Load-bearing premise

The thresholded results from prior per-cancer approaches supply sufficiently accurate and representative annotations that capture visual variability across cancer types without introducing systematic bias into the new multi-cancer model.

What would settle it

A side-by-side test on held-out data from the 12 cancer types in which the new multi-cancer network fails to exceed the accuracy of the original human-in-the-loop methods would falsify the claim.
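
One way to frame such a side-by-side test is sketched below. It assumes per-patch accuracy as the metric and a hypothetical record layout (`cancer`, `truth`, `new_pred`, `baseline_pred`); the abstract does not say which metric the paper uses, so this only illustrates the shape of the comparison.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_cancer(records: List[dict], pred_key: str) -> Dict[str, float]:
    """Per-cancer-type accuracy of the predictions stored under pred_key."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for r in records:
        totals[r["cancer"]] += 1
        hits[r["cancer"]] += int(r[pred_key] == r["truth"])
    return {c: hits[c] / totals[c] for c in totals}


def types_not_improved(records: List[dict]) -> List[str]:
    """Cancer types where the new model does not exceed the baseline."""
    new = accuracy_by_cancer(records, "new_pred")
    old = accuracy_by_cancer(records, "baseline_pred")
    return sorted(c for c in new if new[c] <= old[c])


def rec(cancer: str, truth: int, new_pred: int, baseline_pred: int) -> dict:
    return {"cancer": cancer, "truth": truth,
            "new_pred": new_pred, "baseline_pred": baseline_pred}


# Toy held-out data, invented for illustration.
held_out = [
    rec("LUAD", 1, 1, 1), rec("LUAD", 0, 0, 1),
    rec("LUAD", 1, 1, 0), rec("LUAD", 0, 1, 0),
    rec("BRCA", 1, 1, 1), rec("BRCA", 0, 1, 0),
]
print(types_not_improved(held_out))
```

Any cancer type returned by `types_not_improved` on real held-out data would count against the claim for that type.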

Figures

Figures reproduced from arXiv: 1907.03960 by Chao Chen, Dimitris Samaras, Joel Saltz, Le Hou, Rajarsi Gupta, Rebecca Batiste, Shahira Abousamra, Shroyer Kenneth, Tahsin Kurc, Tianhao Zhao.

Figure 1. The problem of identifying Tumor Infiltrating Lymphocyte (TIL) regions in gigapixel pathology WSIs of 12 cancer types. (A). H&E stained WSI of lung adenocarcinoma. (B). Example of a region of tissue. (C). Example of a thresholded TIL map overlaid on the region of tissue. (D). Examples of TIL positive (framed in red) and negative (framed in green) patches. A lymphocyte is typically dark, round to ovoid, an…
Figure 2. Detailed patch classification results on the remaining 10 cancer types (results on LUAD and BRCA are in …
Figure 3. Performance of various models on identifying regions with low to high TILs. x-axis: ground truth labels of low/medium/high TILs; y-axis: TIL prediction results.

3.2 Identifying regions with low/medium/high TILs

We also evaluated the performance of these models in terms of identifying the amount of lymphocytes in tissue regions using a binary TIL classifier. For this purpose, three pathologists labeled larg…
Original abstract

Deep learning classifiers for characterization of whole slide tissue morphology require large volumes of annotated data to learn variations across different tissue and cancer types. As is well known, manual generation of digital pathology training data is time consuming and expensive. In this paper, we propose a semi-automated method for annotating a group of similar instances at once, instead of collecting only per-instance manual annotations. This allows for a much larger training set, that reflects visual variability across multiple cancer types and thus training of a single network which can be automatically applied to each cancer type without human adjustment. We apply our method to the important task of classifying Tumor Infiltrating Lymphocytes (TILs) in H&E images. Prior approaches were trained for individual cancer types, with smaller training sets and human-in-the-loop threshold adjustment. We utilize these thresholded results as large scale "semi-automatic" annotations. Combined with existing manual annotations, our trained deep networks are able to automatically produce better TIL prediction results in 12 cancer types, compared to the human-in-the-loop approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a semi-automated annotation approach for Tumor Infiltrating Lymphocytes (TILs) in H&E whole-slide images. It re-uses thresholded outputs from prior per-cancer human-in-the-loop classifiers as large-scale training labels, augments them with existing manual annotations, and trains a single deep network claimed to generalize across 12 cancer types and outperform the original per-cancer human-in-the-loop pipelines.

Significance. If the central claim holds after rigorous validation of label quality, the work would demonstrate a practical route to scaling TIL classifiers with reduced manual effort while improving cross-cancer generalization. The approach directly targets the annotation bottleneck in computational pathology and could be extended to other morphology tasks.

major comments (2)
  1. [Abstract] The central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.
  2. [Abstract, method description] The training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.
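
The over- or under-calling concern in the second comment can be made concrete with a small check. The sketch below assumes a set of patches that carry both a thresholded label and a manual label, and reports a signed rate per cancer type: positive when the thresholded labels over-call TILs, negative when they under-call. The function name, the input layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def calling_bias(pairs: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Signed disagreement rate per cancer type.

    pairs: (cancer_type, thresholded_label, manual_label) triples.
    Returns (false positives - false negatives) / patch count, treating
    the manual label as the reference.
    """
    fp: Dict[str, int] = defaultdict(int)
    fn: Dict[str, int] = defaultdict(int)
    n: Dict[str, int] = defaultdict(int)
    for cancer, auto, manual in pairs:
        n[cancer] += 1
        fp[cancer] += int(auto == 1 and manual == 0)
        fn[cancer] += int(auto == 0 and manual == 1)
    return {c: (fp[c] - fn[c]) / n[c] for c in n}


# Toy overlap set, invented for illustration.
pairs = [
    ("LUAD", 1, 0), ("LUAD", 1, 1), ("LUAD", 1, 0), ("LUAD", 0, 0),
    ("BRCA", 0, 1), ("BRCA", 0, 1), ("BRCA", 1, 1), ("BRCA", 0, 0),
]
bias = calling_bias(pairs)
print(bias)
```

A value near zero for every cancer type would suggest no consistent direction of error; values of the same sign across types would point to the systematic bias the referee describes.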

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our semi-automated TIL annotation approach. The comments highlight the need for clearer quantitative support in the abstract and explicit validation of the generated labels. We address each point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.

    Authors: The full manuscript reports quantitative comparisons in the Results section, including AUC and F1 scores on held-out test sets across the 12 cancer types, with the single multi-cancer model outperforming the original per-cancer human-in-the-loop pipelines. The evaluation uses the same test data as the prior works for direct baseline comparison. We agree the abstract is too concise and will revise it to include key metrics, a brief description of the cross-validation protocol, and reference to the error analysis already present in the body of the paper.

    Revision: yes

  2. Referee: [Abstract, method description] The training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.

    Authors: The threshold values and per-cancer models originate from previously published and independently validated TIL classifiers. We combined these with existing manual annotations to form the training set. We acknowledge that the current manuscript does not include a dedicated quantitative check for systematic bias in the thresholded labels across all cancer types. We will add a new analysis subsection that measures agreement (e.g., Dice or pixel-level concordance) between the thresholded outputs and available manual annotations on overlapping slides to address this concern directly.

    Revision: yes
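
The agreement measures named in the second response can be written down in a few lines. This is a sketch under simple assumptions: both label sets are flattened to equal-length sequences of 0/1 values over the same patches or pixels, and an all-negative pair is treated as perfect Dice agreement. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary label sequences."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Assumption: two all-negative sequences count as perfect agreement.
    return 1.0 if total == 0 else 2 * intersection / total


def concordance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions where the two label sequences agree."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Toy labels on an overlapping slide, invented for illustration.
thresholded = [1, 1, 0, 0, 1, 0]
manual = [1, 0, 0, 0, 1, 1]
print(dice(thresholded, manual), concordance(thresholded, manual))
```

Dice ignores agreement on negatives, so it is the stricter of the two when TIL-positive regions are rare; reporting both per cancer type would show whether label quality varies across the 12 types.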

Circularity Check

0 steps flagged

No circularity: training on external thresholded labels does not reduce the multi-cancer model output to its inputs by construction

full rationale

The paper trains a deep network on a combination of manual annotations and thresholded outputs from prior per-cancer human-in-the-loop methods, then claims the resulting model yields better TIL predictions across 12 cancer types than the original per-cancer approaches. This is a standard semi-supervised training setup whose output (the trained network) is not definitionally identical to the input labels, nor is any performance metric shown to be a direct algebraic rearrangement of the training labels. No self-definitional equations, fitted-input-renamed-as-prediction, load-bearing self-citations, or ansatz smuggling appear in the abstract or described method. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described in the provided text.

