Learning from Thresholds: Fully Automated Classification of Tumor Infiltrating Lymphocytes for Multiple Cancer Types
Pith reviewed 2026-05-25 00:18 UTC · model grok-4.3
The pith
Semi-automated use of prior thresholds trains one network that beats human-adjusted TIL predictions across 12 cancer types
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By treating thresholded results from prior per-cancer TIL classifiers as semi-automatic annotations, the authors train deep networks that, combined with manual annotations, automatically produce better TIL prediction results in 12 cancer types than the original human-in-the-loop threshold adjustment approach.
What carries the argument
The semi-automated annotation method that converts thresholded prior outputs into training labels for a unified multi-cancer deep network.
Load-bearing premise
The thresholded results from prior per-cancer approaches supply sufficiently accurate and representative annotations that capture visual variability across cancer types without introducing systematic bias into the new multi-cancer model.
What would settle it
A side-by-side test on held-out data from the 12 cancer types in which the new multi-cancer network fails to exceed the accuracy of the original human-in-the-loop methods would falsify the claim.
Figures
read the original abstract
Deep learning classifiers for characterization of whole slide tissue morphology require large volumes of annotated data to learn variations across different tissue and cancer types. As is well known, manual generation of digital pathology training data is time consuming and expensive. In this paper, we propose a semi-automated method for annotating a group of similar instances at once, instead of collecting only per-instance manual annotations. This allows for a much larger training set, that reflects visual variability across multiple cancer types and thus training of a single network which can be automatically applied to each cancer type without human adjustment. We apply our method to the important task of classifying Tumor Infiltrating Lymphocytes (TILs) in H&E images. Prior approaches were trained for individual cancer types, with smaller training sets and human-in-the-loop threshold adjustment. We utilize these thresholded results as large scale "semi-automatic" annotations. Combined with existing manual annotations, our trained deep networks are able to automatically produce better TIL prediction results in 12 cancer types, compared to the human-in-the-loop approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a semi-automated annotation approach for Tumor Infiltrating Lymphocytes (TILs) in H&E whole-slide images. It re-uses thresholded outputs from prior per-cancer human-in-the-loop classifiers as large-scale training labels, augments them with existing manual annotations, and trains a single deep network claimed to generalize across 12 cancer types and outperform the original per-cancer human-in-the-loop pipelines.
Significance. If the central claim holds after rigorous validation of label quality, the work would demonstrate a practical route to scaling TIL classifiers with reduced manual effort while improving cross-cancer generalization. The approach directly targets the annotation bottleneck in computational pathology and could be extended to other morphology tasks.
major comments (2)
- [Abstract] Abstract: the central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.
- [Abstract] Abstract (method description): the training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our semi-automated TIL annotation approach. The comments highlight the need for clearer quantitative support in the abstract and explicit validation of the generated labels. We address each point below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the multi-cancer network produces 'better TIL prediction results' is unsupported by any reported quantitative metrics, baseline comparisons, cross-validation protocol, or error analysis. Without these, it is impossible to determine whether reported gains reflect genuine improvement or artifacts of the label-generation process.
Authors: The full manuscript reports quantitative comparisons in the Results section, including AUC and F1 scores on held-out test sets across the 12 cancer types, with the single multi-cancer model outperforming the original per-cancer human-in-the-loop pipelines. The evaluation uses the same test data as the prior works for direct baseline comparison. We agree the abstract is too concise and will revise it to include key metrics, a brief description of the cross-validation protocol, and reference to the error analysis already present in the body of the paper. revision: yes
-
Referee: [Abstract] Abstract (method description): the training labels are generated by thresholding outputs of prior per-cancer models; no verification is supplied that these labels are accurate or free of systematic bias (e.g., consistent over- or under-calling due to staining, morphology, or threshold choice). If such bias exists, the new network will internalize it, undermining the claim of improved generalization across 12 cancer types.
Authors: The threshold values and per-cancer models originate from previously published and independently validated TIL classifiers. We combined these with existing manual annotations to form the training set. We acknowledge that the current manuscript does not include a dedicated quantitative check for systematic bias in the thresholded labels across all cancer types. We will add a new analysis subsection that measures agreement (e.g., Dice or pixel-level concordance) between the thresholded outputs and available manual annotations on overlapping slides to address this concern directly. revision: yes
Circularity Check
No circularity: training on external thresholded labels does not reduce the multi-cancer model output to its inputs by construction
full rationale
The paper trains a deep network on a combination of manual annotations and thresholded outputs from prior per-cancer human-in-the-loop methods, then claims the resulting model yields better TIL predictions across 12 cancer types than the original per-cancer approaches. This is a standard semi-supervised training setup whose output (the trained network) is not definitionally identical to the input labels, nor is any performance metric shown to be a direct algebraic rearrangement of the training labels. No self-definitional equations, fitted-input-renamed-as-prediction, load-bearing self-citations, or ansatz smuggling appear in the abstract or described method. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Current opinion in im- munology 25(2), 261–267 (2013)
Angell, H., Galon, J.: From the immune contexture to the immunoscore: the role of prognostic and predictive immune markers in cancer. Current opinion in im- munology 25(2), 261–267 (2013)
work page 2013
-
[2]
Barnes, M., Sarkar, A., Redman, R., et al.: Development of a histology-based digital pathology image analysis algorithm for assessment of tumor infiltrating lympho- cytes in her2+ breast cancer. In: Cancer Research. vol. 78 (2018)
work page 2018
-
[3]
In: Medical Imaging 2018: Digital Pathology
Corredor, G., Wang, X., Lu, C., et al.: A watershed and feature-based approach for automated detection of lymphocytes on lung cancer images. In: Medical Imaging 2018: Digital Pathology. vol. 10581, p. 105810R. International Society for Optics and Photonics (2018)
work page 2018
- [4]
-
[5]
Advances in anatomic pathology 24(6), 311–335 (2017)
Hendry, S., Salgado, R., Gevaert, T., et al.: Assessing tumor-infiltrating lym- phocytes in solid tumors: A practical review for pathologists and proposal for a standardized method from the international immuno-oncology biomarkers working group: Part 2. Advances in anatomic pathology 24(6), 311–335 (2017)
work page 2017
-
[6]
Pattern recognition 86, 188–200 (2019)
Hou, L., Nguyen, V., Kanevsky, A.B., et al.: Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern recognition 86, 188–200 (2019)
work page 2019
-
[7]
Advances in Anatomic Pathology 24(6), 311–335 (2016)
John, M., Salgado, R., Gevaert, T., et al.: Assessing tumor-infiltrating lymphocytes in solid tumors. Advances in Anatomic Pathology 24(6), 311–335 (2016)
work page 2016
-
[8]
Journal of Clinical Pathol- ogy 72(2), 157–164 (2019)
Linder, N., Taylor, J.C., Colling, R., et al.: Deep learning for detecting tumour- infiltrating lymphocytes in testicular germ cell tumours. Journal of Clinical Pathol- ogy 72(2), 157–164 (2019)
work page 2019
-
[9]
Journal of clinical oncology 29(15), 1949–1955 (2011)
Mahmoud, S.M., Paish, E.C., Powe, D.G., et al.: Tumor-infiltrating cd8+ lympho- cytes predict clinical outcome in breast cancer. Journal of clinical oncology 29(15), 1949–1955 (2011)
work page 1949
-
[10]
Cancer and Metastasis Reviews 30(1), 5–12 (2011)
Mlecnik, B., Bindea, G., Pag` es, F., Galon, J.: Tumor immunosurveillance in human cancers. Cancer and Metastasis Reviews 30(1), 5–12 (2011)
work page 2011
-
[11]
Journal of clinical oncology 29(6), 610–618 (2011)
Mlecnik, B., Tosolini, M., Kirilovsky, A., et al.: Histopathologic-based prognostic factors of colorectal cancers are associated with the state of the local immune reaction. Journal of clinical oncology 29(6), 610–618 (2011)
work page 2011
-
[12]
Annals of oncology 26(2), 259–271 (2014)
Salgado, R., Denkert, C., Demaria, S., et al.: The evaluation of tumor-infiltrating lymphocytes (tils) in breast cancer: recommendations by an international tils work- ing group 2014. Annals of oncology 26(2), 259–271 (2014)
work page 2014
-
[13]
Cell Reports 23(1), 181 – 193.e7 (2018)
Saltz, J., Gupta, R., Hou, L., et al.: Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports 23(1), 181 – 193.e7 (2018)
work page 2018
-
[14]
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
work page 2015
-
[15]
Journal for immunotherapy of cancer 6(1), 20 (2018)
Steele, K.E., Tan, T.H., Korn, R., et al.: Measuring multiple parameters of cd8+ tumor-infiltrating lymphocytes in human cancers by image analysis. Journal for immunotherapy of cancer 6(1), 20 (2018)
work page 2018
-
[16]
In: Thirty-First AAAI Confer- ence on Artificial Intelligence (2017) 9
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-First AAAI Confer- ence on Artificial Intelligence (2017) 9
work page 2017
-
[17]
TCGA development team: The Cancer Genome Atlas. https://tcga-data.nci. nih.gov/docs/publications/tcga/
-
[18]
Immunity 48(4), 812–830 (2018)
Thorsson, V., Gibbs, D.L., Brown, S.D., et al.: The immune landscape of cancer. Immunity 48(4), 812–830 (2018)
work page 2018
-
[19]
Youden, W.J.: Index for rating diagnostic tests. Cancer 3(1), 32–35 (1950)
work page 1950
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.