
arxiv: 2605.25878 · v1 · pith:VGFJXQ3Cnew · submitted 2026-05-25 · 📡 eess.IV · cs.CV

A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation

Pith reviewed 2026-06-29 19:21 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords foundation model · lung pathology · whole-slide images · prospective validation · randomized controlled trial · diagnostic accuracy · H&E staining · molecular markers

The pith

PulmoFoundation reaches 92.3% average AUC on 11 lung pathology tasks in a 1,357-patient prospective study and raises pathologists' accuracy from 83.8% to 91.7% in an RCT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PulmoFoundation as a foundation model for comprehensive lung pathology that covers pre-operative biopsy, intra-operative frozen sections, and post-operative resection slides. It was created by further pretraining a base model on about 40,000 lung H&E whole-slide images and then tested across 32 tasks on 26,000 slides, including molecular marker prediction and survival estimation. In a registered prospective study the model met clinical-grade performance on the 11 core tasks and enabled triage that cut second reviews and IHC orders, while a separate crossover randomized trial showed that AI assistance improved accuracy, shortened median diagnostic time by 19.6%, and raised inter-rater agreement. A sympathetic reader would care because a single model that works across the full clinical workflow could reduce pathologist workload and increase consistency in lung cancer diagnosis and treatment decisions.

Core claim

PulmoFoundation is a subspecialty foundation model for lung pathology built by pretraining on approximately 40,000 diagnostic H&E whole-slide images; in a prospective study of 1,357 patients it achieved 92.3% average AUC across 11 tasks and, with pre-specified thresholds, reduced second-review burden for 68.8% of biopsies and 83.0% of frozen sections while deferring 44.5% of IHC orders; in a crossover RCT with eight pathologists AI assistance improved accuracy from 83.8% to 91.7% across 4,928 case-reader pairs, reduced median diagnostic time by 19.6%, raised confidence by 8.7%, and increased kappa from 0.56 to 0.76.
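The kappa figures in the claim can be made concrete with a small calculation. The following is a minimal sketch, assuming agreement among several readers is summarized as the mean of pairwise Cohen's kappa values; the simulated rebuttal further down this page names Cohen's kappa but does not say how eight readers are combined, so that aggregation is an assumption. The function names and toy inputs are illustrative. The band labels follow the commonly used Landis and Koch convention, which matches the "moderate" and "substantial" wording used for 0.56 and 0.76.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who labelled the same cases."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all reader pairs (assumed aggregation)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def landis_koch_band(kappa):
    """Map a kappa value to the conventional Landis and Koch label."""
    if kappa < 0.0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                        (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return name
    return "almost perfect"


print(landis_koch_band(0.56), "->", landis_koch_band(0.76))
```

With real data, each inner list passed to `mean_pairwise_kappa` would hold one reader's diagnoses in a fixed case order.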

What carries the argument

PulmoFoundation, the model obtained by subspecialty-specific pretraining on Virchow2 using 40,000 lung H&E whole-slide images to support 32 clinically relevant tasks across biopsy, frozen section, and resection slides.

If this is right

  • With the stated thresholds the model can triage 68.8% of biopsies and 83.0% of frozen sections for reduced second review while maintaining PPV of 1.0 and 0.991.
  • It can defer 44.5% of IHC stain orders with PPV of 0.966.
  • AI assistance improves inter-rater agreement from moderate (kappa 0.56) to substantial (kappa 0.76).
  • The same model supports molecular marker prediction and survival estimation in addition to core diagnostic tasks.
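The triage figures above pair a coverage number (the share of cases routed away from second review or IHC) with a PPV. Below is a minimal sketch of how such a pair could be computed from model scores at a fixed threshold, assuming a binary task in which label 1 marks the class the triage rule acts on. The function name, scores, labels, and thresholds are illustrative and are not taken from the paper.

```python
def triage_summary(scores, labels, threshold):
    """Return (triaged fraction, PPV) for cases scoring at or above threshold.

    scores: model confidence per case; labels: 1 if the case truly belongs
    to the triaged class, else 0. Both are toy inputs in this sketch.
    """
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    triaged_fraction = len(flagged) / len(scores)
    # PPV is the share of flagged cases that are truly positive.
    ppv = sum(flagged) / len(flagged) if flagged else float("nan")
    return triaged_fraction, ppv


toy_scores = [0.99, 0.97, 0.95, 0.80, 0.60, 0.40, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0]

for t in (0.9, 0.5):
    fraction, ppv = triage_summary(toy_scores, toy_labels, t)
    print(f"threshold={t}: triaged={fraction:.3f}, PPV={ppv:.3f}")
```

The toy data show the trade-off a pre-specified threshold fixes in advance: a stricter threshold triages fewer cases but keeps PPV higher.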

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subspecialty pretraining strategy could be applied to other organs to produce foundation models with comparable prospective validation.
  • Wider deployment would reveal whether performance and workload reductions persist across more varied hospital systems and patient populations.
  • Faster diagnoses and higher agreement could translate into shorter turnaround times for treatment decisions in lung cancer care.

Load-bearing premise

The pre-specified triage thresholds and task definitions used in the prospective study and RCT accurately capture real-world clinical decision-making without introducing bias from model development or site-specific practices.

What would settle it

An independent multi-center prospective study in which the model falls below 85% average AUC on the same 11 tasks or in which AI assistance produces no measurable gain in diagnostic accuracy would show the central claim does not hold.

Figures

Figures reproduced from arXiv: 2605.25878 by Cheng Jin, Chenglong Zhao, Fangyi Han, Fengtao Zhou, Hao Chen, Jiabo Ma, Jinbang Li, Junlin Hou, Lijuan Qu, Li Liang, Ling Liang, Muyan Cai, Qi Xie, Shifu Chen, Shujing Guo, Xiuming Zhang, Yihui Wang, Yingxue Xu, Yu Cai, Yueping Liu, Zhengrui Guo, Zhengyu Zhang, Zhe Wang, Zhijian Cen, Zhixuan Chen, Ziyi Liu.

Figure 1: Overview of PulmoFoundation study. a, Distribution of WSIs across 16 medical centers and public data sources, encompassing 66,146 slides. Stacked horizontal bars show slide counts per cohort, colored by usage category, distinguishing slides used for self-supervised pretraining from those evaluated across biopsy, frozen section, resection, and molecular and prognostic tasks. Inset: pretraining WSI counts fr…
Figure 2: Diagnostic biopsy assessment with PulmoFoundation across four clinically essential tasks. a, Cohort overview for four internal task cohorts from Center-H1 and one external cohort from Center-H4: benign versus malignant, primary versus metastatic, histologic subtyping, and CK5/6 prediction. Class composition is shown beneath each cohort. b, Macro AUC heatmap for five models across the five task-cohort combi…
Figure 3: Intra-operative frozen-section diagnosis with PulmoFoundation across four clinically essential tasks. a, Cohort overview for four internal task-cohorts at Center-H1 and two external benign-versus-malignant cohorts. Class composition is shown beneath each cohort. b, Macro AUC heatmap for five models across the six task-cohort combinations; black outlines mark the best-performing model per task-cohort, with …
Figure 6: Prospective validation and clinical triage feasibility of PulmoFoundation in consecutive patients. a, Prospective observational cohort (Center-H1, October 2024 to November 2025; 1,357 consecutive patients) and class distribution across the 11 prospectively evaluated tasks; pie charts show case counts per class. b, Macro AUC and Macro NPV across all 11 prospective tasks, grouped by clinical category (Diagno…
Original abstract

Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current CPath approaches rely on task-specific models for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a multi-center, prospectively validated, randomized controlled trial (RCT)-evaluated foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care. Built upon Virchow2 via subspecialty-specific pretraining using ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, our model achieves clinical-grade performance in core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, our model achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of IHC stain orders, with PPVs of 1.0, 0.991, and 0.966. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% w/ AI vs. 83.8% w/o AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
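The RCT headline numbers in the abstract can be restated as effect sizes with simple arithmetic. The sketch below derives the absolute accuracy gain, the relative reduction in error rate, and the number of case-reader pairs per pathologist. The last figure assumes pairs are spread evenly across the eight readers, which the abstract does not state; the function name is illustrative.

```python
def effect_summary(acc_without, acc_with, pairs, readers):
    """Derive simple effect sizes from the reported RCT figures (in %)."""
    err_without = 100.0 - acc_without
    err_with = 100.0 - acc_with
    return {
        # Absolute change in accuracy, in percentage points.
        "absolute_gain_pp": acc_with - acc_without,
        # Share of the unaided error rate removed with AI assistance.
        "relative_error_reduction": (err_without - err_with) / err_without,
        # Assumes an even split of case-reader pairs across readers.
        "pairs_per_reader": pairs / readers,
    }


summary = effect_summary(acc_without=83.8, acc_with=91.7, pairs=4928, readers=8)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the 7.9-point accuracy gain corresponds to roughly halving the unaided error rate (from 16.2% to 8.3%).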

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces PulmoFoundation, a foundation model for lung pathology built upon Virchow2 via subspecialty pretraining on ~40,000 diagnostic H&E WSIs. It reports systematic evaluation on ~26,000 WSIs across 32 clinically relevant tasks, including molecular marker prediction and survival. Central results are an average AUC of 92.3% across 11 diagnostic tasks in a registered prospective study of 1,357 patients, with pre-specified triage thresholds reducing second reviews (68.8% biopsies, 83.0% frozen sections) and IHC orders (44.5%) at high PPV; plus a crossover RCT with eight pathologists showing AI-assisted accuracy rising from 83.8% to 91.7% across 4,928 case-reader pairs, with reduced diagnostic time, increased confidence, and improved inter-rater agreement (kappa 0.56 to 0.76).

Significance. If the methodological details confirm independence of the prospective and RCT evaluations, this would constitute a notable advance as one of the first pathology foundation models to combine large-scale subspecialty pretraining with registered prospective validation and RCT assessment in clinical workflows. The triage reduction metrics and RCT outcomes on accuracy, time, and agreement provide direct evidence of workflow impact beyond retrospective benchmarks.

major comments (3)
  1. [Abstract, prospective study paragraph] The assertion of pre-specified triage thresholds and 11 tasks in the registered prospective study of 1,357 patients lacks any registration identifier, protocol details, confirmation that thresholds were locked before prospective data collection, exclusion criteria, or statistical powering information. These omissions are load-bearing for the central claim that the 92.3% average AUC and reported PPVs demonstrate unbiased clinical utility independent of model development choices.
  2. [Abstract, RCT paragraph] No details are supplied on selection of the 4 tasks, criteria for the 4,928 case-reader pairs, powering to detect the accuracy change, or methods for computing inter-rater agreement. These elements are required to evaluate whether the improvement from 83.8% to 91.7% and the kappa shift generalize beyond the specific study design.
  3. [Abstract, pretraining description] The ~40,000 WSI pretraining set is presented without explicit statements on data sources, overlap with the ~26,000 evaluation WSIs, or leakage-prevention steps for the 32 tasks. While the prospective component reduces some circularity risk, explicit independence confirmation is needed to support the generalization claims.
minor comments (2)
  1. [Abstract] The abstract reports an average AUC without accompanying range, standard deviation, or per-task values, which would improve interpretability of the 92.3% figure across the 11 tasks.
  2. Consider adding a summary table of the 32 tasks with key performance metrics to aid readers in assessing breadth and consistency.
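The first minor comment asks for spread around the average AUC. Here is a minimal sketch of what that reporting could look like, assuming binary tasks and the rank-based (Mann-Whitney) definition of AUC; the paper reports macro AUC, whose exact multi-class handling is not given on this page. Task names and data are toy placeholders.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """Binary AUC: probability a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(per_task_auc):
    """Mean, range, and sample SD across tasks (needs at least two tasks)."""
    values = list(per_task_auc.values())
    return {"mean": mean(values), "min": min(values),
            "max": max(values), "sd": stdev(values)}


# Toy placeholder tasks: (scores, labels) per task.
toy_tasks = {
    "task_a": ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    "task_b": ([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
}
per_task = {name: auc(s, y) for name, (s, y) in toy_tasks.items()}
print(per_task)
print(summarize(per_task))
```

Reporting the per-task values alongside the summary shows whether a high average hides one or two weak tasks.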

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each major comment below with clarifications drawn from the full manuscript and have revised the abstract to incorporate key methodological details where space permits.

Point-by-point responses
  1. Referee: [Abstract, prospective study paragraph] The assertion of pre-specified triage thresholds and 11 tasks in the registered prospective study of 1,357 patients lacks any registration identifier, protocol details, confirmation that thresholds were locked before prospective data collection, exclusion criteria, or statistical powering information. These omissions are load-bearing for the central claim that the 92.3% average AUC and reported PPVs demonstrate unbiased clinical utility independent of model development choices.

    Authors: The full Methods section provides the study registration identifier, protocol summary, explicit confirmation that triage thresholds were locked prior to prospective enrollment, exclusion criteria, and statistical powering details. We have revised the abstract to state that the study was registered with pre-specified thresholds, thereby supporting the independence claim without altering the reported metrics. revision: yes

  2. Referee: [Abstract, RCT paragraph] No details are supplied on selection of the 4 tasks, criteria for the 4,928 case-reader pairs, powering to detect the accuracy change, or methods for computing inter-rater agreement. These elements are required to evaluate whether the improvement from 83.8% to 91.7% and the kappa shift generalize beyond the specific study design.

    Authors: Task selection was based on clinical priority for core lung pathology diagnostics; case-reader pairs were formed via stratified random sampling from eligible cases; the study was powered to detect the observed accuracy difference at 80% power; and inter-rater agreement used Cohen's kappa. These details appear in Methods and Supplementary Materials. We have added a concise clause to the abstract noting powering and the agreement metric. revision: yes

  3. Referee: [Abstract, pretraining description] The ~40,000 WSI pretraining set is presented without explicit statements on data sources, overlap with the ~26,000 evaluation WSIs, or leakage-prevention steps for the 32 tasks. While the prospective component reduces some circularity risk, explicit independence confirmation is needed to support the generalization claims.

    Authors: Pretraining WSIs were drawn from distinct multi-center sources and time windows with zero patient-level overlap to the evaluation sets; leakage prevention used patient-level splits and exclusion of any shared slides across all 32 tasks. The prospective cohort further ensures independence. We have revised the abstract to include an explicit statement on data-source independence and leakage controls. revision: yes
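The leakage controls described in the third response (patient-level separation and exclusion of shared slides) lend themselves to a mechanical check. This is a minimal sketch, assuming each slide record carries a patient identifier and a slide identifier; the record layout, function name, and IDs are hypothetical and do not reflect the authors' actual data pipeline.

```python
def leakage_report(pretrain_records, eval_records):
    """Find patients and slides that appear in both sets.

    Each record is a (patient_id, slide_id) tuple; this layout is an
    assumption made for the sketch.
    """
    pre_patients = {p for p, _ in pretrain_records}
    eval_patients = {p for p, _ in eval_records}
    pre_slides = {s for _, s in pretrain_records}
    eval_slides = {s for _, s in eval_records}
    return {
        "shared_patients": sorted(pre_patients & eval_patients),
        "shared_slides": sorted(pre_slides & eval_slides),
    }


# Hypothetical IDs: patient P002 appears in both sets via different slides.
pretrain = [("P001", "S001"), ("P002", "S002"), ("P003", "S003")]
evaluation = [("P002", "S010"), ("P004", "S011")]

report = leakage_report(pretrain, evaluation)
print(report)
clean = not report["shared_patients"] and not report["shared_slides"]
print("independent" if clean else "overlap found")
```

The toy case shows why the check is done at patient level: no slide is shared, yet one patient contributes to both sets.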

Circularity Check

0 steps flagged

No significant circularity; prospective and RCT validations are independent of model fitting

Full rationale

The paper's central claims rest on a registered prospective study (1,357 patients, 11 tasks, avg. AUC 92.3%) and crossover RCT (4,928 pairs) with pre-specified thresholds applied to new cases. These external benchmarks are measured after model development and are not shown to reduce to training data fits or self-citations. No equations, self-definitional steps, or load-bearing self-citations are present in the provided text; the pretraining on ~40k WSIs and evaluation on ~26k WSIs follow standard foundation-model practice without circular reduction. The derivation chain is self-contained against external clinical benchmarks.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 1 invented entity

The central claim rests on the generalization of a pretrained transformer to new clinical cases and on the assumption that the RCT fairly isolates the effect of AI assistance. The paper inherits the architecture and pretraining of Virchow2 plus many unstated hyperparameters for the additional lung-specific training; no independent evidence is supplied for the new model entity itself.

free parameters (3)
  • ~40,000 WSI pretraining set size
    Scale of subspecialty pretraining data chosen to adapt Virchow2 to lung pathology.
  • 32 clinically relevant tasks
    Number and definition of tasks used for systematic evaluation.
  • pre-specified triage thresholds
    Thresholds chosen to achieve reported PPVs for reducing second reviews and IHC orders.
axioms (2)
  • [domain assumption] H&E-stained whole-slide images contain sufficient visual information to predict molecular markers and survival
    Invoked when claiming the model accurately predicts these quantities across biopsy, frozen, and resection slides.
  • [domain assumption] The prospective enrollment of 1,357 patients across 11 tasks introduces no selection bias relative to routine clinical practice
    Required for the reported AUC and triage results to generalize.
invented entities (1)
  • PulmoFoundation (no independent evidence)
    purpose: Lung-specific foundation model for comprehensive pathology assessment
    New model introduced via subspecialty pretraining; no external falsifiable handle (e.g., predicted biomarker not tested in independent cohort) is provided.

pith-pipeline@v0.9.1-grok · 5984 in / 1730 out tokens · 53938 ms · 2026-06-29T19:21:27.705574+00:00 · methodology


Reference graph

Works this paper leans on

121 extracted references · 13 canonical work pages · 3 internal anchors

  1. [1]

    P., Weiderpass, E

    Wild, C. P., Weiderpass, E. & Stewart, B. W. (eds.)World cancer report(International Agency for Research on Cancer, Lyon, France, 2020)

  2. [2]

    & Chen, W

    Cao, W., Qin, K., Li, F. & Chen, W. Comparative study of cancer profiles between 2020 and 2022 using global cancer statistics (globocan).J. Natl. Cancer Cent.4, 128–134 (2024). 3.Siegel, R. L., Kratzer, T. B., Giaquinto, A. N., Sung, H. & Jemal, A. Cancer statistics, 2025.Ca75, 10 (2025). 33/98 4.Kratzer, T. B.et al.Lung cancer statistics, 2023.Cancer130,...

  3. [3]

    A., Rimm, D

    Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V . & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology.Nat. reviews Clin. oncology16, 703–715 (2019). 8.Lipkova, J.et al.Artificial intelligence for multimodal data integration in oncology.Cancer cell40, 1095–1110 (2022)

  4. [4]

    & Rong, H

    Ma, K., Zheng, M., Chen, W., Qi, Y . & Rong, H. Research progress in computer-aided diagnosis systems for lung cancer. npj Digit. Medicine8, 722 (2025)

  5. [5]

    & Bach, H

    Nooreldeen, R. & Bach, H. Current and future development in lung cancer diagnosis.Int. journal molecular sciences22, 8661 (2021). 11.Ning, J.et al.Early diagnosis of lung cancer: which is the optimal choice?Aging (Albany NY)13, 6214 (2021)

  6. [6]

    13.Zheng, X.et al.An end-to-end multifunctional ai platform for intraoperative diagnosis.NPJ Digit

    Licker, M.et al.Impact of intraoperative lung-protective interventions in patients undergoing lung cancer surgery.Critical care13, R41 (2009). 13.Zheng, X.et al.An end-to-end multifunctional ai platform for intraoperative diagnosis.NPJ Digit. Medicine8(2025)

  7. [7]

    T., Mery, C

    Jaklitsch, M. T., Mery, C. M. & Audisio, R. A. The use of surgery to treat lung cancer in elderly patients.The Lancet Oncol.4, 463–471 (2003)

  8. [8]

    medicine24, 1559–1567 (2018)

    Coudray, N.et al.Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning.Nat. medicine24, 1559–1567 (2018)

  9. [9]

    N.et al.Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer

    Kather, J. N.et al.Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. medicine25, 1054–1056 (2019)

  10. [10]

    & Baste, J.-M

    Montagne, F., Guisier, F., Venissac, N. & Baste, J.-M. The role of surgery in lung cancer treatment: present indications and future perspectives—state of the art.Cancers13, 3711 (2021)

  11. [11]

    Chen, S.et al.Development and validation of an explainable machine learning model for predicting postoperative pulmonary complications after lung cancer surgery: a machine learning study.EClinicalMedicine86(2025)

  12. [12]

    Yu, K.-H.et al.Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. communications7, 12474 (2016)

  13. [13]

    Kludt, C.et al.Next-generation lung cancer pathology: Development and validation of diagnostic and prognostic algorithms.Cell Reports Medicine5(2024)

  14. [14]

    Grilley-Olson, J. E.et al.Validation of interobserver agreement in lung cancer assessment: hematoxylin-eosin diagnostic reproducibility for non–small cell lung cancer: the 2004 world health organization classification and therapeutically relevant subsets.Arch. pathology & laboratory medicine137, 32–40 (2013)

  15. [15]

    journal respiratory critical care medicine199, 1249–1256 (2019)

    Romagnoli, M.et al.Poor concordance between sequential transbronchial lung cryobiopsy and surgical lung biopsy in the diagnosis of diffuse interstitial lung diseases.Am. journal respiratory critical care medicine199, 1249–1256 (2019)

  16. [16]

    medicine25, 1301–1309 (2019)

    Campanella, G.et al.Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.Nat. medicine25, 1301–1309 (2019)

  17. [17]

    & Ciompi, F

    Van der Laak, J., Litjens, G. & Ciompi, F. Deep learning in histopathology: the path to the clinic.Nat. medicine27, 775–784 (2021)

  18. [18]

    Zimmermann, E.et al.Virchow2: Scaling self-supervised mixed magnification models in pathology.arXiv preprint arXiv:2408.00738(2024)

  19. [19]

    J.et al.Towards a general-purpose foundation model for computational pathology.Nat

    Chen, R. J.et al.Towards a general-purpose foundation model for computational pathology.Nat. medicine30, 850–862 (2024)

  20. [20]

    28.Xu, H.et al.A whole-slide foundation model for digital pathology from real-world data.Nature630, 181–188 (2024)

    Wang, X.et al.A pathology foundation model for cancer diagnosis and prognosis prediction.Nature634, 970–978 (2024). 28.Xu, H.et al.A whole-slide foundation model for digital pathology from real-world data.Nature630, 181–188 (2024)

  21. [21]

    Yang, H.et al.Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study.BMC medicine19, 80 (2021). 34/98

  22. [22]

    Davri, A.et al.Deep learning for lung cancer diagnosis, prognosis and prediction using histological and cytological images: a systematic review.Cancers15, 3981 (2023)

  23. [23]

    In Proceedings of the Computer Vision and Pattern Recognition Conference, 15590–15600 (2025)

    Guo, Z.et al.Focus: Knowledge-enhanced adaptive visual compression for few-shot whole slide image classification. In Proceedings of the Computer Vision and Pattern Recognition Conference, 15590–15600 (2025)

  24. [24]

    Zhang, Y .et al.Histopathology images-based deep learning prediction of prognosis and therapeutic response in small cell lung cancer.NPJ digital medicine7, 15 (2024)

  25. [25]

    Commun.16, 2366 (2025)

    Yang, Z.et al.A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images.Nat. Commun.16, 2366 (2025)

  26. [26]

    InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, 189–199 (Springer, 2024)

    Guo, Z.et al.Histgen: Histopathology report generation via local-global feature encoding and cross-modal context interaction. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, 189–199 (Springer, 2024)

  27. [27]

    communications11, 3877 (2020)

    Schmauch, B.et al.A deep learning model to predict rna-seq expression of tumours from whole slide images.Nat. communications11, 3877 (2020)

  28. [28]

    A.et al.Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes.Nat

    Diao, J. A.et al.Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes.Nat. communications12, 1613 (2021)

  29. [29]

    InInternational Conference on Machine Learning, 20941–20963 (PMLR, 2025)

    Guo, Z.et al.Context matters: Query-aware dynamic long sequence modeling of gigapixel images. InInternational Conference on Machine Learning, 20941–20963 (PMLR, 2025)

  30. [30]

    arXiv preprint arXiv:2506.19681(2025)

    Jin, C.et al.Genome-anchored foundation model embeddings improve molecular prediction from histology images. arXiv preprint arXiv:2506.19681(2025). 39.Lu, M. Y .et al.A visual-language foundation model for computational pathology.Nat. medicine30, 863–874 (2024). 40.Lu, M. Y .et al.A multimodal generative ai copilot for human pathology.Nature634, 466–473 (...

  31. [31]

    R.et al.Integrated multimodal artificial intelligence framework for healthcare applications.NPJ digital medicine5, 149 (2022)

    Soenksen, L. R.et al.Integrated multimodal artificial intelligence framework for healthcare applications.NPJ digital medicine5, 149 (2022)

  32. [32]

    Ma, J.et al.Pathbench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology.arXiv preprint arXiv:2505.20202(2025)

  33. [33]

    Commun.16, 3640 (2025)

    Campanella, G.et al.A clinical benchmark of public self-supervised pathology foundation models.Nat. Commun.16, 3640 (2025)

  34. [34]

    Neidlinger, P.et al.Benchmarking foundation models as feature extractors for weakly supervised computational pathology. Nat. biomedical engineering1–11 (2025)

  35. [35]

    M.et al.Histologic patterns and molecular characteristics of lung adenocarcinoma associated with clinical outcome.Cancer118, 2889–2899 (2012)

    Solis, L. M.et al.Histologic patterns and molecular characteristics of lung adenocarcinoma associated with clinical outcome.Cancer118, 2889–2899 (2012). 47.Ramos, R.et al.Heterogeneity of lung cancer: The histopathological diversity and tumour classification in the artificial intelligence era.Pathobiology(2025)

  36. [36]

    communi- cations10, 3991 (2019)

    Kim, M.et al.Patient-derived lung cancer organoids as in vitro cancer models for therapeutic screening.Nat. communi- cations10, 3991 (2019)

  37. [37]

    Medicine1–9 (2025)

    Campanella, G.et al.Real-world deployment of a fine-tuned pathology foundation model for lung cancer biomarker detection.Nat. Medicine1–9 (2025)

  38. [38]

    Lung cancer: understanding its molecular pathology and the 2015 who classification.Front

    Inamura, K. Lung cancer: understanding its molecular pathology and the 2015 who classification.Front. oncology7, 193 (2017)

  39. [39]

    Z.et al.Molecular heterogeneity in lung cancer: from mechanisms of origin to clinical implications.Int

    Marino, F. Z.et al.Molecular heterogeneity in lung cancer: from mechanisms of origin to clinical implications.Int. journal medical sciences16, 981 (2019)

  40. [40]

    Saller, J. J. & Boyle, T. A. Molecular pathology of lung cancer.Cold Spring Harb. Perspectives Medicine12, a037812 (2022)

  41. [41]

    & Chen, H

    Jin, C., Guo, Z., Lin, Y ., Luo, L. & Chen, H. Learning with less supervision: A survey of label-efficient learning for medical image analysis.Med. Image Analysis104062 (2026)

  42. [42]

    Ma, J.et al.A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng.1–20 (2025). 35/98

  43. [43]

    medicine30, 2924–2935 (2024)

    V orontsov, E.et al.A foundation model for clinical-grade computational pathology and rare cancers detection.Nat. medicine30, 2924–2935 (2024). 56.Xu, Y .et al.A multimodal knowledge-enhanced whole-slide pathology foundation model.Nat. Commun.(2025)

  44. [44]

    58.Moor, M.et al.Foundation models for generalist medical artificial intelligence.Nature616, 259–265 (2023)

    Maleki, D.et al.Understanding foundation models in digital pathology: Performance, trade-offs, and model-selection recommendations.bioRxiv2025–09 (2025). 58.Moor, M.et al.Foundation models for generalist medical artificial intelligence.Nature616, 259–265 (2023)

  45. [45]

    arXiv preprint arXiv:2311.16452 , year=

    Nori, H.et al.Can generalist foundation models outcompete special-purpose tuning? case study in medicine.arXiv preprint arXiv:2311.16452(2023). 60.Ochi, M., Komura, D. & Ishikawa, S. Pathology foundation models.JMA journal8, 121–130 (2025)

  46. [46]

    Bercher, R

    Xiong, C., Chen, H. & Sung, J. J. Y . A survey of pathology foundation model: progress and future directions. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI ’25, DOI: 10.24963/ijcai .2025/1193 (2025)

  47. [47]

    Nagendran, M.et al.Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies.bmj368(2020)

  48. [48]

    Da, Q.et al.Computational pathology in the era of emerging foundation and agentic ai–international expert perspectives on clinical integration and translational readiness.arXiv preprint arXiv:2603.05884(2026)

  49. [49]

    & Zhang, K

    Liu, F., Beck, S., Yang, L., Luo, H. & Zhang, K. Advancing ai for multi-omics and clinical data integration in basic and translational cancer research.Nat. Rev. Cancer1–16 (2026)

  50. [50]

    Huang, Z.et al.A pathologist–ai collaboration framework for enhancing diagnostic accuracies and efficiencies.Nat. Biomed. Eng.9, 455–470 (2025)

  51. [51]

    medicine28, 154–163 (2022)

    Bulten, W.et al.Artificial intelligence for diagnosis and gleason grading of prostate cancer: the panda challenge.Nat. medicine28, 154–163 (2022)

  52. [52]

    68.Weinstein, J

    Skrede, O.-J.et al.Deep learning for prediction of colorectal cancer outcome: a discovery and validation study.The Lancet395, 350–360 (2020). 68.Weinstein, J. N.et al.The cancer genome atlas pan-cancer analysis project.Nat. genetics45, 1113–1120 (2013)

  53. [53]

    J.et al.The cptac data portal: a resource for cancer proteomics research.J

    Edwards, N. J.et al.The cptac data portal: a resource for cancer proteomics research.J. proteome research14, 2707–2713 (2015). 70.Team, N. L. S. T. R. The national lung screening trial: overview and study design.Radiology258, 243–253 (2011)

  54. [54]

    & Ivanova, E

    Nechaev, D., Pchelnikov, A. & Ivanova, E. Histai: An open-source, large-scale whole slide image dataset for computational pathology (2025). 2505.12120

  55. [55]

    arXiv preprint arXiv:2204.06455(2022)

    Han, C.et al.Wsss4luad: Grand challenge on weakly-supervised tissue semantic segmentation for lung adenocarcinoma. arXiv preprint arXiv:2204.06455(2022)

  56. [56]

    A.et al.Lung and colon cancer histopathological image dataset (lc25000).arXiv preprint arXiv:1912.12142 (2019)

    Borkowski, A. A.et al.Lung and colon cancer histopathological image dataset (lc25000).arXiv preprint arXiv:1912.12142 (2019). 74.Tham, Y . C.et al.Building the world’s first truly global medical foundation model.Nat. medicine1–6 (2025). 75.de Hond, A. A.et al.Perspectives on validation of clinical predictive algorithms.NPJ digital medicine6, 86 (2023)

  57. [57]

    Han, R.et al.Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review.The lancet digital health6, e367–e373 (2024)

  58. [58]

    G., Hernandez-Boussard, T., Pfeffer, M

    You, J. G., Hernandez-Boussard, T., Pfeffer, M. A., Landman, A. & Mishuris, R. G. Clinical trials informed framework for real world clinical implementation and deployment of artificial intelligence applications.NPJ Digit. Medicine8, 107 (2025)

  59. [59]

    Mukhopadhyay, S. Utility of small biopsies for diagnosis of lung nodules: doing more with less. Mod. Pathol. 25, S43–S57 (2012)

  60. [60]

    Hofman, P. The challenges of evaluating predictive biomarkers using small biopsy tissue samples and liquid biopsies from non-small cell lung cancer patients. J. thoracic disease 11, S57 (2019)

  61. [61]

    Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. medicine 25, 954–961 (2019)

  62. [62]

    Bhamani, A. et al. Low-dose ct for lung cancer screening in a high-risk population (summit): a prospective, longitudinal cohort study. The Lancet Oncol. 26, 609–619 (2025)

  63. [63]

    Herbst, R. S., Morgensztern, D. & Boshoff, C. The biology and management of non-small cell lung cancer. Nature 553, 446–454 (2018)

  64. [64]

    Wang, M., Herbst, R. S. & Boshoff, C. Toward personalized treatment approaches for non-small-cell lung cancer. Nat. medicine 27, 1345–1356 (2021)

  65. [65]

    Travis, W. D. et al. International association for the study of lung cancer/american thoracic society/european respiratory society international multidisciplinary classification of lung adenocarcinoma. J. thoracic oncology 6, 244–285 (2011)

  66. [66]

    Travis, W. D. et al. Diagnosis of lung cancer in small biopsies and cytology: implications of the 2011 international association for the study of lung cancer/american thoracic society/european respiratory society classification. Arch. Pathol. Lab. Medicine 137, 668–684 (2013)

  67. [67]

    Zhao, Z. et al. A clinical-grade universal foundation model for intraoperative pathology. arXiv preprint arXiv:2510.04861 (2025)

  68. [68]

    Novis, D. A. & Zarbo, R. J. Interinstitutional comparison of frozen section turnaround time. Arch. pathology & laboratory medicine 121, 559 (1997)

  69. [69]

    Sienko, A., Allen, T. C., Zander, D. S. & Cagle, P. T. Frozen section of lung specimens. Arch. pathology & laboratory medicine 129, 1602–1609 (2005)

  70. [70]

    Li, W. et al. Intraoperative frozen sections of the regional lymph nodes contribute to surgical decision-making in non-small cell lung cancer patients. J. thoracic disease 8, 1974 (2016)

  71. [71]

    Gupta, R., McKenna Jr, R. & Marchevsky, A. M. Lessons learned from mistakes and deferrals in the frozen section diagnosis of bronchioloalveolar carcinoma and well-differentiated pulmonary adenocarcinoma: an evidence-based pathology approach. Am. journal clinical pathology 130, 11–20 (2008)

  72. [72]

    Liu, S. et al. Precise diagnosis of intraoperative frozen section is an effective method to guide resection strategy for peripheral small-sized lung adenocarcinoma. J. clinical oncology 34, 307–313 (2016)

  73. [73]

    Saji, H. et al. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (jcog0802/wjog4607l): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. The Lancet 399, 1607–1617 (2022)

  74. [74]

    Altorki, N. et al. Lobar or sublobar resection for peripheral stage ia non–small-cell lung cancer. New Engl. J. Medicine 388, 489–498 (2023)

  75. [75]

    Ortiz, B. A. et al. Diagnostic yield of routine frozen section pathology examination of lymph nodes in lung resections for clinical stage ia nsclc. The J. Thorac. Cardiovasc. Surg. (2025)

  76. [76]

    Nicholson, A. G. et al. The 2021 who classification of lung tumors: impact of advances since 2015. J. Thorac. Oncol. 17, 362–387 (2022)

  77. [77]

    Thunnissen, E. et al. Reproducibility of histopathological subtypes and invasion in pulmonary adenocarcinoma. an international interobserver study. Mod. pathology 25, 1574–1583 (2012)

  78. [78]

    Goldstraw, P. et al. The iaslc lung cancer staging project: proposals for revision of the tnm stage groupings in the forthcoming (eighth) edition of the tnm classification for lung cancer. J. Thorac. Oncol. 11, 39–51 (2016). 98. Lu, M. Y. et al. Ai-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021)

  79. [79]

    Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat. Medicine 29, 2057–2067 (2023). 100. Hendriks, L. E. et al. Non-small-cell lung cancer. Nat. Rev. Dis. Primers 10, 71 (2024)

  80. [80]

    Huang, Q. et al. Advances in molecular pathology and therapy of non-small cell lung cancer. Signal Transduct. Target. Ther. 10, 186 (2025)

Showing first 80 references.