Benchmarking Foundation Models for Renal Lesion Stratification in CT

Alessa Hering; Bram van Ginneken; Hartmut Häntze; Jawed Nawabi; Keno Bressem; Lisa Adams; Mathias Prokop; Myrthe Buser; Sarah de Boer; Sebastian Ziegelmayer

arxiv: 2605.07749 · v1 · submitted 2026-05-08 · 💻 cs.CV

Hartmut Häntze, Sarah de Boer, Myrthe Buser, Alessa Hering, Bram van Ginneken, Mathias Prokop, Jawed Nawabi, Sebastian Ziegelmayer, Lisa Adams, Keno Bressem

Pith reviewed 2026-05-11 03:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords: foundation models · renal lesions · CT imaging · radiomics · medical image classification · transfer learning · benchmarking · lesion stratification

The pith

Medical foundation models match a from-scratch 3D ResNet-50 but fall well short of radiomics when classifying six types of renal lesions on CT.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks three medical foundation models on a six-class task of distinguishing renal lesions such as cysts and clear cell carcinoma from CT scans. It applies a frozen feature-probing protocol to compare their embeddings against a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch on 2,854 lesions, then evaluates on an external set of 234 lesions. The foundation models reach AUCs of 0.70-0.77, close to the ResNet at 0.72, yet the radiomics baseline reaches 0.88 and outperforms all deep learning methods. This shows that while foundation models cut hardware needs dramatically, their pre-trained representations miss the texture and shape details that drive subtype discrimination in this setting.

Core claim

The authors establish that generalist medical foundation model embeddings, extracted via frozen feature probing, achieve AUC values of 0.70-0.77 on the six-class renal lesion stratification task, matching the performance of a 3D ResNet-50 trained from scratch at AUC 0.72 while requiring only seconds of CPU time after feature extraction, yet falling significantly below a conventional radiomics baseline at AUC 0.88 on the external test set of 234 lesions.

What carries the argument

The frozen feature-probing protocol that extracts fixed embeddings from pre-trained medical foundation models and feeds them to a simple classifier for the renal lesion task.
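
As a minimal sketch of what frozen feature probing involves, the Python below uses a hypothetical stand-in encoder (`frozen_encoder`, three summary statistics of a flattened patch) in place of a real foundation model, and a nearest-centroid probe in place of the paper's simple classifier, which this page does not name. All data are synthetic and the class names are placeholders. It illustrates only the division of labour: embeddings are computed once with no weight updates, and only the small probe is fitted.

```python
import math
import random

def frozen_encoder(patch):
    """Hypothetical stand-in for a pre-trained model: a fixed mapping, never trained here."""
    mean = sum(patch) / len(patch)
    std = math.sqrt(sum((v - mean) ** 2 for v in patch) / len(patch))
    return [mean, std, max(patch) - min(patch)]

def fit_probe(embeddings, labels):
    """Fit the probe: one centroid (mean embedding) per class."""
    grouped = {}
    for emb, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(emb)
    return {lab: [sum(col) / len(col) for col in zip(*embs)]
            for lab, embs in grouped.items()}

def predict(probe, embedding):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(probe, key=lambda lab: math.dist(probe[lab], embedding))

def synthetic_patch(rng, mu, sigma, n_voxels=64):
    """Draw a flattened patch of made-up intensities."""
    return [rng.gauss(mu, sigma) for _ in range(n_voxels)]

rng = random.Random(0)
# Made-up intensity statistics for two placeholder classes (not from the paper).
spec = {"class_a": (10.0, 5.0), "class_b": (60.0, 20.0)}

train = [(synthetic_patch(rng, *spec[lab]), lab) for lab in spec for _ in range(20)]
test = [(synthetic_patch(rng, *spec[lab]), lab) for lab in spec for _ in range(10)]

# Step 1: extract embeddings once; the encoder is not updated.
train_emb = [frozen_encoder(p) for p, _ in train]
test_emb = [frozen_encoder(p) for p, _ in test]

# Step 2: fit only the lightweight probe on the cached embeddings.
probe = fit_probe(train_emb, [lab for _, lab in train])
accuracy = sum(predict(probe, e) == lab
               for e, (_, lab) in zip(test_emb, test)) / len(test)
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

Because step 2 operates on short cached vectors rather than image volumes, it is cheap to rerun, which is consistent with the abstract's note that probing needs only seconds on a CPU after feature extraction.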

If this is right

Foundation model embeddings can serve as a low-compute alternative to training networks from scratch in data-scarce medical classification settings.
Radiomics retains superiority for tasks that hinge on fine-grained texture and shape heterogeneity in histological subtype discrimination.
Current generalist medical foundation models require further adaptation or richer pre-training data to close the gap with established feature-based methods on this task.
The efficiency gains of foundation models come at the cost of accuracy relative to radiomics in the current benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the same benchmark protocol to other organs or modalities could reveal whether the performance gap is specific to renal CT texture or more general.
Combining handcrafted radiomics features with foundation model embeddings might produce hybrid classifiers that exceed either alone.
If future foundation models incorporate more CT-specific texture data during pre-training, their transfer performance on similar scarce-data tasks could improve without full retraining.
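
One simple way to build the hybrid classifier input suggested above is to standardise each feature block and concatenate them per lesion. The sketch below assumes that approach; the function names and the toy numbers are illustrative, and nothing here comes from the paper. In practice the means and standard deviations would be estimated on the training split only and reused on the test split.

```python
import statistics

def zscore_columns(rows):
    """Standardise each column to zero mean and unit variance (constant columns map to 0)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def fuse(radiomics_rows, embedding_rows):
    """Concatenate standardised radiomics features and embeddings, lesion by lesion."""
    r = zscore_columns(radiomics_rows)
    e = zscore_columns(embedding_rows)
    return [a + b for a, b in zip(r, e)]

# Toy input: 2 lesions, 2 radiomics features, 3 embedding dimensions.
radiomics = [[1.0, 10.0], [3.0, 30.0]]
embeddings = [[0.0, 0.0, 5.0], [2.0, 0.0, 5.0]]
fused = fuse(radiomics, embeddings)
print(fused[0])
```

Standardising before concatenation keeps a block with large raw values from dominating distance-based or linear classifiers; tree-based classifiers are less sensitive to this choice.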

Load-bearing premise

The external test set of 234 lesions and the frozen probing protocol give an unbiased, generalizable measure of each model's capability without selection biases or distribution shifts.

What would settle it

A new foundation model, evaluated with the same frozen probing protocol on the identical external set of 234 lesions, producing an AUC significantly above 0.88.
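
Whether a new model is "significantly above" a reference on the same lesions can be checked in several ways. The sketch below assumes a paired percentile bootstrap on the AUC difference for a binary task; it is not necessarily the procedure the authors used, and the scores are synthetic. A confidence interval for the difference that excludes zero would indicate a significant gap at the chosen level.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_diff_ci(labels, scores_new, scores_ref, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUC(new) - AUC(ref), resampling lesions with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # skip resamples that lost one of the two classes
            new = auc(lab, [scores_new[i] for i in idx])
            ref = auc(lab, [scores_ref[i] for i in idx])
            diffs.append(new - ref)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic example: 15 positives, 15 negatives.
labels = [1] * 15 + [0] * 15
scores_new = [0.6 + 0.02 * i for i in range(15)] + [0.1 + 0.02 * i for i in range(15)]
scores_ref = [((i * 7) % 30) / 30 for i in range(30)]  # deterministic, uninformative scores

lo, hi = bootstrap_auc_diff_ci(labels, scores_new, scores_ref)
print(f"AUC new={auc(labels, scores_new):.2f}, ref={auc(labels, scores_ref):.2f}, "
      f"95% CI for difference=({lo:.2f}, {hi:.2f})")
```

The resampling is paired: the same resampled lesions are scored by both models, which respects the fact that both are evaluated on an identical test set.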

Original abstract

The rapid proliferation of open-source medical foundation models (FMs) raises a practical question: how well do their pre-trained representations transfer to clinically relevant but data-scarce classification tasks? Particularly in CT-based renal lesion classification, a push toward greater generalizability would be meaningful, as the field is constrained by inherently limited training data. We addressed this through a benchmark of three medical FMs on this specific task. This six-class problem spans common entities like cysts and clear cell renal cell carcinoma, alongside rare subtypes. Using a frozen feature-probing protocol, we compared FM embeddings against a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch. Models were trained on a composite dataset of 2,854 lesions and evaluated on an external test set of 234 lesions from The Cancer Imaging Archive. Our results reveal two key findings. First, FM performance (AUC 0.70-0.77) matched the from-scratch ResNet (AUC 0.72) while drastically reducing hardware demand, requiring only seconds on a CPU after feature extraction. However, the conventional radiomics baseline significantly outperformed all deep learning approaches, achieving an AUC of 0.88 (all p ≤ 0.002). This suggests that current generalist FM embeddings do not yet capture the fine-grained texture and shape heterogeneity driving histological subtype discrimination. Despite their potential in data-scarce settings, medical FMs did not surpass established models for renal lesion stratification, leaving radiomics as the current state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Radiomics still beats the medical foundation models on this 6-class renal lesion task, but the aggregate AUC gap alone does not pin down the fine-grained heterogeneity claim.

The main thing to know is that on an external test set of 234 lesions, handcrafted radiomics reached 0.88 AUC while the three medical foundation models sat at 0.70-0.77 and a from-scratch 3D ResNet-50 hit 0.72. The paper therefore lands on the practical side: current generalist FM embeddings do not yet beat established radiomics for this data-scarce CT classification problem, even though they match the trained network at far lower compute cost after feature extraction.

Referee Report

2 major / 2 minor

Summary. The paper benchmarks three medical foundation models using a frozen feature-probing protocol on a six-class CT-based renal lesion classification task (cysts, ccRCC, and rare subtypes). Models are trained on a composite set of 2,854 lesions and evaluated on an external TCIA test set of 234 lesions, with comparisons to a 3D ResNet-50 trained from scratch and a handcrafted radiomics baseline. Results show FM AUCs of 0.70-0.77 (matching ResNet at 0.72) but significantly lower than radiomics at 0.88 (p ≤ 0.002), leading to the conclusion that current generalist FM embeddings do not capture the fine-grained texture and shape heterogeneity needed for histological subtype discrimination.

Significance. If the results hold after addressing the noted gaps, this provides a useful empirical benchmark showing that conventional radiomics remains superior for this data-scarce clinical task while FM probing offers efficiency gains (CPU seconds post-extraction). The external test set strengthens generalizability claims, and the direct comparison to both DL and radiomics baselines offers practical guidance for FM adoption in medical imaging.

major comments (2)

[Abstract] The interpretive claim that FM embeddings 'do not yet capture the fine-grained texture and shape heterogeneity driving histological subtype discrimination' rests on the aggregate AUC gap (0.70-0.77 vs. 0.88). Without per-class AUCs, confusion matrices, or class distribution counts for the 234-lesion external test set, the delta cannot be isolated to performance on rare/difficult subtypes rather than majority classes separable by basic features.
[Methods/Results] The manuscript reports p-values (all p ≤ 0.002) for the radiomics superiority but does not specify the statistical test (e.g., DeLong for AUC comparison), whether multiple-comparison correction was applied, or provide class-wise breakdowns. These details are load-bearing for validating the central claim and ruling out dataset-specific effects.

minor comments (2)

[Abstract] The three specific medical foundation models evaluated are not named; listing them would improve clarity for readers.
[Methods] The training set size (2,854 lesions) and test set (234 lesions) are given, but explicit reporting of class imbalance ratios in both would aid interpretation of the aggregate metrics.
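
The breakdown the referee asks for (class counts, per-class AUCs, a confusion matrix) can be sketched as follows. The example assumes one-vs-rest AUCs with an unweighted macro average; this page does not state how the paper aggregates its six-class AUC, so that choice is an assumption, and the three-class toy data are invented for illustration.

```python
from collections import Counter

def binary_auc(scores, is_positive):
    """Rank-based AUC: probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        return float("nan")  # undefined when a class is absent
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_class_report(y_true, probs, classes):
    """Return class counts, one-vs-rest AUC per class, and a confusion matrix."""
    counts = Counter(y_true)
    aucs = {c: binary_auc([row[k] for row in probs], [y == c for y in y_true])
            for k, c in enumerate(classes)}
    y_pred = [classes[max(range(len(classes)), key=lambda k: row[k])] for row in probs]
    confusion = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1  # rows: true class, columns: predicted class
    return counts, aucs, confusion

# Toy example with three placeholder classes (the paper's task has six).
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]

counts, aucs, confusion = per_class_report(y_true, probs, classes)
macro_auc = sum(aucs.values()) / len(aucs)
print(dict(counts), aucs, round(macro_auc, 3))
print(confusion)
```

Reporting the per-class values next to the class counts makes it visible whether an aggregate gap is driven by rare subtypes or by the majority classes.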

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which has improved the clarity and transparency of our work. We address each major comment below and have revised the manuscript to incorporate the requested details.

Referee: [Abstract] The interpretive claim that FM embeddings 'do not yet capture the fine-grained texture and shape heterogeneity driving histological subtype discrimination' rests on the aggregate AUC gap (0.70-0.77 vs. 0.88). Without per-class AUCs, confusion matrices, or class distribution counts for the 234-lesion external test set, the delta cannot be isolated to performance on rare/difficult subtypes rather than majority classes separable by basic features.

Authors: We agree that aggregate AUCs alone limit the ability to attribute the performance gap to specific classes. In the revised manuscript we have added the class distribution counts for the 234-lesion external test set, per-class AUC values for all models, and confusion matrices. These additions allow readers to evaluate whether the observed differences are concentrated on the rarer subtypes.

Revision: yes

Referee: [Methods/Results] The manuscript reports p-values (all p ≤ 0.002) for the radiomics superiority but does not specify the statistical test (e.g., DeLong for AUC comparison), whether multiple-comparison correction was applied, or provide class-wise breakdowns. These details are load-bearing for validating the central claim and ruling out dataset-specific effects.

Authors: We have clarified the statistical procedures in the revised Methods section: p-values for AUC comparisons were obtained with the DeLong test and Bonferroni correction was applied to account for multiple pairwise tests. Class-wise performance breakdowns have also been added to the Results to support transparent evaluation of the central claims.

Revision: yes
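
For readers unfamiliar with the two procedures named in the rebuttal, the sketch below implements the DeLong test for two correlated AUCs on a binary task, followed by a Bonferroni adjustment. Several things are assumptions: the page does not say how the test is applied to six classes, the number of comparisons `k = 4` (radiomics against three FMs and the ResNet) is inferred rather than stated, and the scores are synthetic.

```python
import math

def _placements(pos, neg):
    """Per-positive and per-negative placement values used by DeLong's variance estimate."""
    psi = lambda x, y: 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test for AUC(a) - AUC(b) on the same cases; returns (auc_a, auc_b, p)."""
    def split(s):
        return ([v for v, y in zip(s, labels) if y == 1],
                [v for v, y in zip(s, labels) if y == 0])
    v10_a, v01_a = _placements(*split(scores_a))
    v10_b, v01_b = _placements(*split(scores_b))
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the difference, computed from the paired placement differences.
    var = (_sample_var([a - b for a, b in zip(v10_a, v10_b)]) / len(v10_a)
           + _sample_var([a - b for a, b in zip(v01_a, v01_b)]) / len(v01_a))
    if var == 0.0:
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, p

def bonferroni(p, k):
    """Bonferroni-adjusted p-value for k comparisons."""
    return min(1.0, p * k)

# Synthetic scores for 4 positives followed by 4 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.3, 0.7, 0.2, 0.4, 0.8, 0.6, 0.1]

auc_a, auc_b, p = delong_test(labels, scores_a, scores_b)
print(f"AUC a={auc_a:.4f}, b={auc_b:.4f}, p={p:.3f}, Bonferroni (k=4): {bonferroni(p, 4):.3f}")
```

As a point of arithmetic, an unadjusted p-value of 0.002 multiplied by four comparisons gives 0.008, which remains below 0.05; whether the reported p ≤ 0.002 values are before or after correction is not stated on this page.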

Circularity Check

0 steps flagged

No circularity: direct empirical benchmark on held-out data

The paper performs a standard model comparison by training on a composite dataset of 2,854 lesions and evaluating AUC on an independent external test set of 234 lesions. No equations, derivations, fitted parameters renamed as predictions, or self-citations are present in the provided text. The central claim rests on measured performance deltas (radiomics 0.88 vs. FM/ResNet ~0.70-0.77), which are falsifiable against the external data rather than reducing to the inputs by construction. This matches the default case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This empirical benchmark paper introduces no mathematical axioms, invented physical entities, or new conserved quantities. The only implicit assumptions are standard transfer-learning premises that pre-trained embeddings are useful without fine-tuning and that the chosen radiomics features are representative of clinical texture information.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "the conventional radiomics baseline significantly outperformed all deep learning approaches, achieving an AUC of 0.88... current generalist FM embeddings do not yet capture the fine-grained texture and shape heterogeneity"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: "Using a frozen feature-probing protocol... 3D ResNet-50 trained from scratch"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

The age of foundation models

Lipkova J and Kather JN. The age of foundation models. Nature Reviews Clinical Oncology 2024; 21:769–70.doi: 10.1038/s41571-024-00941-8

work page doi:10.1038/s41571-024-00941-8 2024

[2] [2]

Overcoming data scarcity in biomedical imaging with a foundational multi-task model

Schäfer R, Nicke T, Höfener H, Lange A, Merhof D, Feuerhake F, Schulz V, Lotz J, and Kiessling F. Overcoming data scarcity in biomedical imaging with a foundational multi-task model. Nature Computational Science 2024; 4:495–509.doi: 10.1038/s43588-024-00662-z

work page doi:10.1038/s43588-024-00662-z 2024

[3] [3]

Kidney Fact Sheet

International Agency for Research on Cancer (IARC). Kidney Fact Sheet. Accessed: 2026-02-

work page 2026

[4] [4]

Available from: https://gco.iarc.who.int/media/globocan/factsheets/cancers/29-ki dney-fact-sheet.pdf 10

2021. Available from: https://gco.iarc.who.int/media/globocan/factsheets/cancers/29-ki dney-fact-sheet.pdf 10

work page 2021

[5] [5]

CT and MRI of small renal masses

Wang ZJ, Westphalen AC, and Zagoria RJ. CT and MRI of small renal masses. The British Journal of Radiology 2018; 91:20180131.doi: 10.1259/bjr.20180131

work page doi:10.1259/bjr.20180131 2018

[6] [6]

Renal cell carcinoma: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up

Powles T, Albiges L, Bex A, Comperat E, Grünwald V, Kanesvaran R, Kitamura H, McKay R, Porta C, Procopio G, et al. Renal cell carcinoma: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Annals of Oncology 2024.doi: 10.1016/j.annonc.2024.05 .537

work page doi:10.1016/j.annonc.2024.05 2024

[7] [7]

S3- Guideline for Diagnosis, Therapy, and Follow-up of Renal Cell Carcinoma, Short Version 5.0

Leitlinienprogramm Onkologie (Deutsche Krebsgesellschaft, Deutsche Krebshilfe, AWMF). S3- Guideline for Diagnosis, Therapy, and Follow-up of Renal Cell Carcinoma, Short Version 5.0. AWMF Registration Number: 043-017OL, Accessed: 23.02.2025. 2024. Available from: https: //www.leitlinienprogramm-onkologie.de/leitlinien/nierenzellkarzinom/

work page 2025

[8] [8]

Bosniak classification of cystic renal masses, version 2019: an update proposal and needs assessment

Silverman SG, Pedrosa I, Ellis JH, Hindman NM, Schieda N, et al. Bosniak classification of cystic renal masses, version 2019: an update proposal and needs assessment. Radiology 2019; 292:475–88.doi: 10.1148/radiol.2019182646

work page doi:10.1148/radiol.2019182646 2019

[9] [9]

Renal angiomy- olipoma: a radiological classification and update on recent developments in diagnosis and management

Jinzaki M, Silverman SG, Akita H, Nagashima Y, Mikami S, and Oya M. Renal angiomy- olipoma: a radiological classification and update on recent developments in diagnosis and management. Abdominal imaging 2014; 39:588–604.doi: 10.1007/s00261-014-0083-3

work page doi:10.1007/s00261-014-0083-3 2014

[10] [10]

Differentiation of papillary renal cell carcinoma subtypes on CT and MRI

Egbert ND, Caoili EM, Cohan RH, Davenport MS, Francis IR, Kunju LP, and Ellis JH. Differentiation of papillary renal cell carcinoma subtypes on CT and MRI. American Journal of Roentgenology 2013; 201:347–55.doi: 10.2214/AJR.12.9451

work page doi:10.2214/ajr.12.9451 2013

[11] [11]

Renal oncocytoma: CT features cannot reliably distinguish oncocytoma from other renal neoplasms

Choudhary S, Rajesh A, Mayer N, Mulcahy K, and Haroon A. Renal oncocytoma: CT features cannot reliably distinguish oncocytoma from other renal neoplasms. Clinical radiology 2009; 64:517–22.doi: 10.1016/j.crad.2008.12.011

work page doi:10.1016/j.crad.2008.12.011 2009

[12] [12]

MRI features of renal oncocytoma and chromophobe renal cell carcinoma

Rosenkrantz AB, Hindman N, Fitzgerald EF, Niver BE, Melamed J, and Babb JS. MRI features of renal oncocytoma and chromophobe renal cell carcinoma. American Journal of Roentgenology 2010; 195:W421–W427.doi: 10.2214/AJR.10.4718

work page doi:10.2214/ajr.10.4718 2010

[13] [13]

Deep learning for end-to-end kidney cancer diagnosis on multi-phase abdominal computed tomography

Uhm KH, Jung SW, Choi MH, Shin HK, Yoo JI, et al. Deep learning for end-to-end kidney cancer diagnosis on multi-phase abdominal computed tomography. NPJ precision oncology 2021; 5:54.doi: 10.1038/s41698-021-00195-y

work page doi:10.1038/s41698-021-00195-y 2021

[14] [14]

Value of radiomics in differential diagnosis of chromophobe renal cell carcinoma and renal oncocytoma

Li Y, Huang X, Xia Y, and Long L. Value of radiomics in differential diagnosis of chromophobe renal cell carcinoma and renal oncocytoma. Abdominal Radiology 2020; 45:3193–201.doi: 10.1007/s00261-019-02269-9

work page doi:10.1007/s00261-019-02269-9 2020

[15] [15]

Cancers 2022; 14:3609.doi: 10.3390/cancers14153609

AlhussainiAJ,SteeleJD,andNabiG.Comparativeanalysisforthedistinctionofchromophobe renal cell carcinoma from renal oncocytoma in computed tomography imaging using machine learning radiomics analysis. Cancers 2022; 14:3609.doi: 10.3390/cancers14153609

work page doi:10.3390/cancers14153609 2022

[16] [16]

Apparent diffusion coefficient map-based texture analysis for the differentiation of chromophobe renal cell carcinoma from renal oncocytoma

Uchida Y, Yoshida S, Arita Y, Shimoda H, Kimura K, et al. Apparent diffusion coefficient map-based texture analysis for the differentiation of chromophobe renal cell carcinoma from renal oncocytoma. Diagnostics 2022; 12:817.doi: 10.3390/diagnostics12040817

work page doi:10.3390/diagnostics12040817 2022

[17] [17]

Use of MRI in differentiation of papillary renal cell carcinoma subtypes: qualitative and quan- titative analysis

Doshi AM, Ream JM, Kierans AS, Bilbily M, Rusinek H, Huang WC, and Chandarana H. Use of MRI in differentiation of papillary renal cell carcinoma subtypes: qualitative and quan- titative analysis. American Journal of Roentgenology 2016; 206:566–72.doi: 10.2214/AJR.1 5.15004 11

work page doi:10.2214/ajr.1 2016

[18] [18]

Differential diagnosis of type 1 and type 2 papillary renal cell carcinoma based on enhanced CT radiomics nomogram

Gao Y, Wang X, Wang S, Miao Y, Zhu C, Li C, Huang G, Jiang Y, Li J, Zhao X, et al. Differential diagnosis of type 1 and type 2 papillary renal cell carcinoma based on enhanced CT radiomics nomogram. Frontiers in Oncology 2022; 12:854979.doi: 10.3389/fonc.2022.85 4979

work page doi:10.3389/fonc.2022.85 2022

[19] [19]

2603.02790

Stegeman M, Philipp L, Graaf F van der, D’Amato M, Grisi C, Builtjes L, Bosma JS, Lefkes J, Weber RA, Meakin JA, et al. Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language. arXiv preprint arXiv:2603.02790 2026.doi: 10.48550/arXiv.2603.02790

[20]

Incompletely characterized incidental renal masses: emerging data support conservative management

Silverman SG, Israel GM, and Trinh QD. Incompletely characterized incidental renal masses: emerging data support conservative management. Radiology 2015; 275:28–42. doi: 10.1148/radiol.14141144

[21]

2023 Kidney and Kidney Tumor Segmentation Challenge

Heller N, Isensee F, Tejpau R, Wood A, Papanikolopoulos N, and Weight C. 2023 Kidney and Kidney Tumor Segmentation Challenge. 2023 Apr. doi: 10.5281/zenodo.7840134. Available from: https://doi.org/10.5281/zenodo.7840134

[22]

The cancer genome atlas kidney renal clear cell carcinoma collection (TCGA-KIRC)(Version 3)[Data set]

Akin O, Elnajjar P, Heller M, Jarosz R, Erickson BJ, et al. The cancer genome atlas kidney renal clear cell carcinoma collection (TCGA-KIRC) (Version 3) [Data set]. Cancer Imaging Arch 2016. doi: 10.7937/K9/TCIA.2016.V6PBVTDR

[23]

W. LM, A. GRSC, and S. L. The Cancer Genome Atlas Kidney Chromophobe Collection (TCGA-KICH) (Version 3) [Data set]. Cancer Imaging Arch 2016. doi: 10.7937/K9/TCIA.2016.YU3RBCZN

[24]

The cancer genome atlas cervical kidney renal papillary cell carcinoma collection (TCGA-KIRP), version 4

Linehan M, Gautam R, Kirk S, Lee Y, Roche C, Bonaccio E, Filippini J, Rieger-Christ K, Lemmerman J, and Jarosz R. The cancer genome atlas cervical kidney renal papillary cell carcinoma collection (TCGA-KIRP), version 4. The Cancer Imaging Archive 2016. doi: 10.7937/K9/TCIA.2016.ACWOGBEF

[25]

Accessible and Reproducible Renal Cell Carcinoma Research Through Open-Sourcing Data and Annotations

de Boer S, Häntze H, Ziegelmayer S, Ginneken B van, Prokop M, Bressem KK, and Hering A. Accessible and Reproducible Renal Cell Carcinoma Research Through Open-Sourcing Data and Annotations. medRxiv 2026. doi: 10.64898/2026.04.22.26351451

[26]

Robust Kidney Abnormality Segmentation: A Validation Study of an AI-Based Framework

de Boer S, Häntze H, Venkadesh KV, Buser MA, Mamani GEH, Xu L, Adams LC, Nawabi J, Bressem KK, Ginneken B van, et al. Robust Kidney Abnormality Segmentation: A Validation Study of an AI-Based Framework. arXiv preprint arXiv:2505.07573 2025. doi: 10.48550/arXiv.2505.07573

[27]

Foundation model for cancer imaging biomarkers

Pai S, Bontempi D, Hadzic I, Prudente V, Sokač M, Chaunzwa TL, Bernatz S, Hosny A, Mak RH, Birkbak NJ, et al. Foundation model for cancer imaging biomarkers. Nature Machine Intelligence 2024; 6:354–67. doi: 10.1038/s42256-024-00807-9

[28]

Vision foundation models for computed tomography

Pai S, Hadzic I, Bontempi D, Bressem K, Kann BH, Fedorov A, Mak RH, and Aerts HJ. Vision foundation models for computed tomography. arXiv preprint arXiv:2501.09001 2025. doi: 10.48550/arXiv.2501.09001

[29]

Tissue concepts: Supervised foundation models in computational pathology

Nicke T, Schäfer JR, Höfener H, Feuerhake F, Merhof D, Kießling F, and Lotz J. Tissue concepts: Supervised foundation models in computational pathology. Computers in Biology and Medicine 2025; 186:109621. doi: 10.1016/j.compbiomed.2024.109621

[30]

Xgboost: A scalable tree boosting system

Chen T and Guestrin C. Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016:785–94. doi: 10.1145/2939672.293978

[31]

Computational radiomics system to decode the radiographic phenotype

Van Griethuysen JJ, Fedorov A, Parmar C, Hosny A, Aucoin N, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Research 2017; 77:e104–e107. doi: 10.1158/0008-5472.CAN-17-0339

[32]

Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians

Carpenter J and Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 2000; 19:1141–64. doi: 10.1002/(sici)1097-0258(20000515)19:9<1141::aid-sim479>3.0.co;2-f

[33]

Simultaneous confidence intervals based on the percentile bootstrap approach

Mandel M and Betensky RA. Simultaneous confidence intervals based on the percentile bootstrap approach. Computational Statistics & Data Analysis 2008; 52:2158–65. doi: 10.1016/j.csda.2007.07.005

[34]

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

McInnes L, Healy J, and Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 2018. doi: 10.48550/arXiv.1802.03426
