Radiomics- and Clinical Feature-Driven Prediction of Volumetric Response in Skull-Base Meningioma after CyberKnife Radiosurgery
Pith reviewed 2026-05-08 04:37 UTC · model grok-4.3
The pith
Radiomic features from pre-treatment MRI combined with clinical variables predict volumetric response to CyberKnife radiosurgery in skull-base meningiomas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pre-treatment MRI radiomic features plus clinical variables contain enough information to classify volumetric response after CyberKnife radiosurgery; when modeled with TabPFN under nested cross-validation, this combination yields an AUC of 0.81 along with favorable sensitivity and specificity metrics, outperforming the other five tested algorithms on the 104-patient cohort.
What carries the argument
Radiomics-plus-clinical feature set fed into TabPFN (Tabular Prior-Data Fitted Network) inside a nested cross-validation scheme that separates feature selection, model training, and performance estimation to guard against overfitting in the high-dimensional, small-sample regime.
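The nested scheme described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions: the data are synthetic (104 samples, 500 features), `LogisticRegression` stands in for TabPFN, and the `SelectKBest` filter and the grid over `k` and `C` are hypothetical choices. The point is structural. Scaling, feature selection, and hyperparameter search are all refit inside each outer training split, so no outer-fold AUC is computed on data that influenced selection or tuning.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 104 "patients", far more features than patients.
X, y = make_classification(n_samples=104, n_features=500, n_informative=8,
                           random_state=0)

# Every data-dependent step lives inside the pipeline, so it is refit per training split.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder for TabPFN (assumption)
])
param_grid = {"select__k": [5, 10, 20], "clf__C": [0.1, 1.0]}  # hypothetical grid

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: chooses k and C using only the outer-training portion.
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=inner_cv)
# Outer loop: scores the whole selection-plus-tuning procedure on unseen folds.
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
print(f"nested-CV AUC: {outer_auc.mean():.2f} ± {outer_auc.std():.2f}")
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the procedure nested; calling `search.fit(X, y)` once and reporting `best_score_` would instead give the optimistically biased estimate that nested CV is meant to avoid.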
If this is right
- Clinicians could obtain a probability score for volumetric response before deciding on radiosurgery versus other options.
- The same feature set and validation protocol could be applied to stratify patients for alternative radiation doses or follow-up schedules.
- High-performing models like TabPFN reduce the need for manual feature engineering while still handling the small-sample, high-feature problem common in radiomics.
- Volumetric response becomes a measurable, image-based endpoint that can be predicted earlier than progression-free survival.
- The nested cross-validation workflow provides a reproducible template for other single-center radiomics studies facing similar data constraints.
Where Pith is reading between the lines
- If the model generalizes, it could be embedded in treatment-planning software to flag low-response cases for closer monitoring or alternative therapies.
- Extending the pipeline to include post-treatment scans might allow early detection of non-responders and adaptive re-planning.
- The approach might transfer to other skull-base lesions or different radiosurgery platforms, provided the MRI acquisition protocol remains comparable.
- Combining this prediction with genomic or molecular markers could further improve accuracy, though that step lies outside the current study.
Load-bearing premise
The extracted radiomic features and clinical variables together carry a genuine, generalizable signal about treatment response rather than noise or cohort-specific artifacts.
What would settle it
A prospective test on an independent set of at least 50 new patients where the same radiomic pipeline and TabPFN model produces an AUC below 0.70.
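To gauge how discriminating this test would be, the Hanley-McNeil approximation gives a rough standard error for an AUC measured on a small cohort. The sketch assumes a hypothetical balanced 25/25 split of the 50 patients and a true AUC equal to the reported 0.81; both are illustrative assumptions, and the normal approximation is crude at this sample size.

```python
import math

def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

# Hypothetical external cohort: 50 patients, 25 responders and 25 non-responders.
auc, n_pos, n_neg = 0.81, 25, 25
se = hanley_mcneil_se(auc, n_pos, n_neg)
lo, hi = auc - 1.96 * se, auc + 1.96 * se  # Wald-type 95% interval
print(f"SE = {se:.3f}, approx. 95% CI = ({lo:.2f}, {hi:.2f})")  # SE = 0.062, CI = (0.69, 0.93)
```

Under these assumptions an observed AUC below 0.70 would arise only about 4% of the time if the true AUC were 0.81, so the proposed threshold is a meaningful but not noise-free criterion.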
Original abstract
Skull-base meningiomas are often characterized by favorable long-term prognosis, yet their anatomical complexity and proximity to critical neurovascular structures make treatment selection challenging. Stereotactic radiosurgery with CyberKnife represents an effective therapeutic option when surgical resection is not feasible; however, not all patients benefit equally from this treatment. Early identification of patients likely to respond to radiosurgery remains an open clinical problem. In this study, we propose a radiomics- and clinical feature-driven framework for predicting volumetric response in skull-base meningiomas treated with CyberKnife. Unlike most existing approaches that focus on progression-free survival or recurrence, our method targets volumetric response as an indicator of treatment efficacy. Pre-treatment MRI images from 104 patients were processed to extract radiomic features, which were combined with clinical variables and analyzed using six models. To ensure methodological rigor, the entire modeling process was implemented within a nested cross-validation scheme. Among the evaluated models, TabPFN achieved the best overall performance, with an AUC of 0.81 and consistently favorable classification metrics. These results suggest that advanced machine learning architectures, when combined with robust validation strategies, can effectively capture patterns associated with treatment response even in small-sample, high-dimensional settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a radiomics- and clinical feature-driven machine learning framework to predict volumetric response to CyberKnife radiosurgery in skull-base meningiomas. Pre-treatment MRI scans from 104 patients are used to extract radiomic features that are combined with clinical variables; six models are evaluated under a nested cross-validation scheme, with TabPFN achieving the highest AUC of 0.81 and favorable classification metrics. The central goal is to enable early identification of treatment responders in a setting where anatomical complexity complicates decision-making.
Significance. If the performance generalizes, the approach could support personalized treatment selection for skull-base meningiomas, where not all patients respond equally to radiosurgery. The nested cross-validation is a clear methodological strength that reduces overfitting risk in the small-n, high-dimensional regime typical of radiomics. However, the lack of external validation and missing details on feature handling limit the immediate translational significance and the strength of the generalizability claim.
major comments (3)
- Abstract: The headline result (TabPFN AUC 0.81) is reported without any information on the number of radiomic features initially extracted, the feature-selection procedure, or the final retained feature count. In a p ≫ n setting with n=104, this omission is load-bearing because it prevents assessment of whether the nested CV truly prevented leakage or selection bias.
- Abstract and Methods: No details are provided on the exact volumetric response threshold used to binarize the outcome, the class balance in the 104-patient cohort, or whether feature selection and hyperparameter optimization occurred strictly inside the inner CV loop. These omissions directly affect the interpretability and credibility of the reported classification metrics.
- Results: The claim that TabPFN achieved the best overall performance lacks a permutation-test baseline or a clinical-variables-only comparator. Without such controls, it is impossible to determine whether the radiomic features contribute recoverable signal beyond chance or simple clinical predictors.
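A clinical-variables-only comparator of the kind requested in the third comment can be scored on exactly the same outer folds as the combined model, so that per-fold AUC differences are paired. The sketch uses synthetic data; treating the first six columns as the "clinical" variables, the regularized logistic regression, and the five-fold split are hypothetical choices, not details from the paper. In a real analysis both arms would go through the full nested pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; with shuffle=False the informative columns come first.
X, y = make_classification(n_samples=104, n_features=206, n_informative=10,
                           shuffle=False, random_state=0)
N_CLINICAL = 6                      # hypothetical number of clinical variables
X_clin = X[:, :N_CLINICAL]          # "clinical only" arm
# X (all 206 columns) plays the role of clinical + radiomic features.

model = make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=1000))
# A fixed random_state gives both calls below exactly the same outer folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

auc_clin = cross_val_score(model, X_clin, y, scoring="roc_auc", cv=cv)
auc_full = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
delta = auc_full - auc_clin         # paired, per-fold AUC gain from adding radiomics
print(f"clinical only: {auc_clin.mean():.2f}, clinical + radiomics: {auc_full.mean():.2f}")
print("per-fold AUC gain:", np.round(delta, 2))
```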
minor comments (2)
- Abstract: The cohort size (104 patients) is already stated, but the six models evaluated are not named; listing them would improve completeness and allow readers to immediately gauge the experimental scope.
- Discussion: A dedicated limitations paragraph explicitly addressing the single-center design and absence of external validation would strengthen the manuscript and align with standard reporting expectations for radiomics studies.
Simulated Author's Rebuttal
We are grateful to the referee for the thoughtful comments, which have helped us improve the clarity and rigor of our manuscript. Below we provide point-by-point responses to the major comments. We have revised the manuscript to incorporate additional details and analyses as suggested.
Point-by-point responses
Referee: Abstract: The headline result (TabPFN AUC 0.81) is reported without any information on the number of radiomic features initially extracted, the feature-selection procedure, or the final retained feature count. In a p ≫ n setting with n=104, this omission is load-bearing because it prevents assessment of whether the nested CV truly prevented leakage or selection bias.
Authors: We acknowledge that the abstract lacked these specifics, which are crucial for evaluating the methodology in a high-dimensional setting. The full methods section describes the extraction of radiomic features from pre-treatment MRI using standard software and the subsequent combination with clinical variables. To address this concern directly, we have updated the abstract to report the initial number of radiomic features extracted, the feature selection approach employed, and the number of features retained in the final models. Additionally, we have clarified that all feature selection steps were confined to the inner loop of the nested cross-validation to avoid any information leakage. revision: yes
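Why confining selection to the inner loop matters can be shown on pure noise: selecting features with all labels before cross-validating inflates the AUC well above chance, whereas the same selection refit inside each training fold stays near 0.5. The sketch is a generic illustration with synthetic data and a hypothetical `k=10` ANOVA filter; it does not reproduce the study's selection method.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure noise: 104 "patients", 2000 "features", labels unrelated to X.
X = rng.standard_normal((104, 2000))
y = rng.integers(0, 2, size=104)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: the 10 "best" features are chosen using ALL labels, then cross-validated.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
auc_leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y,
                            scoring="roc_auc", cv=cv).mean()

# Correct: selection is part of the pipeline, so it is refit on each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression(max_iter=1000))
auc_clean = cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv).mean()

print(f"selection outside CV: {auc_leaky:.2f}  (optimistic)")
print(f"selection inside CV:  {auc_clean:.2f}  (near chance)")
```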
Referee: Abstract and Methods: No details are provided on the exact volumetric response threshold used to binarize the outcome, the class balance in the 104-patient cohort, or whether feature selection and hyperparameter optimization occurred strictly inside the inner CV loop. These omissions directly affect the interpretability and credibility of the reported classification metrics.
Authors: Thank you for highlighting these omissions. We have revised both the abstract and the methods section to specify the volumetric response threshold used for binarization, the resulting class distribution in the cohort, and to explicitly state that feature selection and hyperparameter optimization were performed strictly within the inner cross-validation loop. This ensures full transparency regarding the modeling pipeline and supports the credibility of the performance metrics. revision: yes
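The binarization the referee asks about amounts to thresholding the relative volume change. Neither the paper's threshold nor its volumetry method is given here, so the 20% reduction cut-off and the five volumes below are hypothetical, and `binarize_response` is an illustrative helper name. The last line reports the class balance the referee also asks for.

```python
import numpy as np

def binarize_response(v_pre, v_post, reduction_threshold):
    """Label a lesion as a responder (1) if its relative volume
    reduction is at least `reduction_threshold` (e.g. 0.20 = 20%)."""
    v_pre = np.asarray(v_pre, dtype=float)
    v_post = np.asarray(v_post, dtype=float)
    rel_change = (v_post - v_pre) / v_pre   # negative values mean shrinkage
    return (rel_change <= -reduction_threshold).astype(int)

# Hypothetical pre/post volumes in cm^3; the threshold is illustrative only.
v_pre = [4.0, 10.0, 2.5, 6.0, 8.0]
v_post = [2.8, 9.5, 1.9, 6.3, 5.6]
labels = binarize_response(v_pre, v_post, reduction_threshold=0.20)
print(labels)                             # [1 0 1 0 1]
print(np.bincount(labels, minlength=2))   # [2 3] -> [non-responders, responders]
```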
Referee: Results: The claim that TabPFN achieved the best overall performance lacks a permutation-test baseline or a clinical-variables-only comparator. Without such controls, it is impossible to determine whether the radiomic features contribute recoverable signal beyond chance or simple clinical predictors.
Authors: We agree that additional controls would strengthen the interpretation of our results. While the comparison among the six models provides some context, we did not originally include a permutation test or an explicit clinical-variables-only baseline. In the revised manuscript, we have added a clinical-variables-only model for comparison and performed a permutation test to assess whether the performance exceeds what would be expected by chance. These additions demonstrate that the inclusion of radiomic features provides meaningful improvement over clinical variables alone and over random baselines. revision: yes
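A permutation baseline of the kind described here can be run with scikit-learn's `permutation_test_score`, which refits the model on label-shuffled copies of the outcome and compares the observed cross-validated AUC with that null distribution. The data and logistic-regression model below are synthetic placeholders; in the study's setting the estimator passed in would be the whole nested search, so that feature selection and tuning are repeated under every permutation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data, not the study cohort.
X, y = make_classification(n_samples=104, n_features=30, n_informative=5,
                           random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Refit the full pipeline on 200 label permutations to build a null AUC distribution.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, scoring="roc_auc", cv=cv, n_permutations=200, random_state=0)
print(f"observed AUC {score:.2f}, null mean {perm_scores.mean():.2f}, p = {p_value:.3f}")
```

The returned p-value is (number of permuted scores ≥ the observed score + 1) / (n_permutations + 1), so 200 permutations cannot produce a p-value below 1/201.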
Circularity Check
No circularity: empirical ML performance via nested CV on held-out data
full rationale
The paper reports an AUC of 0.81 for TabPFN as an empirical result obtained through nested cross-validation on a 104-patient cohort, with radiomic and clinical features as inputs and volumetric response as the target label. No equations, derivations, or self-referential steps are described that would reduce the reported performance to parameters fitted on the same outcome by construction. The modeling pipeline is presented as a standard supervised learning workflow with internal validation; no self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claim. This is a self-contained empirical evaluation whose validity can be assessed against external benchmarks or replication, rather than being tautological.
Axiom & Free-Parameter Ledger
free parameters (1)
- radiomic feature selection and model hyperparameters
axioms (1)
- domain assumption: Pre-treatment MRI radiomic features plus clinical variables contain predictive information about post-treatment volumetric change
Reference graph
Works this paper leans on
- [1] Meling, T., Da Broi, M., Scheie, D. & Helseth, E. Meningiomas: skull base versus non-skull base. Neurosurgical Review. 42, 163-173 (2019)
- [2]
- [3] Elías, J., Cacho, A., Luján, A., López, J. & Trejo, J. Efficacy and Safety of Stereotactic Radiosurgery in Patients With Large-Volume Meningiomas ≥ 10 cm³: A Systematic Review and Single-Arm Meta-Analysis. Cureus. 17 (2025)
- [4] Lemée, J., Corniola, M., Da Broi, M., Joswig, H., Scheie, D., Schaller, K., Helseth, E. & Meling, T. Extent of resection in meningioma: predictive factors and clinical implications. Scientific Reports. 9, 5944 (2019)
- [5] Lin, Y., Barbieri, R., Aquino, D., Lauria, G., Grisoli, M., De Momi, E., Redaelli, A. & Ferrante, S. Glioblastoma Overall Survival Prediction With Vision Transformers. 2025 47th Annual International Conference Of The IEEE Engineering In Medicine And Biology Society (EMBC). pp. 1-4 (2025)
- [6] Lin, Y., Aquino, D., Lauria, G., Grisoli, M., Redaelli, A., Barbieri, R. & Ferrante, S. Lightweight ensemble vision transformer framework for non-invasive survival prediction in glioblastoma. Neurocomputing. pp. 133303 (2026)
- [7] Colombo, F., Casentini, L., Cavedon, C., Scalchi, P., Cora, S. & Francescon, P. Cyberknife radiosurgery for benign meningiomas: short-term results in 199 patients. Neurosurgery. 64, A7-A13 (2009)
- [8] Abualnaja, S., Morris, J., Rashid, H., Cook, W. & Helmy, A. Machine learning for predicting post-operative outcomes in meningiomas: a systematic review and meta-analysis. Acta Neurochirurgica. 166, 505 (2024)
- [9] Ren, L., Chen, J., Deng, J., Qing, X., Cheng, H., Wang, D., Ji, J., Chen, H., Juratli, T., Wakimoto, H. & Others. The development of a combined clinico-radiomics model for predicting post-operative recurrence in atypical meningiomas: a multicenter study. Journal Of Neuro-Oncology. 166, 59-71 (2024)
- [10] Park, C., Choi, S., Eom, J., Byun, H., Ahn, S., Chang, J., Kim, S., Lee, S., Park, Y. & Yoon, H. An interpretable radiomics model to select patients for radiotherapy after surgery for WHO grade 2 meningiomas. Radiation Oncology. 17, 147 (2022)
- [11] Chen, W., Choudhury, A., Youngblood, M., Polley, M., Lucas, C., Mirchia, K., Maas, S., Suwala, A., Won, M., Bayley, J. & Others. Targeted gene expression profiling predicts meningioma outcomes and radiotherapy responses. Nature Medicine. 29, 3067-3076 (2023)
- [12] Wang, J., Landry, A., Nassiri, F., Merali, Z., Patel, Z., Lee, G., Rogers, L., Zuccato, J., Voisin, M., Munoz, D. & Others. Outcomes and predictors of response to fractionated radiotherapy as primary treatment for intracranial meningiomas. Clinical And Translational Radiation Oncology. 41 pp. 100631 (2023)
- [13]
- [14] Speckter, H., Bido, J., Hernandez, G., Rivera, D., Suazo, L., Valenzuela, S., Miches, I., Oviedo, J., Gonzalez, C. & Stoeter, P. Pretreatment texture analysis of routine MR images and shape analysis of the diffusion tensor for prediction of volumetric response after radiosurgery for meningioma. Journal Of Neurosurgery. 129, 31-37 (2018)
- [15] Speckter, H., Radulovic, M., Trivodaliev, K., Vranes, V., Joaquin, J., Hernandez, W., Mota, A., Bido, J., Hernandez, G., Rivera, D. & Others. MRI radiomics in the prediction of the volumetric response in meningiomas after gamma knife radiosurgery. Journal Of Neuro-Oncology. 159, 281-291 (2022)
- [16] Seoni, S., Shahini, A., Meiburger, K., Marzola, F., Rotunno, G., Acharya, U., Molinari, F. & Salvi, M. All you need is data preparation: A systematic review of image harmonization techniques in Multi-center/device studies for medical support systems. Computer Methods And Programs In Biomedicine. 250 pp. 108200 (2024)
- [17]
- [18] Van Griethuysen, J., Fedorov, A., Parmar, C., Hosny, A., Aucoin, N., Narayan, V., Beets-Tan, R., Fillion-Robin, J., Pieper, S. & Aerts, H. Computational radiomics system to decode the radiographic phenotype. Cancer Research. 77, e104-e107 (2017)
- [19] Varma, S. & Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 7, 91 (2006)
- [20] Cawley, G. & Talbot, N. On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal Of Machine Learning Research. 11 pp. 2079-2107 (2010)
- [21] Jain, A. & Zongker, D. Feature selection: Evaluation, application, and small sample performance. IEEE Transactions On Pattern Analysis And Machine Intelligence. 19, 153-158 (1997)
- [22] Saroh, S., Pendem, S., Prakashini, K., Nayak, S., Menon, G., Divya, B. & Others. Machine learning based radiomics approach for outcome prediction of meningioma–a systematic review. F1000Research. 14 pp. 330 (2025)
- [23]
- [24] Chen, T. XGBoost: A Scalable Tree Boosting System. Cornell University. (2016)
- [25] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. & Gulin, A. CatBoost: unbiased boosting with categorical features. Advances In Neural Information Processing Systems. 31 (2018)
- [26] Hollmann, N., Müller, S., Purucker, L., Krishnakumar, A., Körfer, M., Hoo, S., Schirrmeister, R. & Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature. 637, 319-326 (2025)
- [27] Chawla, N., Bowyer, K., Hall, L. & Kegelmeyer, W. SMOTE: synthetic minority over-sampling technique. Journal Of Artificial Intelligence Research. 16 pp. 321-357 (2002)
- [28] Klontzas, M., Kocak, B. & Cuocolo, R. Sample size estimation for radiomics studies: an overlooked problem. European Radiology. pp. 1-2 (2025)