
arxiv: 2604.24230 · v1 · submitted 2026-04-27 · 💻 cs.CV

Radiomics- and Clinical Feature-Driven Prediction of Volumetric Response in Skull-Base Meningioma after CyberKnife Radiosurgery

Pith reviewed 2026-05-08 04:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords radiomics · meningioma · CyberKnife · volumetric response · machine learning · prediction model · MRI · skull base

The pith

Radiomic features from pre-treatment MRI combined with clinical variables predict volumetric response to CyberKnife radiosurgery in skull-base meningiomas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests a framework that extracts radiomic features from baseline MRI scans of skull-base meningiomas and merges them with clinical information to forecast which tumors will shrink after CyberKnife treatment. It applies six machine-learning models inside a nested cross-validation loop on data from 104 patients, finding that the TabPFN model reaches an AUC of 0.81. The goal is to give clinicians an early, non-invasive way to identify likely responders, because skull-base tumors sit near critical structures and not every patient benefits equally from radiosurgery. If the approach holds, treatment decisions could shift from uniform application toward personalized selection based on imaging-derived signatures.

Core claim

Pre-treatment MRI radiomic features plus clinical variables contain enough information to classify volumetric response after CyberKnife radiosurgery; when modeled with TabPFN under nested cross-validation, this combination yields an AUC of 0.81 along with favorable sensitivity and specificity metrics, outperforming the other five tested algorithms on the 104-patient cohort.

What carries the argument

Radiomics-plus-clinical feature set fed into TabPFN (Tabular Prior-Data Fitted Network) inside a nested cross-validation scheme that separates feature selection, model training, and performance estimation to guard against overfitting in the high-dimensional, small-sample regime.
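The separation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the nearest-centroid classifier is a stand-in for TabPFN, the standardized-mean-difference filter is a stand-in for whatever feature selection the paper used, and every function name here is hypothetical. The point it shows is structural: feature selection and the choice of how many features to keep are refit on training rows only, so the outer held-out fold never influences them.

```python
import random
from statistics import mean, pstdev

def auc(labels, scores):
    """AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, rng):
    """Split positions 0..len(labels)-1 into k folds with balanced classes."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def select_top_k(X, y, train, k):
    """Rank features on training rows only (stand-in univariate filter)."""
    ranked = []
    for f in range(len(X[0])):
        pos = [X[i][f] for i in train if y[i] == 1]
        neg = [X[i][f] for i in train if y[i] == 0]
        spread = pstdev(pos + neg) or 1.0
        ranked.append((abs(mean(pos) - mean(neg)) / spread, f))
    return [f for _, f in sorted(ranked, reverse=True)[:k]]

def centroid_scores(X, y, train, test, feats):
    """Stand-in classifier: higher score means closer to the responder centroid."""
    def centroid(cls):
        rows = [X[i] for i in train if y[i] == cls]
        return [mean(r[f] for r in rows) for f in feats]
    c1, c0 = centroid(1), centroid(0)
    def dist(row, c):
        return sum((row[f] - cf) ** 2 for f, cf in zip(feats, c))
    return [dist(X[i], c0) - dist(X[i], c1) for i in test]

def cv_auc(X, y, idx, k_feats, n_folds, rng):
    """Inner loop: mean AUC over folds of idx, selection refit in every fold."""
    aucs = []
    for fold in stratified_folds([y[i] for i in idx], n_folds, rng):
        test = [idx[j] for j in fold]
        held = set(test)
        train = [i for i in idx if i not in held]
        feats = select_top_k(X, y, train, k_feats)
        aucs.append(auc([y[i] for i in test],
                        centroid_scores(X, y, train, test, feats)))
    return mean(aucs)

def nested_cv(X, y, k_grid=(2, 5, 10), outer=5, inner=3, seed=0):
    """Outer loop estimates performance; inner loop picks the feature count."""
    rng = random.Random(seed)
    outer_aucs = []
    for fold in stratified_folds(y, outer, rng):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        best_k = max(k_grid, key=lambda k: cv_auc(X, y, train, k, inner, rng))
        feats = select_top_k(X, y, train, best_k)
        outer_aucs.append(auc([y[i] for i in fold],
                              centroid_scores(X, y, train, fold, feats)))
    return mean(outer_aucs)

# Synthetic cohort: 104 rows, 60 features, only the first 3 carry signal.
data_rng = random.Random(1)
y = [i % 2 for i in range(104)]
X = [[data_rng.gauss(1.5 * y[i] if f < 3 else 0.0, 1.0) for f in range(60)]
     for i in range(104)]
NESTED_AUC = nested_cv(X, y)
print(f"nested-CV AUC on synthetic data: {NESTED_AUC:.2f}")
```

Running the filter once on all 104 rows before splitting would let held-out labels shape the feature set, which is the leakage the nested scheme is meant to rule out.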

If this is right

  • Clinicians could obtain a probability score for volumetric response before deciding on radiosurgery versus other options.
  • The same feature set and validation protocol could be applied to stratify patients for alternative radiation doses or follow-up schedules.
  • High-performing models like TabPFN reduce the need for manual feature engineering while still handling the small-sample, high-feature problem common in radiomics.
  • Volumetric response becomes a measurable, image-based endpoint that can be predicted earlier than progression-free survival.
  • The nested cross-validation workflow provides a reproducible template for other single-center radiomics studies facing similar data constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the model generalizes, it could be embedded in treatment-planning software to flag low-response cases for closer monitoring or alternative therapies.
  • Extending the pipeline to include post-treatment scans might allow early detection of non-responders and adaptive re-planning.
  • The approach might transfer to other skull-base lesions or different radiosurgery platforms, provided the MRI acquisition protocol remains comparable.
  • Combining this prediction with genomic or molecular markers could further improve accuracy, though that step lies outside the current study.

Load-bearing premise

The extracted radiomic features and clinical variables together carry a genuine, generalizable signal about treatment response rather than noise or cohort-specific artifacts.

What would settle it

A prospective test on an independent set of at least 50 new patients where the same radiomic pipeline and TabPFN model produces an AUC below 0.70.
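A minimal sketch of how that check could be scored, assuming the external cohort arrives as binary response labels plus model probabilities. The 0.70 floor and the 50-patient minimum come from the sentence above; the bootstrap interval, the function names, and the synthetic cohort are illustrative assumptions, not part of the paper.

```python
import random

def auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) identity; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high

def settles_against(labels, scores, auc_floor=0.70, min_patients=50):
    """True if a large-enough independent cohort falls below the AUC floor."""
    if len(labels) < min_patients:
        raise ValueError("cohort smaller than the stated minimum")
    return auc(labels, scores) < auc_floor

# Hypothetical external cohort of 50 patients (synthetic numbers only).
rng = random.Random(42)
ext_labels = [i % 2 for i in range(50)]
ext_scores = [rng.gauss(0.4 + 0.2 * y, 0.15) for y in ext_labels]
ci_low, ci_high = bootstrap_auc_ci(ext_labels, ext_scores)
print(f"external AUC {auc(ext_labels, ext_scores):.2f} "
      f"(95% bootstrap interval {ci_low:.2f} to {ci_high:.2f})")
print("claim undermined:", settles_against(ext_labels, ext_scores))
```

With only 50 patients the interval is wide, so reporting it alongside the point estimate helps distinguish a real failure from sampling noise.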

Figures

Figures reproduced from arXiv: 2604.24230 by Alberto Redaelli, Cristiana Pedone, Domenico Aquino, Elena De Martin, Giacomo Conte, Laura Fariselli, Riccardo Barbieri, Simona Ferrante, Yin Lin.

Figure 1: Overview of the proposed radiomics-based pipeline fo…
Figure 2: Performance of the evaluated classification models a…

Skull-base meningiomas are often characterized by favorable long-term prognosis, yet their anatomical complexity and proximity to critical neurovascular structures make treatment selection challenging. Stereotactic radiosurgery with CyberKnife represents an effective therapeutic option when surgical resection is not feasible; however, not all patients benefit equally from this treatment. Early identification of patients likely to respond to radiosurgery remains an open clinical problem. In this study, we propose a radiomics- and clinical feature-driven framework for predicting volumetric response in skull-base meningiomas treated with CyberKnife. Unlike most existing approaches that focus on progression-free survival or recurrence, our method targets volumetric response as an indicator of treatment efficacy. Pre-treatment MRI images from 104 patients were processed to extract radiomic features, which were combined with clinical variables and analyzed using six models. To ensure methodological rigor, the entire modeling process was implemented within a nested cross-validation scheme. Among the evaluated models, TabPFN achieved the best overall performance, with an AUC of 0.81 and consistently favorable classification metrics. These results suggest that advanced machine learning architectures, when combined with robust validation strategies, can effectively capture patterns associated with treatment response even in small-sample, high-dimensional settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a radiomics- and clinical feature-driven machine learning framework to predict volumetric response to CyberKnife radiosurgery in skull-base meningiomas. Pre-treatment MRI scans from 104 patients are used to extract radiomic features that are combined with clinical variables; six models are evaluated under a nested cross-validation scheme, with TabPFN achieving the highest AUC of 0.81 and favorable classification metrics. The central goal is to enable early identification of treatment responders in a setting where anatomical complexity complicates decision-making.

Significance. If the performance generalizes, the approach could support personalized treatment selection for skull-base meningiomas, where not all patients respond equally to radiosurgery. The nested cross-validation is a clear methodological strength that reduces overfitting risk in the small-n, high-dimensional regime typical of radiomics. However, the lack of external validation and missing details on feature handling limit the immediate translational significance and the strength of the generalizability claim.

major comments (3)
  1. Abstract: The headline result (TabPFN AUC 0.81) is reported without any information on the number of radiomic features initially extracted, the feature-selection procedure, or the final retained feature count. In a p ≫ n setting with n=104, this omission is load-bearing because it prevents assessment of whether the nested CV truly prevented leakage or selection bias.
  2. Abstract and Methods: No details are provided on the exact volumetric response threshold used to binarize the outcome, the class balance in the 104-patient cohort, or whether feature selection and hyperparameter optimization occurred strictly inside the inner CV loop. These omissions directly affect the interpretability and credibility of the reported classification metrics.
  3. Results: The claim that TabPFN achieved the best overall performance lacks a permutation-test baseline or a clinical-variables-only comparator. Without such controls, it is impossible to determine whether the radiomic features contribute recoverable signal beyond chance or simple clinical predictors.
minor comments (2)
  1. Abstract: The abstract already gives the cohort size (104 patients); naming the six models evaluated would improve completeness and allow readers to immediately gauge the experimental scope.
  2. Discussion: A dedicated limitations paragraph explicitly addressing the single-center design and absence of external validation would strengthen the manuscript and align with standard reporting expectations for radiomics studies.
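To make major comment 2 concrete, the sketch below shows what reporting the binarization rule and class balance could look like. It is a hypothetical illustration: the paper's actual threshold is not given, so `threshold` is a required argument with no default, and the volumes and the -0.10 value in the example are invented purely to exercise the code.

```python
def relative_volume_change(v_pre, v_post):
    """Fractional change in tumor volume; negative values mean shrinkage."""
    return (v_post - v_pre) / v_pre

def binarize_response(changes, threshold):
    """Label 1 (responder) when the relative change is at or below threshold."""
    return [1 if c <= threshold else 0 for c in changes]

def class_balance(labels):
    """Counts and responder fraction, the figures the referee asks for."""
    n, pos = len(labels), sum(labels)
    return {"responders": pos,
            "non_responders": n - pos,
            "responder_fraction": pos / n}

# Invented volumes in cm^3 and an illustrative threshold (not from the paper).
pre = [10.0, 8.0, 5.0, 4.0]
post = [7.0, 8.2, 4.8, 2.0]
changes = [relative_volume_change(a, b) for a, b in zip(pre, post)]
labels = binarize_response(changes, threshold=-0.10)
print(labels, class_balance(labels))
```

Because the reported sensitivity and specificity depend on where this cut falls and on how skewed the resulting classes are, both numbers belong next to the AUC.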

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the thoughtful comments, which have helped us improve the clarity and rigor of our manuscript. Below we provide point-by-point responses to the major comments. We have revised the manuscript to incorporate additional details and analyses as suggested.

  1. Referee: Abstract: The headline result (TabPFN AUC 0.81) is reported without any information on the number of radiomic features initially extracted, the feature-selection procedure, or the final retained feature count. In a p ≫ n setting with n=104, this omission is load-bearing because it prevents assessment of whether the nested CV truly prevented leakage or selection bias.

    Authors: We acknowledge that the abstract lacked these specifics, which are crucial for evaluating the methodology in a high-dimensional setting. The full methods section describes the extraction of radiomic features from pre-treatment MRI using standard software and the subsequent combination with clinical variables. To address this concern directly, we have updated the abstract to report the initial number of radiomic features extracted, the feature selection approach employed, and the number of features retained in the final models. Additionally, we have clarified that all feature selection steps were confined to the inner loop of the nested cross-validation to avoid any information leakage.
    Revision: yes

  2. Referee: Abstract and Methods: No details are provided on the exact volumetric response threshold used to binarize the outcome, the class balance in the 104-patient cohort, or whether feature selection and hyperparameter optimization occurred strictly inside the inner CV loop. These omissions directly affect the interpretability and credibility of the reported classification metrics.

    Authors: Thank you for highlighting these omissions. We have revised both the abstract and the methods section to specify the volumetric response threshold used for binarization, the resulting class distribution in the cohort, and to explicitly state that feature selection and hyperparameter optimization were performed strictly within the inner cross-validation loop. This ensures full transparency regarding the modeling pipeline and supports the credibility of the performance metrics.
    Revision: yes

  3. Referee: Results: The claim that TabPFN achieved the best overall performance lacks a permutation-test baseline or a clinical-variables-only comparator. Without such controls, it is impossible to determine whether the radiomic features contribute recoverable signal beyond chance or simple clinical predictors.

    Authors: We agree that additional controls would strengthen the interpretation of our results. While the comparison among the six models provides some context, we did not originally include a permutation test or an explicit clinical-variables-only baseline. In the revised manuscript, we have added a clinical-variables-only model for comparison and performed a permutation test to assess whether the performance exceeds what would be expected by chance. These additions demonstrate that the inclusion of radiomic features provides meaningful improvement over clinical variables alone and over random baselines.
    Revision: yes
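A permutation test of the kind discussed in this exchange can be sketched as follows. This is a simplified version under stated assumptions: it shuffles labels against a fixed set of out-of-fold scores, whereas a stricter variant would rerun the whole nested pipeline for every shuffle. Function names and the toy inputs are hypothetical, and nothing here reproduces the paper's numbers.

```python
import random

def auc(labels, scores):
    """AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles that match or beat the AUC."""
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between labels and scores
        if auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy inputs: 20 non-responders and 20 responders with separated scores.
toy_labels = [0] * 20 + [1] * 20
toy_scores = [0.30 + 0.01 * i for i in range(20)] + [0.60 + 0.01 * i for i in range(20)]
obs, p = permutation_p_value(toy_labels, toy_scores)
print(f"observed AUC {obs:.2f}, permutation p-value {p:.4f}")
```

The same routine can be applied to scores from a clinical-variables-only model, which gives a like-for-like check on whether the radiomic features add signal beyond chance.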

Circularity Check

0 steps flagged

No circularity: empirical ML performance via nested CV on held-out data


The paper reports an AUC of 0.81 for TabPFN as an empirical result obtained through nested cross-validation on a 104-patient cohort, with radiomic and clinical features as inputs and volumetric response as the target label. No equations, derivations, or self-referential steps are described that would reduce the reported performance to parameters fitted on the same outcome by construction. The modeling pipeline is presented as a standard supervised learning workflow with internal validation; no self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the central claim. This is a self-contained empirical evaluation whose validity can be assessed against external benchmarks or replication, rather than being tautological.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework assumes radiomic features capture biologically relevant tumor properties that correlate with radiosensitivity; no new physical entities are introduced, but the claim depends on the unproven generalizability of the learned patterns beyond the 104-patient set.

free parameters (1)
  • radiomic feature selection and model hyperparameters
    Feature extraction yields hundreds of variables. Both the selection among them and the internal parameters of TabPFN and the competing models are tuned on the data.
axioms (1)
  • domain assumption: Pre-treatment MRI radiomic features plus clinical variables contain predictive information about post-treatment volumetric change
    Invoked by the decision to extract and combine these features for modeling.

pith-pipeline@v0.9.0 · 5549 in / 1301 out tokens · 29707 ms · 2026-05-08T04:37:35.461115+00:00 · methodology

