Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data

Alberto López; Anne-Marie George; John Zobolas; Marc Becker; Sebastian Fischer; Tero Aittokallio

arxiv: 2509.02648 · v1 · submitted 2025-09-02 · 🧬 q-bio.GN · cs.LG · q-bio.QM · stat.AP

John Zobolas, Anne-Marie George, Alberto López, Sebastian Fischer, Marc Becker, Tero Aittokallio

Pith reviewed 2026-05-18 20:00 UTC · model grok-4.3

classification 🧬 q-bio.GN · cs.LG · q-bio.QM · stat.AP

keywords feature selection · multi-omics · pancreatic cancer · survival prediction · ensemble methods · biomarker discovery · prognostic modeling · hybrid selection

The pith

A hybrid ensemble feature selection method finds fewer and more stable prognostic biomarkers in pancreatic cancer multi-omics data than late-fusion CoxLasso while keeping similar performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hybrid ensemble feature selection approach that combines data subsampling, multiple survival models, and both embedded and wrapper strategies to rank and select features for predicting patient survival from high-dimensional multi-omics data. Features are aggregated through a voting-theory-inspired mechanism across models and subsamples, and the optimal number is chosen automatically via a Pareto front that balances accuracy against sparsity. When tested on multi-omics datasets from three pancreatic cancer cohorts, the method produces significantly fewer and more stable biomarkers than conventional late-fusion CoxLasso models without loss of discrimination power. This matters for turning noisy omics measurements into reliable, clinically usable prognostic signatures that avoid overfitting and support validation in new patients.

Core claim

The hybrid ensemble feature selection (hEFS) approach integrates data subsampling with multiple prognostic models using embedded and wrapper-based strategies for survival prediction. Omics features are ranked by a voting-theory-inspired aggregation across models and subsamples, and the optimal feature count is selected via a Pareto front that balances predictive accuracy and model sparsity without user-defined thresholds. Applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers than conventional late-fusion CoxLasso models while maintaining comparable discrimination performance.

What carries the argument

The hEFS method, which ranks features via voting-theory-inspired aggregation across multiple models and data subsamples then selects feature count through Pareto-front optimization balancing accuracy and sparsity.
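The ranking step can be illustrated with a minimal sketch. It assumes plain approval voting, where each (model, subsample) run "approves" the features it selected and features are ranked by approval share. The abstract only says the aggregation is voting-theory-inspired, so the voting rule, the function name `approval_ranking`, and the toy feature names are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

def approval_ranking(selected_sets):
    """Rank features by the fraction of (model, subsample) runs that selected them."""
    votes = Counter()
    for features in selected_sets:
        votes.update(set(features))  # each run approves a feature at most once
    n_runs = len(selected_sets)
    # Sort by descending vote count, then by name so ties are broken deterministically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, count / n_runs) for name, count in ranked]

# Hypothetical selections from 2 models x 2 subsamples (toy names, not real biomarkers).
runs = [
    {"gene_a", "gene_b", "gene_c"},
    {"gene_a", "gene_c"},
    {"gene_a", "gene_d"},
    {"gene_b", "gene_a"},
]
ranking = approval_ranking(runs)
print(ranking)
# [('gene_a', 1.0), ('gene_b', 0.5), ('gene_c', 0.5), ('gene_d', 0.25)]
```

Other voting rules weight a run's votes by how many features it approved; the sketch uses the simplest rule only to show the shape of the computation.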

If this is right

Prognostic models using hEFS-selected features achieve comparable survival discrimination to models using more features from CoxLasso.
The resulting biomarkers exhibit greater consistency across different data subsamples, raising reliability for downstream clinical use.
Automatic Pareto-front selection removes the need for arbitrary user thresholds when choosing how many features to retain.
The method is implemented in the open-source mlr3fselect R package and can be applied to other high-dimensional survival settings.
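The threshold-free selection of the feature count can be sketched as follows. The example assumes two objectives, number of features (lower is better) and a validation score such as the C-index (higher is better), and picks a knee point on the resulting Pareto front. The knee rule used here (largest distance from the line joining the front's extremes on normalized axes) is an assumption; the abstract does not state how hEFS picks a point on the front. All numbers are made up.

```python
def pareto_front(points):
    """Return non-dominated (n_features, score) pairs; fewer features and higher score are better."""
    front = []
    best_score = float("-inf")
    # Scan from sparsest to densest; keep a point only if it beats every sparser point.
    for n, score in sorted(points, key=lambda p: (p[0], -p[1])):
        if score > best_score:
            front.append((n, score))
            best_score = score
    return front

def knee_point(front):
    """Pick the front point farthest from the line joining the two extremes (normalized axes)."""
    if len(front) < 3:
        return front[0]  # too few points to define a knee; fall back to the sparsest
    (n0, s0), (n1, s1) = front[0], front[-1]

    def distance(p):
        x = (p[0] - n0) / (n1 - n0)
        y = (p[1] - s0) / (s1 - s0)
        return abs(y - x)  # proportional to the distance from the diagonal y = x

    return max(front, key=distance)

# Hypothetical (n_features, validation score) pairs.
points = [(1, 0.55), (2, 0.62), (3, 0.68), (5, 0.70), (8, 0.69), (10, 0.71), (20, 0.71)]
front = pareto_front(points)
knee = knee_point(front)
print(front)  # [(1, 0.55), (2, 0.62), (3, 0.68), (5, 0.7), (10, 0.71)]
print(knee)   # (3, 0.68)
```

Note that the result still depends on which two objectives are traded off and how they are scaled, which is the caveat the referee raises about "without any user-defined thresholds".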

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same hybrid aggregation and Pareto selection steps could be tested on multi-omics data from additional cancer types to check whether stable biomarker reduction generalizes beyond pancreatic cases.
Stability gains from subsampling and voting might translate to improved reproducibility when the selected biomarkers are validated in independent external cohorts.
The Pareto-front idea for trading off accuracy and sparsity could be adapted to feature selection tasks outside survival analysis, such as classification or regression in other high-dimensional biological datasets.

Load-bearing premise

The voting-theory-inspired aggregation across subsamples and models combined with Pareto-front selection will reliably produce fewer and more stable feature sets than late-fusion CoxLasso without hidden dependence on the specific pancreatic cancer cohorts or preprocessing choices.

What would settle it

Applying hEFS and late-fusion CoxLasso to the same three cohorts or to new independent pancreatic cancer multi-omics datasets and finding no significant reduction in feature count or improvement in stability metrics while discrimination performance stays comparable would falsify the central claim.

Original abstract

Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

hEFS gives a usable open pipeline for sparser biomarker sets in pancreatic cancer survival data, but the stability edge may come mostly from its built-in subsampling rather than the voting or Pareto pieces.

The main thing to know is that this paper describes a hybrid ensemble feature selection method, hEFS, that runs data subsampling through several survival models, aggregates rankings with a voting step, and picks the final feature count via Pareto front optimization. On three pancreatic cancer multi-omics cohorts it reports fewer and more stable biomarkers than late-fusion CoxLasso while holding similar discrimination performance. The code ships inside the mlr3fselect package, which is a concrete plus for anyone who wants to test or reuse it. That combination of subsampling, ensemble ranking, and threshold-free selection is presented as a fresh packaging for high-dimensional survival work, and the open implementation lets others check the claims directly.

For readers who routinely deal with unstable biomarker lists from omics data, this offers a ready-to-run alternative that reduces manual tuning. The practical focus on pancreatic cancer cohorts and the emphasis on sparsity plus stability line up with recurring needs in computational oncology.

On the weaker side, the abstract gives no concrete stability numbers, overlap metrics, or cross-validation details, so the size of the reported gains is still unclear. The stress-test concern is reasonable: hEFS explicitly resamples the data, yet the CoxLasso baseline description does not show equivalent subsampling. If stability improves simply because more resampling is done, then the voting aggregation and Pareto step may not be carrying the main load. The full paper should clarify whether the baseline received the same resampling treatment so the novel components can be isolated.

This work is aimed at statisticians and bioinformaticians who build prognostic models from high-dimensional survival data. A reader who needs a concrete, open method for feature selection in this setting will find usable material here. It has enough technical grounding and public code to merit sending out for peer review, though the comparisons will likely need tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a hybrid ensemble feature selection (hEFS) pipeline for survival prediction from multi-omics data in pancreatic cancer. The method integrates data subsampling, multiple prognostic models, a voting-theory-inspired aggregation for feature ranking, and Pareto-front optimization to select the number of features without user-defined thresholds. When applied to three pancreatic cancer cohorts, the authors report that hEFS yields significantly fewer and more stable biomarkers than late-fusion CoxLasso while preserving comparable discrimination performance. The approach is implemented as an open-source extension in the mlr3fselect R package.

Significance. If the reported gains in sparsity and stability are robustly attributable to the voting aggregation and Pareto selection rather than the embedded subsampling, the work would offer a practical, interpretable tool for high-dimensional prognostic modeling in oncology. The open-source implementation and focus on clinically relevant endpoints (survival discrimination plus biomarker reliability) strengthen its potential utility.

major comments (2)

[Results] Results section: the abstract and main text claim 'significantly fewer and more stable biomarkers' with 'comparable discrimination performance,' yet no numerical values are supplied for biomarker counts, stability metrics (e.g., Jaccard index or overlap across folds/cohorts), discrimination metrics (C-index or AUC), or the statistical tests used to support significance. This absence prevents evaluation of whether the improvements are load-bearing for the central claim.
[Methods] Methods section: hEFS explicitly incorporates data subsampling across models, while the late-fusion CoxLasso baseline is described without mention of equivalent bootstrap or subsample aggregation. Because stability is typically increased by any internal resampling, the comparison does not isolate the contribution of the voting-theory aggregation or Pareto-front step; a re-run of CoxLasso with matched subsampling is required to substantiate that the novel components drive the reported stability and sparsity advantages.
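The Jaccard-based stability measure named in the first major comment can be sketched as the mean pairwise Jaccard index over the feature sets selected on different folds or subsamples. This is a minimal illustration with made-up feature names; whether the paper uses this measure or a different stability index is not stated in the abstract, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(feature_sets):
    """Average Jaccard index over all unordered pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Toy selections from three subsamples (hypothetical feature names).
stable = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2", "g4"}]
unstable = [{"g1", "g2"}, {"g3", "g4"}, {"g1", "g5"}]

stable_score = mean_pairwise_jaccard(stable)      # (1 + 0.5 + 0.5) / 3
unstable_score = mean_pairwise_jaccard(unstable)  # (0 + 1/3 + 0) / 3
print(round(stable_score, 3), round(unstable_score, 3))
```

Applying the same function to hEFS and to a CoxLasso baseline run on identical subsamples would give the matched comparison the second major comment asks for.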

minor comments (2)

[Abstract] Abstract: the phrase 'without any user-defined thresholds' for Pareto selection should be clarified, as the front itself may still depend on the choice of objective functions or normalization.
The manuscript would benefit from explicit cross-validation details (number of folds, outer/inner loops) and cohort-specific preprocessing steps to allow full reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below and outline the revisions we plan to make.

Referee: [Results] Results section: the abstract and main text claim 'significantly fewer and more stable biomarkers' with 'comparable discrimination performance,' yet no numerical values are supplied for biomarker counts, stability metrics (e.g., Jaccard index or overlap across folds/cohorts), discrimination metrics (C-index or AUC), or the statistical tests used to support significance. This absence prevents evaluation of whether the improvements are load-bearing for the central claim.

Authors: We acknowledge that the manuscript as submitted does not provide the specific numerical values or statistical details necessary to fully evaluate the claims. In the revised version, we will add a dedicated results subsection or table that reports: (i) the exact number of biomarkers selected by hEFS versus late-fusion CoxLasso in each of the three cohorts; (ii) stability metrics including Jaccard indices for feature overlap across cross-validation folds and across cohorts; (iii) discrimination performance via C-index (or time-dependent AUC) with 95% confidence intervals; and (iv) the statistical tests (e.g., Wilcoxon signed-rank tests for paired comparisons) and associated p-values used to assess significance. These additions will make the central claims quantitatively verifiable.
revision: yes

Referee: [Methods] Methods section: hEFS explicitly incorporates data subsampling across models, while the late-fusion CoxLasso baseline is described without mention of equivalent bootstrap or subsample aggregation. Because stability is typically increased by any internal resampling, the comparison does not isolate the contribution of the voting-theory aggregation or Pareto-front step; a re-run of CoxLasso with matched subsampling is required to substantiate that the novel components drive the reported stability and sparsity advantages.

Authors: This is a fair criticism of the experimental design. To isolate the effects of the voting-theory aggregation and Pareto-front optimization, we will conduct additional experiments in which the late-fusion CoxLasso baseline is also subjected to the same subsampling procedure used in hEFS. We will then directly compare the resulting biomarker counts, stability (e.g., Jaccard overlap), and discrimination performance between the subsampled CoxLasso and the complete hEFS pipeline. The revised manuscript will include these results and a discussion of how much of the observed advantage is attributable to the novel components versus subsampling alone.
revision: yes
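For readers unfamiliar with the discrimination metric mentioned in the responses, the following is a minimal sketch of a concordance index for right-censored data, assuming Harrell's definition. The abstract does not say which C-index variant the paper uses, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the earlier failure has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: the second subject is censored (event = 0).
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.1, 0.2]  # higher value = higher predicted risk

c_index = harrell_c_index(times, events, risks)
print(c_index)  # 0.75: three of the four comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so "comparable discrimination" means similar values for the two methods on the same test data.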

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes an algorithmic pipeline (hEFS) that combines explicit subsampling, multiple survival models, voting aggregation, and Pareto-front selection of feature count. The central empirical claim—that hEFS yields fewer and more stable biomarkers than late-fusion CoxLasso while preserving discrimination—is presented as an observed outcome on three external cohorts rather than a quantity derived by construction from fitted parameters inside the same equations. No self-citation chain, uniqueness theorem, or ansatz smuggling is invoked to justify the core method; the approach is implemented in the external mlr3fselect package. The derivation chain therefore remains self-contained and does not reduce to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard survival-analysis assumptions and the effectiveness of the described ensemble procedure; no new free parameters, invented entities, or ad-hoc axioms beyond conventional Cox-model and machine-learning practice are introduced in the abstract.

axioms (1)

domain assumption: Cox proportional hazards model assumptions hold for the prognostic models used inside the ensemble.
The abstract refers to prognostic models and late-fusion CoxLasso, which presuppose the standard Cox model.
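For reference, the standard Cox model and the lasso-penalized partial likelihood that the name CoxLasso conventionally refers to can be written as below. The notation is ours rather than the paper's: $x_i$ is the feature vector of patient $i$, $\delta_i$ the event indicator, $R(t_i)$ the set of patients still at risk at time $t_i$, and $\lambda \ge 0$ the penalty strength. The form shown assumes no tied event times.

```latex
% Proportional hazards: covariates scale a shared baseline hazard h_0(t).
h(t \mid x_i) = h_0(t)\,\exp\!\left(x_i^\top \beta\right)

% Lasso-penalized partial log-likelihood (no tied event times assumed).
\hat{\beta} = \arg\max_{\beta}\;
  \sum_{i:\,\delta_i = 1}
  \left[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^\top \beta\right) \right]
  - \lambda \lVert \beta \rVert_1
```

The $\ell_1$ penalty drives some coefficients exactly to zero, which is why CoxLasso acts as an embedded feature selector and why its selected set can change with the data sample.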

pith-pipeline@v0.9.0 · 5726 in / 1403 out tokens · 48105 ms · 2026-05-18T20:00:14.164573+00:00 · methodology
