Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR
Pith reviewed 2026-05-24 07:18 UTC · model grok-4.3
The pith
A lightweight model using routine medical history events detects cancer at 4-10 times higher rates than standard screening at equal screening volume.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Can-SAVE integrates survival model outputs into a gradient-boosting framework to rank population-wide cancer risks solely from medical history events recorded in EHR. On a dataset of 2.5 million adults, a retrospective oncologist-supervised study over 1.9 million patients yields an average precision of 0.228 versus 0.193 for the strongest baseline and 4-10 times higher detection at identical screening volumes. A year-long prospective pilot on 426 thousand patients increases the cancer detection rate by 91 percent and population coverage by 36 percent relative to the national protocol.
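Average precision is the headline ranking metric here, so a minimal sketch of how it can be computed may help. This is an illustrative pure-Python version under the usual definition (mean of precision at the rank of each true positive); the function name and toy data are assumptions, and the paper's own evaluation code may handle ties or thresholds differently.

```python
def average_precision(labels, scores):
    """Mean of precision@rank taken at the rank of every true positive."""
    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` patients
    # With no positives the metric is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


# Toy data (hypothetical): 1 = later diagnosed with cancer, 0 = not diagnosed.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

In the toy ranking the positives sit at ranks 1 and 3, so the value is (1/1 + 2/3) / 2. For scale, the reported 0.228 versus 0.193 corresponds to a relative gain of roughly 18 percent in average precision.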
What carries the argument
Survival model outputs integrated into a gradient-boosting framework applied to medical history events from electronic health records.
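As a rough illustration of how a survival-model output can become one input column for a gradient-boosting ranker, the sketch below uses a Kaplan-Meier estimator. This is a stand-in chosen for brevity: the summary does not say which survival model the authors use, and the simulated rebuttal further down names a Cox proportional-hazards model. All function names, the stratum idea, and the toy data are assumptions; the boosting step itself is not shown.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct event time.

    events: 1 = cancer diagnosis observed, 0 = right-censored follow-up.
    """
    diagnoses = defaultdict(int)
    exits = defaultdict(int)
    for t, e in zip(durations, events):
        exits[t] += 1
        if e:
            diagnoses[t] += 1
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        if diagnoses[t]:
            # Censored patients still count in the risk set up to their exit time.
            surv *= 1.0 - diagnoses[t] / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


def survival_at(curve, horizon):
    """Step-function lookup of S(horizon)."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s


def survival_feature(curve, horizon):
    """Estimated probability of diagnosis by `horizon`; one candidate booster column."""
    return 1.0 - survival_at(curve, horizon)


# Toy stratum (hypothetical): follow-up in years and diagnosis indicator.
toy_durations = [2, 3, 3, 5, 8]
toy_events = [1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_durations, toy_events)
toy_feature = survival_feature(toy_curve, horizon=4)
```

In a pipeline of this shape, a value like `toy_feature` would be computed per patient (for example from the curve of the patient's stratum) and placed alongside other event-derived columns before fitting the ranker.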
If this is right
- A city-wide population of one million patients can be ranked in under three hours on standard hardware.
- The method nearly doubles the cancer detection rate in a prospective year-long pilot.
- Population coverage increases by 36 percent compared with the existing national screening protocol.
- Detection rates rise 4-10 times at the same total screening volume in retrospective analysis.
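The "same total screening volume" comparison in the list above can be sketched as follows. The assumption here is that detection at a fixed volume means the number of confirmed cancers among the top-k patients by predicted risk, with k held equal for both methods; the summary does not spell out the exact definition, and the function names and toy data are hypothetical.

```python
def detections_at_volume(labels, scores, volume):
    """Count confirmed cancers among the `volume` highest-risk patients."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:volume])


def detection_lift(labels, model_scores, baseline_scores, volume):
    """Ratio of detections at the same screening volume; None if the baseline finds none."""
    model_hits = detections_at_volume(labels, model_scores, volume)
    baseline_hits = detections_at_volume(labels, baseline_scores, volume)
    if baseline_hits == 0:
        return None
    return model_hits / baseline_hits


# Toy cohort (hypothetical): 10 patients, 4 of whom are later diagnosed.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_model = [0.9, 0.8, 0.7, 0.1, 0.6, 0.5, 0.4, 0.3, 0.2, 0.05]
toy_baseline = [0.9, 0.1, 0.2, 0.3, 0.8, 0.7, 0.6, 0.5, 0.4, 0.05]
toy_lift = detection_lift(toy_labels, toy_model, toy_baseline, volume=4)
```

Holding `volume` fixed is what makes the comparison fair: both rankings spend the same number of screening slots, so any difference comes from who is selected rather than how many.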
Where Pith is reading between the lines
- The same survival-plus-boosting structure could be retrained on EHR from other regions if local data quality is comparable.
- Routine integration into existing health-record systems could shift screening effort toward higher-risk individuals without new diagnostic equipment.
- Extending the variable-construction step to other long-horizon outcomes such as cardiovascular events is a direct technical extension.
Load-bearing premise
EHR events from five Russian regions provide an unbiased and representative signal of long-term cancer risk that generalizes without significant confounding from regional healthcare access differences or data recording practices.
What would settle it
Applying the trained model to EHR data from a different country or healthcare system and observing a large drop in detection performance or average precision would falsify the claim of broad applicability.
Original abstract
Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.
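The scalability figure in the abstract implies a minimum sustained throughput, which the short calculation below makes explicit. It only restates the abstract's numbers (1 million patients, three hours); the helper name is hypothetical and "standard hardware" is not specified further in the text.

```python
def required_throughput(patients, hours):
    """Return (patients per second, milliseconds per patient) needed to finish in time."""
    budget_seconds = hours * 60 * 60
    per_second = patients / budget_seconds
    ms_per_patient = 1000 * budget_seconds / patients
    return per_second, ms_per_patient


# Numbers taken from the abstract: 1M patients in under three hours.
rate, latency_ms = required_throughput(patients=1_000_000, hours=3)
```

So the claim amounts to scoring at least about 93 patients per second, or at most about 10.8 ms per patient, end to end.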
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Can-SAVE, a system that extracts survival-analysis variables from EHR event sequences and feeds them into a gradient-boosting model to produce population-scale cancer risk rankings. It reports a retrospective evaluation on 1.9M patients showing 4-10x higher detection rates at fixed screening volumes and AP of 0.228 versus 0.193 for the strongest baseline, plus a prospective pilot on 426K patients yielding +91% detection rate and +36% coverage relative to the national protocol, with claimed scalability to 1M patients in under three hours on standard hardware. Code for training and feature engineering is released.
Significance. If the empirical claims survive scrutiny for confounding, the work would demonstrate that lightweight survival-derived features from routine EHR can materially improve cancer detection at national scale without specialized imaging or lab data. The public code release is a concrete strength supporting reproducibility.
major comments (2)
- [Abstract] The headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of the survival-model training procedure, the censoring mechanism, feature construction from event sequences, or statistical significance testing. These omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.
- [Evaluation] No region-holdout experiments, access-adjusted covariates, or explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic's concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.
minor comments (1)
- [Abstract] The abstract states the study spans 2.5M adults yet reports retrospective results on 1.9M and prospective on 426K; a brief reconciliation of the overlap or exclusion criteria would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments, indicating planned revisions to strengthen the manuscript.
- Referee: [Abstract] The headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of the survival-model training procedure, the censoring mechanism, feature construction from event sequences, or statistical significance testing. These omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.
Authors: We agree the abstract is concise and omits these details. The Methods section of the manuscript specifies a Cox proportional-hazards survival model trained on longitudinal EHR event sequences, with right-censoring at last observation or study end, time-to-event summary statistics as features, and bootstrap-based significance assessment. We will revise the abstract to include a brief clause describing the survival-model training, censoring, and feature construction so that the performance claims are contextualized.
Revision: yes
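The response mentions bootstrap-based significance assessment without further detail. One common way to do this for a ranking metric is a paired percentile bootstrap on the difference in average precision between two models, sketched below. The resampling scheme, the function names, the default parameters, and the toy data are all assumptions rather than the authors' procedure.

```python
import random


def average_precision(labels, scores):
    """Mean of precision@rank taken at the rank of every true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def bootstrap_ap_difference(labels, scores_a, scores_b,
                            n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval for AP(a) - AP(b), resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_resamples):
        # Paired resampling: the same patients are drawn for both models.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) == 0:
            continue  # AP is undefined without positives; skip this resample
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(average_precision(y, a) - average_precision(y, b))
    if not diffs:
        raise ValueError("no resample contained a positive case")
    diffs.sort()
    lower = diffs[int((alpha / 2) * (len(diffs) - 1))]
    upper = diffs[int((1 - alpha / 2) * (len(diffs) - 1))]
    return lower, upper


# Toy data (hypothetical): model A ranks positives higher than model B.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_a = [0.9, 0.3, 0.8, 0.2, 0.1, 0.7, 0.4, 0.05]
toy_b = [0.4, 0.9, 0.3, 0.8, 0.1, 0.7, 0.6, 0.05]
toy_interval = bootstrap_ap_difference(toy_labels, toy_a, toy_b, n_resamples=200)
```

An interval that excludes zero would suggest the gap between the two models is not a resampling artifact; with a cohort of 1.9M patients and a low base rate, stratifying the resampling by outcome may be worth considering.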
- Referee: [Evaluation] No region-holdout experiments, access-adjusted covariates, or explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic's concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.
Authors: The concern regarding potential confounding from regional differences and visit-frequency patterns is substantive. The reported experiments do not include explicit region-holdout validation or debiasing steps. In revision we will add region-holdout experiments (training on four regions, testing on the held-out region) and incorporate visit-frequency as an access-adjusted covariate with corresponding propensity weighting to mitigate utilization bias.
Revision: yes
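The proposed region-holdout design (train on four regions, test on the fifth) can be sketched as a leave-one-region-out split. The generator below is a minimal illustration; the region labels and function name are hypothetical, and the propensity-weighting step mentioned in the response is not covered.

```python
def leave_one_region_out(regions):
    """Yield (held_out_region, train_indices, test_indices) for each distinct region."""
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test


# Toy cohort (hypothetical): one region label per patient, five regions in total.
toy_regions = ["A", "B", "A", "C", "B", "D", "E"]
folds = list(leave_one_region_out(toy_regions))
```

Each fold trains on patients from four regions and evaluates on the one left out, so a model that has learned region-specific visit or coding patterns would be expected to lose performance on the held-out fold.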
Circularity Check
No circularity; empirical results on held-out retrospective and prospective data
The paper presents a gradient-boosting model trained on survival-analysis features extracted from EHR event sequences, with performance measured via direct comparison to baselines on a 1.9M-patient retrospective cohort and a 426K-patient prospective pilot. No derivation, equation, or claim reduces a prediction to its own fitted inputs by construction, invokes self-citation for a uniqueness theorem, or renames a known result; all reported metrics (AP 0.228, +91% detection lift) are externally falsifiable outcomes on held-out data.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: electronic health record events are sufficiently predictive of future cancer risk to enable useful ranking.