arXiv: 2309.15039 · v4 · submitted 2023-09-26 · cs.LG · cs.AI · stat.AP

Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

Pith reviewed 2026-05-24 07:18 UTC · model grok-4.3

classification: cs.LG · cs.AI · stat.AP
keywords: cancer screening · electronic health records · survival analysis · gradient boosting · risk prediction · population screening · machine learning in healthcare

The pith

A lightweight model over routine medical history events detects cancer at 4-10 times higher rates than standard screening at equal screening volume.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that cancer risk can be ranked using only events from electronic health records by feeding survival model outputs into a gradient-boosting framework. This matters because conventional screening is costly, labor-intensive, and difficult to apply across entire populations. The approach was tested on real-world data from 2.5 million adults across five Russian regions. In retrospective evaluation on 1.9 million patients it delivered substantially higher detection at fixed screening effort, while a prospective pilot on 426 thousand patients nearly doubled detection and expanded coverage. The system also processes a city of one million patients in under three hours on ordinary hardware.

Core claim

Can-SAVE integrates survival model outputs into a gradient-boosting framework to rank population-wide cancer risks solely from medical history events recorded in EHR. On a dataset of 2.5 million adults, a retrospective oncologist-supervised study over 1.9 million patients yields an average precision of 0.228 versus 0.193 for the strongest baseline and 4-10 times higher detection at identical screening volumes. A year-long prospective pilot on 426 thousand patients increases the cancer detection rate by 91 percent and population coverage by 36 percent relative to the national protocol.
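For readers unfamiliar with the headline metric, average precision summarizes a ranking by averaging the precision at every rank where a confirmed case appears. The following is a minimal sketch on toy labels and scores, not the paper's data or evaluation code; the function name `average_precision` is illustrative, and tied scores are not treated specially.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive case appears."""
    # Rank patients by descending risk score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy cohort: 1 = cancer later confirmed, 0 = not confirmed.
labels = [0, 1, 0, 0, 1, 0]
scores = [0.10, 0.90, 0.80, 0.20, 0.70, 0.30]
ap = average_precision(labels, scores)  # positives land at ranks 1 and 3
```

For a random ranking, average precision is roughly the prevalence of positives, so in a low-prevalence screening setting a value such as 0.228 has to be read against that much lower chance level rather than against 1.0.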

What carries the argument

Survival model outputs integrated into a gradient-boosting framework applied to medical history events from electronic health records.
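The general idea of turning a survival estimate into a model input can be sketched as follows. This is only an illustration under assumptions: the figure captions on this page mention Kaplan-Meier estimators, but the paper's actual feature construction is not described here, so the helper names `kaplan_meier` and `survival_at`, the toy cohort, and the feature layout are all hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] == 1 means the diagnosis was observed at durations[i];
    events[i] == 0 means the observation was right-censored there.
    """
    observed = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # everyone exits the risk set at their own time
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk  # product-limit update
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: last estimate at or before t, 1.0 before the first event."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy cohort (durations in arbitrary time units).
durations = [2, 3, 3, 5, 8, 8, 9]
events = [1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)

# One illustrative feature row per patient: [observed duration, S(duration)].
# Rows like these, alongside other EHR-derived columns, would then be passed
# to a gradient-boosting learner (not shown here).
feature_rows = [[t, survival_at(curve, t)] for t in durations]
```

Keeping the survival estimate as an ordinary numeric column means any tabular gradient-boosting library could consume it without modification, which is one plausible reading of why the approach stays lightweight.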

If this is right

  • A city-wide population of one million patients can be ranked in under three hours on standard hardware.
  • The method nearly doubles the cancer detection rate in a prospective year-long pilot.
  • Population coverage increases by 36 percent compared with the existing national screening protocol.
  • Detection rates rise 4-10 times at the same total screening volume in retrospective analysis.
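Two of the consequences above reduce to simple arithmetic, sketched here under assumptions. The first part compares the share of confirmed cancers among the top-k ranked patients for two rankings at the same screening volume k; the second derives the minimum throughput implied by ranking one million patients in three hours. All data are toy values, and the names `precision_at_top` and `detection_lift` are illustrative rather than taken from the paper.

```python
def precision_at_top(labels, scores, k):
    """Share of confirmed cases among the k highest-scored patients."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k


def detection_lift(labels, model_scores, baseline_scores, k):
    """Ratio of precision@k between two rankings at the same screening volume.

    Assumes the baseline finds at least one case in its top k.
    """
    return precision_at_top(labels, model_scores, k) / precision_at_top(
        labels, baseline_scores, k
    )


# Hypothetical toy cohort of 10 patients with 2 confirmed cases.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
model_scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.2, 0.1, 0.3, 0.2]
baseline_scores = [0.9, 0.8, 0.2, 0.1, 0.3, 0.1, 0.2, 0.1, 0.3, 0.2]
lift = detection_lift(labels, model_scores, baseline_scores, k=2)

# Throughput implied by "one million patients in under three hours":
# at least this many patients must be scored per second.
min_patients_per_second = 1_000_000 / (3 * 3600)
```

In this toy case the model finds both cases in its top two while the baseline finds one, giving a lift of 2; the throughput bound works out to roughly 93 patients per second.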

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same survival-plus-boosting structure could be retrained on EHR from other regions if local data quality is comparable.
  • Routine integration into existing health-record systems could shift screening effort toward higher-risk individuals without new diagnostic equipment.
  • Extending the variable-construction step to other long-horizon outcomes such as cardiovascular events is a direct technical extension.

Load-bearing premise

EHR events from five Russian regions provide an unbiased and representative signal of long-term cancer risk that generalizes without significant confounding from regional healthcare access differences or data recording practices.

What would settle it

Applying the trained model to EHR data from a different country or healthcare system and observing a large drop in detection performance or average precision would falsify the claim of broad applicability.

Figures

Figures reproduced from arXiv: 2309.15039 by Pavel Blinov, Petr Philonenko, Vladimir Kokh.

Figure 1. Example of a sequence of medical events for the …
Figure 2. Visualization of the proposed Can-SAVE method's capability in iden…
Figure 3. Example of censored data. Line length is the depth of the EHR history. Red line (complete observation): the sequence of medical events ends with the C-diagnosis. Green line (randomly right-censored observation): the sequence of medical events ends at the tMAX date while the C-diagnosis has not occurred. It is assumed that the C-diagnosis would occur in [tMAX date, +∞).
Figure 4. The fitted Kaplan-Meier estimators for males (blue), females (red), …
Figure 5. Proportion of confirmed cancers (Precision@TOP) depending on the …
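The censoring scheme described in the Figure 3 caption can be sketched as a small conversion from a patient's history into a (duration, event flag) pair. This is an illustration under assumptions: the function name `to_survival_record`, the use of days as the time unit, and all dates are hypothetical and not taken from the paper.

```python
from datetime import date


def to_survival_record(history_start, diagnosis_date, t_max):
    """Convert one patient's history into (duration_in_days, event_flag).

    event_flag == 1: complete observation, the history ends at the diagnosis.
    event_flag == 0: right-censored, the history ends at t_max with no
    diagnosis observed (it may still occur at some later time).
    """
    if diagnosis_date is not None and diagnosis_date <= t_max:
        return (diagnosis_date - history_start).days, 1
    return (t_max - history_start).days, 0


# Hypothetical cut-off date and patients.
t_max = date(2022, 12, 31)
complete = to_survival_record(date(2020, 1, 1), date(2021, 1, 1), t_max)
censored = to_survival_record(date(2020, 1, 1), None, t_max)
```

Records in this form are the standard input to survival estimators such as Kaplan-Meier, which use the event flag to keep censored patients in the risk set until their last observed date.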
Original abstract

Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Can-SAVE, a system that extracts survival-analysis variables from EHR event sequences and feeds them into a gradient-boosting model to produce population-scale cancer risk rankings. It reports a retrospective evaluation on 1.9M patients showing 4-10x higher detection rates at fixed screening volumes and AP of 0.228 versus 0.193 for the strongest baseline, plus a prospective pilot on 426K patients yielding +91% detection rate and +36% coverage relative to the national protocol, with claimed scalability to 1M patients in under three hours on standard hardware. Code for training and feature engineering is released.

Significance. If the empirical claims survive scrutiny for confounding, the work would demonstrate that lightweight survival-derived features from routine EHR can materially improve cancer detection at national scale without specialized imaging or lab data. The public code release is a concrete strength supporting reproducibility.

major comments (2)
  1. [Abstract] Abstract: the headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of survival-model training procedure, censoring mechanism, feature construction from event sequences, or statistical significance testing; these omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.
  2. [Evaluation] Evaluation sections: no region-holdout experiments, no access-adjusted covariates, and no explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.
minor comments (1)
  1. [Abstract] The abstract states the study spans 2.5M adults yet reports retrospective results on 1.9M and prospective on 426K; a brief reconciliation of the overlap or exclusion criteria would improve clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments, indicating planned revisions to strengthen the manuscript.

  1. Referee: [Abstract] Abstract: the headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of survival-model training procedure, censoring mechanism, feature construction from event sequences, or statistical significance testing; these omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.

    Authors: We agree the abstract is concise and omits these details. The Methods section of the manuscript specifies a Cox proportional-hazards survival model trained on longitudinal EHR event sequences, with right-censoring at last observation or study end, time-to-event summary statistics as features, and bootstrap-based significance assessment. We will revise the abstract to include a brief clause describing the survival-model training, censoring, and feature construction so that the performance claims are contextualized. revision: yes

  2. Referee: [Evaluation] Evaluation sections: no region-holdout experiments, no access-adjusted covariates, and no explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.

    Authors: The concern regarding potential confounding from regional differences and visit-frequency patterns is substantive. The reported experiments do not include explicit region-holdout validation or debiasing steps. In revision we will add region-holdout experiments (training on four regions, testing on the held-out region) and incorporate visit-frequency as an access-adjusted covariate with corresponding propensity weighting to mitigate utilization bias. revision: yes
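The region-holdout protocol proposed in the second response (train on four regions, test on the fifth) can be sketched as a split generator. This is a minimal illustration, not the authors' code; the region labels A to E and the function name `leave_one_region_out` are hypothetical.

```python
def leave_one_region_out(regions):
    """Yield (held_out_region, train_indices, test_indices) for each region.

    regions[i] is the region label of patient i. Each region is held out
    exactly once, and no patient appears on both sides of a split.
    """
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test


# Hypothetical region label per patient.
regions = ["A", "B", "C", "D", "E", "A", "C"]
splits = list(leave_one_region_out(regions))
```

Reporting the metric separately for each held-out region, rather than pooled, would show whether performance depends on region-specific recording or utilization patterns.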

Circularity Check

0 steps flagged

No circularity; empirical results on held-out retrospective and prospective data

The paper presents a gradient-boosting model trained on survival-analysis features extracted from EHR event sequences, with performance measured via direct comparison to baselines on a 1.9M-patient retrospective cohort and a 426K-patient prospective pilot. No derivation, equation, or claim reduces a prediction to its own fitted inputs by construction, invokes self-citation for a uniqueness theorem, or renames a known result; all reported metrics (AP 0.228, +91% detection lift) are externally falsifiable outcomes on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the approach rests on one domain assumption: that EHR events carry predictive signal for future cancer diagnoses. No explicit free parameters or invented entities are detailed.

axioms (1)
  • domain assumption: Electronic health record events are sufficiently predictive of future cancer risk to enable useful ranking.
    The entire system is built on using medical history events as input for risk assessment.

pith-pipeline@v0.9.0 · 5849 in / 1361 out tokens · 44694 ms · 2026-05-24T07:18:51.440534+00:00 · methodology
