Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

Pavel Blinov; Petr Philonenko; Vladimir Kokh

arxiv: 2309.15039 · v4 · submitted 2023-09-26 · 💻 cs.LG · cs.AI· stat.AP

Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

Petr Philonenko , Vladimir Kokh , Pavel Blinov This is my paper

Pith reviewed 2026-05-24 07:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.AP

keywords cancer screeningelectronic health recordssurvival analysisgradient boostingrisk predictionpopulation screeningmachine learning in healthcare

0 comments

The pith

Lightweight model on routine medical history events detects cancer at 4-10 times higher rates than standard screening at equal volume.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that cancer risk can be ranked using only events from electronic health records by feeding survival model outputs into a gradient-boosting framework. This matters because conventional screening is costly, labor-intensive, and difficult to apply across entire populations. The approach was tested on real-world data from 2.5 million adults across five Russian regions. In retrospective evaluation on 1.9 million patients it delivered substantially higher detection at fixed screening effort, while a prospective pilot on 426 thousand patients nearly doubled detection and expanded coverage. The system also processes a city of one million patients in under three hours on ordinary hardware.

Core claim

Can-SAVE integrates survival model outputs into a gradient-boosting framework to rank population-wide cancer risks solely from medical history events recorded in EHR. On a dataset of 2.5 million adults, a retrospective oncologist-supervised study over 1.9 million patients yields an average precision of 0.228 versus 0.193 for the strongest baseline and 4-10 times higher detection at identical screening volumes. A year-long prospective pilot on 426 thousand patients increases the cancer detection rate by 91 percent and population coverage by 36 percent relative to the national protocol.

What carries the argument

Survival model outputs integrated into a gradient-boosting framework applied to medical history events from electronic health records.

If this is right

A city-wide population of one million patients can be ranked in under three hours on standard hardware.
The method nearly doubles the cancer detection rate in a prospective year-long pilot.
Population coverage increases by 36 percent compared with the existing national screening protocol.
Detection rates rise 4-10 times at the same total screening volume in retrospective analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same survival-plus-boosting structure could be retrained on EHR from other regions if local data quality is comparable.
Routine integration into existing health-record systems could shift screening effort toward higher-risk individuals without new diagnostic equipment.
Extending the variable-construction step to other long-horizon outcomes such as cardiovascular events is a direct technical extension.

Load-bearing premise

EHR events from five Russian regions provide an unbiased and representative signal of long-term cancer risk that generalizes without significant confounding from regional healthcare access differences or data recording practices.

What would settle it

Applying the trained model to EHR data from a different country or healthcare system and observing a large drop in detection performance or average precision would falsify the claim of broad applicability.

Figures

Figures reproduced from arXiv: 2309.15039 by Pavel Blinov, Petr Philonenko, Vladimir Kokh.

**Figure 1.** Figure 1: Example of a sequence of medical events for the [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 3.** Figure 3: Example of censored data. Line length is the depth of the EHR history. Red line (complete observation): the sequence of medical events ends by the C-diagnosis. Green line (randomly right-censored observation): the sequence of medical events ends at the tMAX date while the C-diagnosis has not been occurred. It is assumed that the C-diagnosis would occur at [tMAX date, +∞). B. Survival Models Kaplan-Meier es… view at source ↗

**Figure 4.** Figure 4: The fitted Kaplan-Meier estimators for males (blue), females (red), [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Proportion of confirmed cancers (Precision@TOP) depending on the [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

read the original abstract

Conventional medical cancer screening methods are costly, labor-intensive, and extremely difficult to scale. Although AI can improve cancer detection, most systems rely on complex or specialized medical data, making them impractical for large-scale screening. We introduce Can-SAVE, a lightweight AI system that ranks population-wide cancer risks solely based on medical history events. By integrating survival model outputs into a gradient-boosting framework, our approach detects subtle, long-term patient risk patterns - often well before clinical symptoms manifest. Can-SAVE was rigorously evaluated on a real-world dataset of 2.5 million adults spanning five Russian regions, marking the study as one of the largest and most comprehensive deployments of AI-driven cancer risk assessment. In a retrospective oncologist-supervised study over 1.9M patients, Can-SAVE achieves a 4-10x higher detection rate at identical screening volumes and an Average Precision (AP) of 0.228 vs. 0.193 for the best baseline (LoRA-tuned Qwen3-Embeddings via DeepSeek-R1 summarization). In a year-long prospective pilot (426K patients), our method almost doubled the cancer detection rate (+91%) and increased population coverage by 36% over the national screening protocol. The system demonstrates practical scalability: a city-wide population of 1 million patients can be processed in under three hours using standard hardware, enabling seamless clinical integration. This work proves that Can-SAVE achieves nationally significant cancer detection improvements while adhering to real-world public healthcare constraints, offering immediate clinical utility and a replicable framework for population-wide screening. Code for training and feature engineering is available at https://github.com/sb-ai-lab/Can-SAVE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Can-SAVE reports large detection gains on 2.5M Russian EHR patients via survival outputs plus gradient boosting, but thin methods leave room for regional data confounders.

read the letter

The core takeaway is that this system claims 4-10x higher retrospective detection and +91% prospective lift over baselines on 1.9M and 426K patients respectively, using only routine medical events, with code released and a scalability note for city-scale runs in hours. That scale and the prospective pilot are the parts worth noting first. The integration step itself is straightforward but executed at a size that adds a data point to EHR-based risk work. The public code for training and features helps anyone wanting to inspect or replicate the pipeline. The soft spots sit in the missing pieces. No description appears of survival model fitting, censoring treatment, or feature construction, and nothing addresses whether visit density or coding practices vary enough across the five regions to let the model pick up utilization instead of biology. The stress-test concern about access confounders therefore stays live on the given information. If the full paper contains region-holdout tests or explicit debiasing steps, that would tighten the claims; without them the numbers are internally consistent but harder to trust for broader use. This paper is aimed at groups doing applied EHR prediction or public-health screening pilots. Readers who need concrete deployment numbers at national scale will find the reported metrics and pilot design useful even if they want more methodological transparency. It should go to peer review because the dataset size and prospective results are substantial enough to merit referee scrutiny on the methods and bias checks.

Referee Report

2 major / 1 minor

Summary. The paper introduces Can-SAVE, a system that extracts survival-analysis variables from EHR event sequences and feeds them into a gradient-boosting model to produce population-scale cancer risk rankings. It reports a retrospective evaluation on 1.9M patients showing 4-10x higher detection rates at fixed screening volumes and AP of 0.228 versus 0.193 for the strongest baseline, plus a prospective pilot on 426K patients yielding +91% detection rate and +36% coverage relative to the national protocol, with claimed scalability to 1M patients in under three hours on standard hardware. Code for training and feature engineering is released.

Significance. If the empirical claims survive scrutiny for confounding, the work would demonstrate that lightweight survival-derived features from routine EHR can materially improve cancer detection at national scale without specialized imaging or lab data. The public code release is a concrete strength supporting reproducibility.

major comments (2)

[Abstract] Abstract: the headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of survival-model training procedure, censoring mechanism, feature construction from event sequences, or statistical significance testing; these omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.
[Evaluation] Evaluation sections: no region-holdout experiments, no access-adjusted covariates, and no explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.

minor comments (1)

[Abstract] The abstract states the study spans 2.5M adults yet reports retrospective results on 1.9M and prospective on 426K; a brief reconciliation of the overlap or exclusion criteria would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond point-by-point to the major comments, indicating planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the headline metrics (4-10x detection, AP 0.228 vs 0.193, +91% prospective lift) are presented without any description of survival-model training procedure, censoring mechanism, feature construction from event sequences, or statistical significance testing; these omissions are load-bearing because the central claim is that the observed lifts reflect genuine pre-symptomatic risk rather than artifacts of model specification.

Authors: We agree the abstract is concise and omits these details. The Methods section of the manuscript specifies a Cox proportional-hazards survival model trained on longitudinal EHR event sequences, with right-censoring at last observation or study end, time-to-event summary statistics as features, and bootstrap-based significance assessment. We will revise the abstract to include a brief clause describing the survival-model training, censoring, and feature construction so that the performance claims are contextualized. revision: yes
Referee: [Evaluation] Evaluation sections: no region-holdout experiments, no access-adjusted covariates, and no explicit debiasing for visit-frequency or coding-practice differences across the five Russian regions are reported. Because the skeptic concern (EHR events may encode utilization patterns correlated with screening uptake) directly threatens the generalizability of both the retrospective and prospective results, this gap must be closed for the performance claims to be credible.

Authors: The concern regarding potential confounding from regional differences and visit-frequency patterns is substantive. The reported experiments do not include explicit region-holdout validation or debiasing steps. In revision we will add region-holdout experiments (training on four regions, testing on the held-out region) and incorporate visit-frequency as an access-adjusted covariate with corresponding propensity weighting to mitigate utilization bias. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out retrospective and prospective data

full rationale

The paper presents a gradient-boosting model trained on survival-analysis features extracted from EHR event sequences, with performance measured via direct comparison to baselines on a 1.9M-patient retrospective cohort and a 426K-patient prospective pilot. No derivation, equation, or claim reduces a prediction to its own fitted inputs by construction, invokes self-citation for a uniqueness theorem, or renames a known result; all reported metrics (AP 0.228, +91% detection lift) are externally falsifiable outcomes on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract; the approach rests on the domain assumption that EHR events carry predictive signal for future cancer diagnoses, with no explicit free parameters, axioms, or invented entities detailed.

axioms (1)

domain assumption Electronic health record events are sufficiently predictive of future cancer risk to enable useful ranking
The entire system is built on using medical history events as input for risk assessment.

pith-pipeline@v0.9.0 · 5849 in / 1361 out tokens · 44694 ms · 2026-05-24T07:18:51.440534+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1]

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

Sung, H, Ferlay, J, Siegel, RL, Laversanne, M, Soerjomataram, I, Jemal, A, Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021: 71: 209-249

work page 2020
[2]

Outdoor air pollution and cancer: An overview of the current evidence and public health recommendations

Turner, MC, Andersen, ZJ, Baccarelli, A, Diver, WR, Gapstur, SM, Pope, CA, Prada, D, Samet, J, Thurston, G, Cohen, A. Outdoor air pollution and cancer: An overview of the current evidence and public health recommendations. CA Cancer J Clin. 2020: 70: 460- 479

work page 2020
[3]

Wang, Y . (2015). Development of cancer diagnostics—from biomarkers to clinical tests. Translational Cancer Research, 4(3), 270-279

work page 2015
[4]

P., Nuzumlalı, M

Li, I., Pan, J., Goldwasser, J., Verma, N., Wong, W. P., Nuzumlalı, M. Y ., et al. (2022). Neural Natural Language Processing for unstructured data in electronic health records: A review. Computer Science Review, 46, 100511

work page 2022
[5]

Gunter,T.D., & Terry,N.P. (2005). The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions. Journal of medical Internet research, 7(1), e383

work page 2005
[6]

J., Zhao, X., Huang, X., & Qian, D

Wang, X., Oldani, M. J., Zhao, X., Huang, X., & Qian, D. (2014). A review of cancer risk prediction models with genetic variants. Cancer informatics, 13, CIN-S13788

work page 2014
[7]

Zhang, L., Dong, D., Zhang, W., Hao, X., Fang, M., Wang, S., et al. (2020). A deep learning risk prediction model for overall survival in patients with gastric cancer: A multicenter study. Radiotherapy and Oncology, 150, 73-80

work page 2020
[8]

Alexander, M., & Burbury, K. (2016). A systematic review of biomarkers for the prediction of thromboembolism in lung cancer—Results, prac- tical issues and proposed strategies for future risk prediction models. thrombosis Research, 148, 63-69

work page 2016
[9]

Risk Modeling: Predicting cancer risk based on family history, eLife, 10:e73380

Michelle F Jacobs (2021). Risk Modeling: Predicting cancer risk based on family history, eLife, 10:e73380. https://doi.org/10.7554/eLife.73380

work page doi:10.7554/elife.73380 2021
[10]

Aleksandrova, K., Reichmann, R., Kaaks, R. et al. (2021). Development and validation of a lifestyle-based model for colorectal cancer risk prediction: the LiFeCRC score. BMC Med 19, 1

work page 2021
[11]

Notani, P. N. (1988). Role of alcohol in cancers of the upper alimentary tract: use of models in risk assessment. Journal of Epidemiology & Community Health, 42(2), 187-192

work page 1988
[12]

Nafiseh Nasirzadeh, Yousef Mohammadian & Yadolah Fakhri (2023) Concentration and cancer risk assessment of asbestos in Middle East countries: a systematic review- meta-analysis, International Journal of Environmental Analytical Chemistry, 103:2, 255-269

work page 2023
[13]

Hagar, Y ., Albers, D., Pivovarov, R., Chase, H., Dukic, V ., & Elhadad, N. (2014). Survival analysis with electronic health record data: Experiments with chronic kidney disease. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(5), 385-403

work page 2014
[14]

F., Hart, G

Stark, G. F., Hart, G. R., Nartowt, B. J., & Deng, J. (2019). Predicting breast cancer risk using personal health data and machine learning models. Plos one, 14(12), e0226765

work page 2019
[15]

H., Yoo, S., D’Imperio, N., McMahon, B

Dai, X., Park, J. H., Yoo, S., D’Imperio, N., McMahon, B. H., Rentsch, C. T., et al. (2022). Survival analysis of localized prostate cancer with deep learning. Scientific Reports, 12(1), 17821

work page 2022
[16]

Birman-Deych,E., Waterman,A.D., Yan,Y ., Nilasena,D.S., Radford,M.J., & Gage,B.F. (2005). Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Medical care, 480-485

work page 2005
[17]

H., Wang, Y

Wang, H. H., Wang, Y . H., Liang, C. W., & Li, Y . C. (2019). Assessment of deep learning using nonimaging information and sequential medical records to develop a prediction model for nonmelanoma skin cancer. JAMA dermatology, 155(11), 1277-1283

work page 2019
[18]

M., Chang, Y

Wang, S. M., Chang, Y . H., Kuo, L. C., Lai, F., Chen, Y . N., Yu, F. Y ., et al. (2020). Using deep learning for automatic ICD-10 classification from free-text data. European Journal of Biomedical Informatics, 16(1)

work page 2020
[19]

Li, Y ., Rao, S., Solares, J. R. A., Hassaine, A., Ramakrishnan, R., Canoy, D., et al. (2020). BEHRT: transformer for electronic health records. Scientific reports, 10(1), 1-12

work page 2020
[20]

T., & Wang, J

Lee, E. T., & Wang, J. (2003). Statistical methods for survival data analysis (V ol. 476). John Wiley & Sons

work page 2003
[21]

L.; Meier, P

Kaplan, E. L.; Meier, P. (1958). Nonparametric estimation from incom- plete observations. J. Amer. Statist. Assoc. 53 (282): 457–481

work page 1958
[22]

Jonathan, B., & Ian, J. (1979). Linear regression with censored data. Biometrika, 66(3), 429-436

work page 1979
[23]

Healthcare

Samoylova A.V . Results of control measures of Roszdravnadzor in relation to the implementation of the national project “Healthcare” and regional programs for the modernization of primary healthcare in the subjects of the Russian Federation in 2021 // Vestnik Roszdravnadzora. – 2022. – V ol. 1. – P. 7–15

work page 2021
[24]

Recall, precision and average precision

Zhu,M.(2004). Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, 2(30), 6

work page 2004
[25]

D., Starinskiy, V

Kaprin, A. D., Starinskiy, V . V ., Shakhzadova, A. O. (2022). State of oncological care for the population of Russia in 2021. P.A. Herzen Moscow State Medical Research Institute – branch of the Federal State Budgetary Institution ”NMRC of Radiology”. (in Russian)

work page 2022
[26]

Lehne, M., Luijten, S., genannt Imbusch, P. V . F., & Thun, S. (2019). The Use of FHIR in Digital Health-A Review of the Scientific Literature. GMDS, (September), 52-58

work page 2019
[27]

Blinov, P., & Kokh, V . (2021). Patient Embeddings in Healthcare and Insurance Applications. arXiv preprint arXiv:2107.03913

work page arXiv 2021
[28]

Longato, E., Vettoretti, M., & Di Camillo, B. (2020). A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. Journal of Biomedical Informatics, 108, 103496

work page 2020
[29]

Henze, N. (1988). A Multivariate Two-Sample Test Based on the Num- ber of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772–783

work page 1988
[30]

Philonenko, P., Postovalov, S. (2019). The new robust two-sample test for randomly right-censored data. Journal of Statistical Computation and Simulation, 89(8), 1357-1375

work page 2019
[31]

La Rosa, F., Liso, A., Bianconi, F., Duca, E., & Stracci, F. (2014). Seasonal variation in the month of birth in patients with skin cancer. British journal of cancer, 111(9), 1810-1813

work page 2014
[32]

Gonzalez, H., Hagerling, C., & Werb, Z. (2018). Roles of the immune system in cancer: from tumor initiation to metastatic progression. Genes & development, 32(19-20), 1267-1284

work page 2018
[33]

J., & van Lidth De Jeude, C

Boomsma, L. J., & van Lidth De Jeude, C. P. (2000). ’Number needed to screen’: a tool for assessment of prevention programs. Nederlands Tijdschrift V oor Geneeskunde, 144(49), 2345-2348

work page 2000
[34]

E., & Helvie, M

Hendrick, R. E., & Helvie, M. A. (2012). Mammography screening: a new estimate of number needed to screen to prevent one breast cancer death. American Journal of Roentgenology, 198(3), 723-728

work page 2012
[35]

Arenberg, D. (2019). Update on screening for lung cancer. Translational lung cancer research, 8(Suppl 1), S77

work page 2019
[36]

R., Tinmouth, J., Naber, S

Cenin, D. R., Tinmouth, J., Naber, S. K., Dub ´e, C., McCurdy, B. R., Paszat, L., et al. (2021). Calculation of stop ages for colorectal cancer screening based on comorbidities and screening history. Clinical Gastroenterology and Hepatology, 19(3), 547-555

work page 2021

[1] [1]

Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

Sung, H, Ferlay, J, Siegel, RL, Laversanne, M, Soerjomataram, I, Jemal, A, Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021: 71: 209-249

work page 2020

[2] [2]

Outdoor air pollution and cancer: An overview of the current evidence and public health recommendations

Turner, MC, Andersen, ZJ, Baccarelli, A, Diver, WR, Gapstur, SM, Pope, CA, Prada, D, Samet, J, Thurston, G, Cohen, A. Outdoor air pollution and cancer: An overview of the current evidence and public health recommendations. CA Cancer J Clin. 2020: 70: 460- 479

work page 2020

[3] [3]

Wang, Y . (2015). Development of cancer diagnostics—from biomarkers to clinical tests. Translational Cancer Research, 4(3), 270-279

work page 2015

[4] [4]

P., Nuzumlalı, M

Li, I., Pan, J., Goldwasser, J., Verma, N., Wong, W. P., Nuzumlalı, M. Y ., et al. (2022). Neural Natural Language Processing for unstructured data in electronic health records: A review. Computer Science Review, 46, 100511

work page 2022

[5] [5]

Gunter,T.D., & Terry,N.P. (2005). The emergence of national electronic health record architectures in the United States and Australia: models, costs, and questions. Journal of medical Internet research, 7(1), e383

work page 2005

[6] [6]

J., Zhao, X., Huang, X., & Qian, D

Wang, X., Oldani, M. J., Zhao, X., Huang, X., & Qian, D. (2014). A review of cancer risk prediction models with genetic variants. Cancer informatics, 13, CIN-S13788

work page 2014

[7] [7]

Zhang, L., Dong, D., Zhang, W., Hao, X., Fang, M., Wang, S., et al. (2020). A deep learning risk prediction model for overall survival in patients with gastric cancer: A multicenter study. Radiotherapy and Oncology, 150, 73-80

work page 2020

[8] [8]

Alexander, M., & Burbury, K. (2016). A systematic review of biomarkers for the prediction of thromboembolism in lung cancer—Results, prac- tical issues and proposed strategies for future risk prediction models. thrombosis Research, 148, 63-69

work page 2016

[9] [9]

Risk Modeling: Predicting cancer risk based on family history, eLife, 10:e73380

Michelle F Jacobs (2021). Risk Modeling: Predicting cancer risk based on family history, eLife, 10:e73380. https://doi.org/10.7554/eLife.73380

work page doi:10.7554/elife.73380 2021

[10] [10]

Aleksandrova, K., Reichmann, R., Kaaks, R. et al. (2021). Development and validation of a lifestyle-based model for colorectal cancer risk prediction: the LiFeCRC score. BMC Med 19, 1

work page 2021

[11] [11]

Notani, P. N. (1988). Role of alcohol in cancers of the upper alimentary tract: use of models in risk assessment. Journal of Epidemiology & Community Health, 42(2), 187-192

work page 1988

[12] [12]

Nafiseh Nasirzadeh, Yousef Mohammadian & Yadolah Fakhri (2023) Concentration and cancer risk assessment of asbestos in Middle East countries: a systematic review- meta-analysis, International Journal of Environmental Analytical Chemistry, 103:2, 255-269

work page 2023

[13] [13]

Hagar, Y ., Albers, D., Pivovarov, R., Chase, H., Dukic, V ., & Elhadad, N. (2014). Survival analysis with electronic health record data: Experiments with chronic kidney disease. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(5), 385-403

work page 2014

[14] [14]

F., Hart, G

Stark, G. F., Hart, G. R., Nartowt, B. J., & Deng, J. (2019). Predicting breast cancer risk using personal health data and machine learning models. Plos one, 14(12), e0226765

work page 2019

[15] [15]

H., Yoo, S., D’Imperio, N., McMahon, B

Dai, X., Park, J. H., Yoo, S., D’Imperio, N., McMahon, B. H., Rentsch, C. T., et al. (2022). Survival analysis of localized prostate cancer with deep learning. Scientific Reports, 12(1), 17821

work page 2022

[16] [16]

Birman-Deych,E., Waterman,A.D., Yan,Y ., Nilasena,D.S., Radford,M.J., & Gage,B.F. (2005). Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors. Medical care, 480-485

work page 2005

[17] [17]

H., Wang, Y

Wang, H. H., Wang, Y . H., Liang, C. W., & Li, Y . C. (2019). Assessment of deep learning using nonimaging information and sequential medical records to develop a prediction model for nonmelanoma skin cancer. JAMA dermatology, 155(11), 1277-1283

work page 2019

[18] [18]

M., Chang, Y

Wang, S. M., Chang, Y . H., Kuo, L. C., Lai, F., Chen, Y . N., Yu, F. Y ., et al. (2020). Using deep learning for automatic ICD-10 classification from free-text data. European Journal of Biomedical Informatics, 16(1)

work page 2020

[19] [19]

Li, Y ., Rao, S., Solares, J. R. A., Hassaine, A., Ramakrishnan, R., Canoy, D., et al. (2020). BEHRT: transformer for electronic health records. Scientific reports, 10(1), 1-12

work page 2020

[20] [20]

T., & Wang, J

Lee, E. T., & Wang, J. (2003). Statistical methods for survival data analysis (V ol. 476). John Wiley & Sons

work page 2003

[21] [21]

L.; Meier, P

Kaplan, E. L.; Meier, P. (1958). Nonparametric estimation from incom- plete observations. J. Amer. Statist. Assoc. 53 (282): 457–481

work page 1958

[22] [22]

Jonathan, B., & Ian, J. (1979). Linear regression with censored data. Biometrika, 66(3), 429-436

work page 1979

[23] [23]

Healthcare

Samoylova A.V . Results of control measures of Roszdravnadzor in relation to the implementation of the national project “Healthcare” and regional programs for the modernization of primary healthcare in the subjects of the Russian Federation in 2021 // Vestnik Roszdravnadzora. – 2022. – V ol. 1. – P. 7–15

work page 2021

[24] [24]

Recall, precision and average precision

Zhu,M.(2004). Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, 2(30), 6

work page 2004

[25] [25]

D., Starinskiy, V

Kaprin, A. D., Starinskiy, V . V ., Shakhzadova, A. O. (2022). State of oncological care for the population of Russia in 2021. P.A. Herzen Moscow State Medical Research Institute – branch of the Federal State Budgetary Institution ”NMRC of Radiology”. (in Russian)

work page 2022

[26] [26]

Lehne, M., Luijten, S., genannt Imbusch, P. V . F., & Thun, S. (2019). The Use of FHIR in Digital Health-A Review of the Scientific Literature. GMDS, (September), 52-58

work page 2019

[27] [27]

Blinov, P., & Kokh, V . (2021). Patient Embeddings in Healthcare and Insurance Applications. arXiv preprint arXiv:2107.03913

work page arXiv 2021

[28] [28]

Longato, E., Vettoretti, M., & Di Camillo, B. (2020). A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. Journal of Biomedical Informatics, 108, 103496

work page 2020

[29] [29]

Henze, N. (1988). A Multivariate Two-Sample Test Based on the Num- ber of Nearest Neighbor Type Coincidences. The Annals of Statistics, 16(2), 772–783

work page 1988

[30] [30]

Philonenko, P., Postovalov, S. (2019). The new robust two-sample test for randomly right-censored data. Journal of Statistical Computation and Simulation, 89(8), 1357-1375

work page 2019

[31] [31]

La Rosa, F., Liso, A., Bianconi, F., Duca, E., & Stracci, F. (2014). Seasonal variation in the month of birth in patients with skin cancer. British journal of cancer, 111(9), 1810-1813

work page 2014

[32] [32]

Gonzalez, H., Hagerling, C., & Werb, Z. (2018). Roles of the immune system in cancer: from tumor initiation to metastatic progression. Genes & development, 32(19-20), 1267-1284

work page 2018

[33] [33]

J., & van Lidth De Jeude, C

Boomsma, L. J., & van Lidth De Jeude, C. P. (2000). ’Number needed to screen’: a tool for assessment of prevention programs. Nederlands Tijdschrift V oor Geneeskunde, 144(49), 2345-2348

work page 2000

[34] [34]

E., & Helvie, M

Hendrick, R. E., & Helvie, M. A. (2012). Mammography screening: a new estimate of number needed to screen to prevent one breast cancer death. American Journal of Roentgenology, 198(3), 723-728

work page 2012

[35] [35]

Arenberg, D. (2019). Update on screening for lung cancer. Translational lung cancer research, 8(Suppl 1), S77

work page 2019

[36] [36]

R., Tinmouth, J., Naber, S

Cenin, D. R., Tinmouth, J., Naber, S. K., Dub ´e, C., McCurdy, B. R., Paszat, L., et al. (2021). Calculation of stop ages for colorectal cancer screening based on comorbidities and screening history. Clinical Gastroenterology and Hepatology, 19(3), 547-555

work page 2021