Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya

Carrie B. Dolan; Haipeng Chen; Jimmy Bach; John Sankok; Julius N. Odhiambo; Rose Kimani; Yang Li; Yaqi Liu

arxiv: 2604.08902 · v1 · submitted 2026-04-10 · 💻 cs.LG

Jimmy Bach, Yang Li, Yaqi Liu, John Sankok, Rose Kimani, Carrie B. Dolan, Julius N. Odhiambo, Haipeng Chen

Pith reviewed 2026-05-10 16:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords synthetic data, machine learning, vaccination prediction, childhood immunization, privacy preservation, Kenya, risk classification, TabSyn

The pith

Machine learning models trained on synthetic data can accurately predict which children are at risk of missing vaccines in Kenya while protecting privacy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that logistic regression and XGBoost models applied to eight years of digitized child vaccination records can flag children likely to miss key doses, with recall, precision, and F1 scores above 90 percent for some of the vaccines modeled. It further reports that training the same models on synthetic records generated by TabSyn, instead of the original records, does not reduce predictive performance. This combination addresses both the shortage of usable data in nomadic communities and the heightened privacy needs around sensitive health information. The approach supports targeted interventions and better resource planning for immunization programs.

Core claim

Classification models trained on MOH 510 vaccination records from Narok County can reliably identify children at risk of missing key vaccines. Training the same models on TabSyn-generated synthetic data produces equivalent predictive results, allowing the use of data for forecasting without exposing individual patient details in a vulnerable population.

What carries the argument

TabSyn, a tabular diffusion-based synthetic data generator, supplies the synthetic records. Logistic Regression and XGBoost classifiers are then trained on both the real and the generated records to identify children at risk of missing vaccines.

If this is right

Targeted interventions can reach children predicted at highest risk of missing vaccines to raise overall coverage rates.
Clinics with limited digital systems can still run scalable forecasts of immunization needs using synthetic records.
Privacy concerns in nomadic and low-resource populations no longer block the use of health data for prediction.
Resource allocation for vaccine delivery can rely on evidence from models that do not require sharing original patient files.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could extend to predicting other health service gaps in populations where real data sharing is restricted.
Mobile tools for community health workers might incorporate these risk scores to prioritize home visits.
Testing the same workflow on vaccination datasets from different regions would show whether TabSyn performs consistently across schedules and cultures.

Load-bearing premise

The synthetic data accurately reproduces the statistical distributions and relationships in the original vaccination records without introducing biases that would affect predictions for at-risk children.

What would settle it

Train separate models on real records and on TabSyn synthetic records, then test both on a fresh hold-out set of actual vaccination records; a meaningful drop in recall or precision for the synthetic-trained model would disprove the claim of no performance loss.
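The comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes both models have already produced binary predictions for the same hold-out set of real records, and that label 1 means "at risk of missing a dose" (the paper's exact label definition is not given on this page). The function names `prf1` and `synthetic_gap` and all toy values are hypothetical.

```python
from typing import Dict, Sequence


def prf1(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for the positive class (assumed: 1 = at risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def synthetic_gap(y_true, pred_real_trained, pred_synth_trained):
    """Per-metric drop when training data is switched from real to synthetic.

    Positive values mean the synthetic-trained model scored lower.
    """
    real = prf1(y_true, pred_real_trained)
    synth = prf1(y_true, pred_synth_trained)
    return {name: real[name] - synth[name] for name in real}


# Toy hold-out labels and predictions (illustrative values, not study data).
y_holdout = [1, 1, 1, 1, 0, 0, 0, 0]
pred_real = [1, 1, 1, 0, 0, 0, 0, 1]   # model trained on real records
pred_synth = [1, 1, 0, 0, 0, 0, 0, 1]  # model trained on synthetic records
gap = synthetic_gap(y_holdout, pred_real, pred_synth)
print(gap)
```

What counts as a "meaningful" gap is a judgment the sketch leaves open; in practice it would be read against the uncertainty of each metric on the hold-out set.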

Figures

Figures reproduced from arXiv: 2604.08902 by Carrie B. Dolan, Haipeng Chen, Jimmy Bach, John Sankok, Julius N. Odhiambo, Rose Kimani, Yang Li, Yaqi Liu.

**Figure 1.** Data Preprocessing Steps. Source text captured alongside the caption (truncated at both ends): "… were removed, reducing the sample size to 7,517 patients. After this, we removed the multicollinear latitude and longitude predictors (as these are perfectly correlated with the village predictor). 2. Numeric Data. Numeric features in our dataset included the child’s age and the first visit day for vaccination. Each of these variables was subsetted to avoid unreliable observations. …"

**Figure 2.** Number of Individuals Within Each Clinic Registry

**Figure 3.** Distances Traveled by Individuals to the Nearest …

**Figure 4.** Feature Importance Analysis: DPT3 Real Data

**Figure 5.** Feature Importance Analysis: DPT3 Synthetic Data

Original abstract

Background: Limited data utilization in low-resource settings poses a barrier to the vaccine delivery ecosystem, undermining efforts to achieve equitable immunization coverage. In nomadic populations, individuals face an increased risk of missing crucial vaccination doses as children. One such population is the Maasai in Narok County, Kenya, where the absence of high-volume, quality data hampers accurate coverage estimates, impedes efficient resource allocation, and weakens the ability to deliver timely interventions. Additionally, data privacy concerns are heightened in groups with limited sensitive data.

Objectives: First, we aim to identify children at risk of missing key vaccines across a large population to provide timely, evidence-based interventions that support increased vaccination coverage. Second, we aim to better protect the privacy of sensitive health data in a vulnerable population.

Methods: We digitized 8 years of child vaccination records from the MOH 510 registry (n=6,913) and applied machine learning models (Logistic Regression and XGBoost) to identify children at risk. Additionally, we utilize a novel approach to tabular diffusion-based synthetic data generation (TabSyn) to protect patient privacy within the models.

Results: Our findings show that classification techniques can reliably and successfully predict children at risk of missing a vaccine, with recall, precision, and F1-scores exceeding 90% for some vaccines modeled. Additionally, training these models with synthetic data rather than real data, thus preserving the privacy of individuals within the original dataset, does not lead to a loss in predictive performance.

Conclusion: These results support the use of synthetic data implementation in health informatics strategies for clinics with limited digital infrastructure, enabling privacy-preserving, scalable forecasting for childhood immunization coverage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a straightforward applied study on using TabSyn synthetic data to train vaccination-risk predictors for a Kenyan nomadic population, with high reported scores but thin validation details.

The punchline is that this is a practical application of ML and synthetic data to childhood vaccination records from Narok, Kenya, claiming strong prediction and privacy preservation, but the evidence for those claims is not fully spelled out in the abstract.

They digitized 6,913 child records from the MOH 510 registry over eight years and used logistic regression plus XGBoost to predict who might miss key vaccines. They then trained the same models on TabSyn-generated synthetic data and reported no loss in recall, precision, or F1, with some metrics above 90 percent.

What is new here is the dataset and the population. Applying these methods to Maasai nomadic communities in Kenya adds a real-world case where data scarcity and privacy are acute issues. The pipeline is straightforward and aimed at clinics with limited infrastructure.

The paper does a good job laying out why better forecasting matters for equitable coverage and timely interventions. The privacy angle through synthetic data is a sensible response to concerns in vulnerable groups.

Soft spots center on the evaluation. The results look promising on paper, but there is no information about train-test splits, cross-validation, or tuning. More importantly, there are no reported checks on whether the synthetic data keeps the statistical relationships that matter for the low-prevalence at-risk children. With likely imbalance in the data, standard diffusion models can distort those tails, and the downstream metrics alone do not rule that out.

This work is for people doing applied health ML in low-resource settings or exploring synthetic data for tabular records. A reader looking for an example of privacy-preserving prediction in immunization programs would get value from it.

It deserves a serious referee. The application is grounded enough and the stakes for coverage equity are high enough that feedback on the methods would improve it. I recommend sending it for peer review, with specific requests for validation details and synthetic fidelity metrics.

Referee Report

3 major / 2 minor

Summary. The manuscript describes the application of machine learning classifiers (Logistic Regression and XGBoost) to predict children at risk of missing vaccinations using a dataset of 6,913 records from the MOH 510 registry in Narok, Kenya. It further explores the use of TabSyn for generating synthetic tabular data to preserve privacy and shows that models trained on this synthetic data achieve comparable performance to those trained on real data, with some models reporting recall, precision, and F1-scores above 90%.

Significance. If validated, these results would demonstrate the feasibility of using synthetic data for privacy-preserving predictive modeling in public health, particularly in low-resource and nomadic populations where data sensitivity is high. This could support better resource allocation for vaccination programs. The approach addresses both predictive accuracy and ethical data use, which is valuable for health informatics in similar contexts. However, the absence of key methodological details currently prevents a full evaluation of the claims' robustness.

major comments (3)

[Methods] The description of the experimental setup lacks details on data partitioning (e.g., train/validation/test splits), cross-validation procedures, and hyperparameter optimization for the Logistic Regression and XGBoost models. These are essential to evaluate whether the reported performance metrics (recall, precision, F1 >90%) reflect true predictive capability or potential overfitting, especially with a dataset of n=6,913 and likely class imbalance.
[Results] There are no reported checks on the fidelity of the TabSyn-generated synthetic data, such as comparisons of statistical distributions, class-conditional metrics, or preservation of correlations for the at-risk (minority) class. Given the potential for distortion in rare-event tails with tabular diffusion models, this omission undermines the claim that synthetic data training does not lead to a loss in predictive performance.
[Results] No statistical significance testing or confidence intervals are provided for the performance metrics or the comparison between real and synthetic training setups. This is particularly important to substantiate the equivalence claim.

minor comments (2)

[Methods] Provide more details on how the target labels for 'at risk' were defined from the vaccination records.
[Abstract] The abstract mentions 'some vaccines modeled' but does not specify which ones achieved the high scores; this should be clarified for context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which highlights important areas for improving the transparency and robustness of our work. We address each major comment point by point below, indicating the revisions we will incorporate to strengthen the manuscript.

Referee: [Methods] The description of the experimental setup lacks details on data partitioning (e.g., train/validation/test splits), cross-validation procedures, and hyperparameter optimization for the Logistic Regression and XGBoost models. These are essential to evaluate whether the reported performance metrics (recall, precision, F1 >90%) reflect true predictive capability or potential overfitting, especially with a dataset of n=6,913 and likely class imbalance.

Authors: We agree that these methodological details are essential for reproducibility and to allow proper assessment of overfitting risks given the dataset size and potential class imbalance. In the revised manuscript, we will expand the Methods section to explicitly describe: a stratified 70/15/15 train/validation/test split to preserve class distributions; 5-fold stratified cross-validation for model evaluation and tuning; and the hyperparameter optimization procedure (grid search over regularization parameters for Logistic Regression, and over learning rate, maximum depth, and number of estimators for XGBoost, with early stopping). We will also report the selected hyperparameters and any regularization techniques applied.
revision: yes
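A stratified 70/15/15 split of the kind proposed in this response might look like the sketch below. It is an illustration only: the helper name `stratified_split`, the rounding rule, the seed, and the toy class counts are assumptions, and nothing here reflects how the authors actually partitioned the MOH 510 records.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep per-class proportions.

    Each class is shuffled and cut separately, so a rare class (for example
    at-risk children) appears in all three parts at roughly the same rate.
    The test part receives whatever remains after train and validation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Toy labels: 900 children on schedule (0) and 100 at risk (1). Illustrative only.
labels = [0] * 900 + [1] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class rather than over the whole table is what keeps the minority class from being under-represented in the validation and test parts by chance.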
Referee: [Results] There are no reported checks on the fidelity of the TabSyn-generated synthetic data, such as comparisons of statistical distributions, class-conditional metrics, or preservation of correlations for the at-risk (minority) class. Given the potential for distortion in rare-event tails with tabular diffusion models, this omission undermines the claim that synthetic data training does not lead to a loss in predictive performance.

Authors: We acknowledge that fidelity validation would provide stronger support for the equivalence claim, particularly for the minority class. While our evaluation centered on downstream predictive performance, we will add a dedicated subsection in Results presenting fidelity checks, including marginal distribution comparisons (means, variances, histograms), correlation matrix preservation, and class-conditional statistics for the at-risk group. We will also note limitations of tabular diffusion models regarding rare-event tails and their potential impact on generalizability.
revision: yes
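The fidelity checks listed in this response (marginal means and variances, correlation preservation, class-conditional statistics) can be sketched with the standard library alone. This is a hedged illustration: the column names `age_months`, `distance_km`, and `at_risk` are hypothetical, the toy values are invented, and the sketch covers numeric columns only, so it is not a substitute for a full fidelity evaluation of TabSyn output.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def fidelity_report(real, synth, label="at_risk"):
    """Absolute gaps between real and synthetic tables.

    `real` and `synth` map column name -> list of numbers and must both
    contain at least one row with label == 1 (mean() raises on empty input).
    """
    features = [c for c in real if c != label]
    report = {"mean_gap": {}, "var_gap": {}, "corr_gap": {}, "at_risk_mean_gap": {}}
    for col in features:
        report["mean_gap"][col] = abs(mean(real[col]) - mean(synth[col]))
        report["var_gap"][col] = abs(pvariance(real[col]) - pvariance(synth[col]))
        r_pos = [v for v, y in zip(real[col], real[label]) if y == 1]
        s_pos = [v for v, y in zip(synth[col], synth[label]) if y == 1]
        report["at_risk_mean_gap"][col] = abs(mean(r_pos) - mean(s_pos))
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            report["corr_gap"][(a, b)] = abs(
                pearson(real[a], real[b]) - pearson(synth[a], synth[b])
            )
    return report


# Toy tables with hypothetical columns (not study data).
real = {"age_months": [2, 4, 6, 8], "distance_km": [1, 3, 5, 9], "at_risk": [0, 0, 1, 1]}
synth = {"age_months": [2, 4, 6, 8], "distance_km": [1, 3, 5, 7], "at_risk": [0, 0, 1, 1]}
report = fidelity_report(real, synth)
print(report["at_risk_mean_gap"])
```

In the toy tables the overall mean of `distance_km` differs by 0.5 while the at-risk mean differs by 1.0, which is the kind of minority-class discrepancy that aggregate statistics alone can understate.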
Referee: [Results] No statistical significance testing or confidence intervals are provided for the performance metrics or the comparison between real and synthetic training setups. This is particularly important to substantiate the equivalence claim.

Authors: We agree that statistical tests and confidence intervals are needed to rigorously support the claim of no performance loss. In the revision, we will report 95% bootstrap confidence intervals for all metrics (precision, recall, F1) based on multiple runs. We will also include statistical comparisons (e.g., paired t-tests on cross-validation fold metrics or McNemar's test) between real and synthetic models, with p-values, to assess whether observed differences are significant. Updated tables and text will present these results.
revision: yes
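The two statistical tools named in this response can be sketched as follows. The sketch makes assumptions the response does not state: the bootstrap resamples a single test set (a percentile interval) rather than aggregating multiple training runs, McNemar's test is computed in its exact binomial form, and the names `f1_score`, `bootstrap_ci`, and `mcnemar_exact` as well as the toy predictions are hypothetical.

```python
import math
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed: 1 = at risk); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, metric=f1_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric, resampling test rows."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs.

    Only rows where exactly one of the two models is correct carry information.
    """
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy test set: real-trained model is perfect, synthetic-trained model
# misses five at-risk children. Illustrative values, not study data.
y_true = [1] * 10 + [0] * 10
pred_real = list(y_true)
pred_synth = [0] * 5 + [1] * 5 + [0] * 10
ci_real = bootstrap_ci(y_true, pred_real)
ci_synth = bootstrap_ci(y_true, pred_synth)
p_value = mcnemar_exact(y_true, pred_real, pred_synth)
print(ci_real, ci_synth, p_value)
```

Note that a non-significant difference is not by itself evidence of equivalence; supporting an equivalence claim would also need a pre-stated margin for what difference counts as negligible.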

Circularity Check

0 steps flagged

No significant circularity; empirical ML application with measured outcomes

The paper applies standard classifiers (Logistic Regression, XGBoost) to real and TabSyn-generated vaccination records, then reports recall/precision/F1 on test splits. No derivations, uniqueness theorems, or self-referential equations exist; performance figures are direct empirical measurements rather than quantities forced by construction from fitted inputs or prior self-citations. The analysis is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that standard supervised classification and diffusion-based synthetic data generation can be applied directly to tabular health records; no additional free parameters, axioms, or invented entities are introduced beyond those implicit in the chosen ML algorithms and TabSyn method.

pith-pipeline@v0.9.0 · 5625 in / 1189 out tokens · 64968 ms · 2026-05-10T16:57:20.885540+00:00 · methodology


[1] [1]

A decade of progress and challenges in government support for routine immunization in east and southern africa (2015-2024).Pan Afr Med J, 51(22), 2025

Manyanga D, Byabamazima C, Masvikeni B, Ochieng M, and Wanyoike S. A decade of progress and challenges in government support for routine immunization in east and southern africa (2015-2024).Pan Afr Med J, 51(22), 2025

work page 2015

[2] [2]

Unicef annual report 2024: staying and delivering for children

UNICEF. Unicef annual report 2024: staying and delivering for children. Technical report, United Nations Children’s Fund, 2025. Accessed October 13, 2025

work page 2024

[3] [3]

Using public data to predict demand for mobile health clinics

Chen H, Ghosh S, Fan G, Behari N, Biswas A, Williams M, Oriol NE, and Tambe M. Using public data to predict demand for mobile health clinics. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 12461–12467, 2022

work page 2022

[4] [4]

Sequential vaccine allocation with delayed feedback

Xiao Y , Ou HC, Chen H, Nguyen VT, and Tran-Thanh L. Sequential vaccine allocation with delayed feedback. InProceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI 2022), pages 5199–5205, 2022

work page 2022

[5] [5]

Immunization agenda 2030: a global strategy to leave no one behind

World Health Organization. Immunization agenda 2030: a global strategy to leave no one behind. Technical report, World Health Organization, 2020. Accessed March 9, 2026

work page 2030

[6] [6]

Gavi 6.0: the alliance’s strategy 2026-2030

Gavi, The Vaccine Alliance. Gavi 6.0: the alliance’s strategy 2026-2030. Technical report, Gavi, 2024. Accessed March 9, 2026

work page 2026

[7] [7]

The global action plan for healthy lives and well-being for all

World Health Organization. The global action plan for healthy lives and well-being for all. Technical report, World Health Organization, 2019. Accessed March 9, 2026

work page 2019

[8] [8]

Haeuser E, Byrne S, Nguyen J, Raggi C, McLaughlin SA, Bisignano C, Harris AA, Smith AE, Lindstedt PA, and Smith G. Global, regional, and national trends in routine childhood vaccination coverage from 1980 to 2023 with forecasts to 2030: a systematic analysis for the global burden of disease study 2023.Lancet, 2025

work page 1980

[9] Ou HC, Chen H, Jabbari S, and Tambe M. Active screening for recurrent diseases: a reinforcement learning approach. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pages 992–1000, 2021

[10] Orwa B. Management of medical records for better healthcare service delivery: a case study of Narok County Referral Hospital, Kenya. Hum Resour Leadersh J, 7(1), 2022

[11] Masinde J, Mugambi F, and Muthee DW. Big data and personal information privacy in developing countries: insights from Kenya. Front Big Data, 8:1532362, 2025

[12] Kingma DP and Welling M. Auto-encoding variational Bayes. arXiv, 2022. arXiv:1312.6114

[13] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, and Ozair S. Generative adversarial nets. Adv Neural Inf Process Syst, 27, 2014

[14] Ho J, Jain A, and Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst, 33:6840–6851, 2020

[15] Li Y, Meng H, Bi Z, Urnes IT, and Chen H. Population aware diffusion for time series generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18520–18529, 2025

[16] Zhang H, Zhang J, Shen Z, Srinivasan B, Qin X, Faloutsos C, Rangwala H, and Karypis G. Mixed-type tabular data synthesis with score-based diffusion in latent space. In Proceedings of the International Conference on Learning Representations (ICLR), 2024

[17] Forslund M, Mathieson K, Djibo Y, Mbindyo C, Lugangira N, and Balasubramaniam P. Strengthening the evidence base on the use of digital health technologies to accelerate progress towards universal health coverage. Oxford Open Digit Health, 2:oqae033, 2024

[18] Bondi E, Chen H, Golden CD, Behari N, and Tambe M. Micronutrient deficiency prediction via publicly available satellite data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 12454–12460, 2022

[19] Bondi-Kelly E, Chen H, Golden CD, Behari N, and Tambe M. Predicting micronutrient deficiency with publicly available satellite data. AI Mag, 44(1):30–40, 2023

[20] Gawande MS, Zade N, Kumar P, Gundewar S, Weerarathna IN, and Verma P. The role of artificial intelligence in pandemic responses: from epidemiological modeling to vaccine development. Mol Biomed, 6(1):1, 2025

[21] Cheong Q, Au-Yeung M, Quon S, Concepcion K, and Kong JD. Predictive modeling of vaccination uptake in US counties: a machine learning-based approach. J Med Internet Res, 23(11):e33231, 2021

[22] Dodoo CC, Hanson-Yamoah E, Adedia D, Erzuah I, Yamoah P, Brobbey F, Cobbold C, and Mensah J. Using machine learning algorithms to predict COVID-19 vaccine uptake: a year after the introduction of COVID-19 vaccines in Ghana. Vaccine X, 18:100466, 2024

[23] Hasan MK, Jawad MT, Dutta A, Awal MA, Islam MA, Masud M, and Al-Amri JF. Associating measles vaccine uptake classification and its underlying factors using an ensemble of machine learning models. IEEE Access, 9:119613–119628, 2021

[24] Mbunge E. Leveraging ensemble machine learning approaches to predict measles vaccination status among children under five: insights from the 2019 Zimbabwe MICS. In Comput Sci On-line Conf, pages 310–324, 2025

[25] Demsash AW, Chereka AA, Walle AD, Kassie SY, Bekele F, and Bekana T. Machine learning algorithms’ application to predict childhood vaccination among children aged 12-23 months in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey dataset. PLoS One, 18(10):e0288867, 2023

[26] Kalegele K and Lubua EW. Determinants of childhood vaccination uptake: a machine learning approach using a decision tree classifier. J Inform, 5(1), 2025

[27] López DM, Rico-Olarte C, Blobel B, and Hullin C. Challenges and solutions for transforming health ecosystems in low- and middle-income countries through artificial intelligence. Front Med, 9:958097, 2022

[28] Chen H, Jajodia S, Liu J, Park N, Sokolov V, and Subrahmanian VS. FakeTables: using GANs to generate functional dependency preserving tables with bounded real data. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2074–2080, 2019

[29] Armanious K, Jiang C, Fischer M, Küstner T, Hepp T, Nikolaou K, Gatidis S, and Yang B. MedGAN: medical image translation using GANs. Comput Med Imaging Graph, 79:101684, 2020

Green Lab 2020/2021, September–October, 2020, Amsterdam, The Netherlands Jimmy Bach, Yang Li, M.S., Yaqi Liu, B.S., John Sankok, Rose Kimani, Carrie B. Dolan, PhD, Julius N. Odhiam...

[30] Baowaly MK, Lin CC, Liu CL, and Chen KT. Synthesizing electronic health records using improved generative adversarial networks. J Am Med Inform Assoc, 26(3):228–241, 2019

[31] Biswal S, Ghosh S, Duke J, Malin B, Stewart W, Xiao C, and Sun J. EVA: generating longitudinal electronic health records using conditional variational autoencoders. In Proceedings of Machine Learning for Healthcare Conference, pages 260–282, 2021

[32] Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, and Poole B. Score-based generative modeling through stochastic differential equations. In Proceedings of the International Conference on Learning Representations (ICLR), 2021

[33] Rombach R, Blattmann A, Lorenz D, Esser P, and Ommer B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

[34] Yuan X and Qiao Y. Diffusion-TS: interpretable diffusion for general time series generation. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), 2024

[35] Kotelnikov A, Baranchuk D, Rubachev I, and Babenko A. TabDDPM: modelling tabular data with diffusion models. In Proc Int Conf Mach Learn, pages 17564–17579, 2023

[36] Naseer AA, Walker B, Landon C, Ambrosy A, Fudim M, Wysham N, Toro B, Swaminathan S, and Lyons T. ScoEHR: generating synthetic electronic health records using continuous-time diffusion models. In Proceedings of Machine Learning for Healthcare Conference, 2023

[37] Chen T. XGBoost: a scalable tree boosting system. Cornell University, 2016

[38] Phillips DE, Dieleman JL, Lim SS, and Shearer J. Determinants of effective vaccine coverage in low and middle-income countries: a systematic review and interpretive synthesis. BMC Health Serv Res, 17(1):681, 2017

[39] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, and Dubourg V. Scikit-learn: machine learning in Python. J Mach Learn Res, 12:2825–2830, 2011