Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

Antoine Vacavant; David Niyukuri; Ding-Geng Chen; Lansana Hassim Kallon; Marcellin Atemkeng; Samuel Saidu; Yusuf Brima

arXiv: 2605.26589 · v1 · pith:CMKFHDZ6new · submitted 2026-05-26 · 💻 cs.LG · cs.AI · stat.ML


Yusuf Brima, Marcellin Atemkeng, Lansana Hassim Kallon, David Niyukuri, Antoine Vacavant, Samuel Saidu, Ding-Geng Chen

Pith reviewed 2026-06-29 19:51 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML

keywords childhood anemia · tabular foundation models · distribution shift · few-shot learning · cross-country generalization · TabPFN · DHS data


The pith

TabPFN outperforms classical models in low-data regimes for childhood anemia prediction across countries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates transformer-based tabular foundation models against classical methods for predicting childhood anemia, using data from 16 countries under limited data and cross-country distribution shift. It shows that TabPFN performs better when fewer than 200 training samples are available and achieves the lowest Brier score and expected calibration error (ECE) across countries. Performance differences are larger between countries than between models, indicating that population variation is the main driver. Generalization is tested with leave-one-country-out and few-shot protocols on DHS survey data. The findings suggest that foundation models can aid prediction in data-scarce global health contexts.

Core claim

TabPFN v2.6 outperformed Logistic Regression, XGBoost, and LightGBM in low-data regimes (<200 training samples), with higher discrimination and better calibration, and achieved the lowest Brier score (0.042) and ECE (0.203) across countries. In full-data settings, AUC-ROC ranged from 0.59 to 0.76, with between-model differences of at most 0.05. LOCO performance was stable (0.58-0.69) and driven by country context, with asymmetric transfer under reverse-LOCO. Subgroup performance was consistent, with no systematic bias, and SHAP identified child age, altitude, and height-for-age z-score as dominant predictors. Performance is driven more by population variation than by model choice.
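The two calibration numbers above are easier to interpret with their definitions in hand. The sketch below computes the Brier score and an equal-width-bin ECE in plain Python. The ten-bin choice mirrors the calibration curves in Figure 2 and is an assumption: the exact ECE estimator used in the paper is not stated here, and the function names are illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE with equal-width bins (assumed binning): size-weighted mean of
    |observed anemia rate - mean predicted probability| over non-empty bins."""
    n = len(y_true)
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Map p in [0, 1] to a bin index; p == 1.0 falls into the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for members in bins:
        if members:
            observed = sum(y for y, _ in members) / len(members)
            predicted = sum(p for _, p in members) / len(members)
            ece += len(members) / n * abs(observed - predicted)
    return ece


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.6, 0.9]
print(brier_score(y, p))                 # about 0.085
print(expected_calibration_error(y, p))  # about 0.25
```

Because ECE depends on the number and placement of bins, ECE values are only comparable across studies that use the same binning; the Brier score involves no such choice.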

What carries the argument

TabPFN, a transformer-based foundation model for tabular data, evaluated via leave-one-country-out and few-shot protocols on DHS data for anemia prediction.
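Leave-one-country-out (LOCO) evaluation holds out every record from one country as the test set and trains on the pooled remaining countries. Below is a minimal split generator, assuming each record carries a country label. It follows the same idea as scikit-learn's `LeaveOneGroupOut` with countries as groups. The function name and the country labels are hypothetical, and no model fitting is shown.

```python
from collections import defaultdict


def leave_one_country_out(countries):
    """Yield (held_out_country, train_indices, test_indices), one fold per country."""
    by_country = defaultdict(list)
    for i, country in enumerate(countries):
        by_country[country].append(i)
    for held_out in sorted(by_country):
        test_idx = by_country[held_out]
        # Training pool: every record from every other country.
        train_idx = sorted(
            i for c, idx in by_country.items() if c != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


records = ["A", "A", "B", "C", "C", "C"]  # hypothetical country labels
for country, train_idx, test_idx in leave_one_country_out(records):
    # No index from the held-out country ever appears in its training pool.
    assert not set(train_idx) & set(test_idx)
    print(country, train_idx, test_idx)
```

With 16 countries this yields 16 folds, each trained on the other 15.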

If this is right

TabPFN can be applied in new countries with limited local training data for improved anemia prediction.
Efforts in anemia modeling should focus on capturing population-specific factors rather than model complexity.
Models show consistent performance across demographic subgroups, supporting broad application.
Key predictors such as child age, altitude, and height-for-age z-score (HAZ) can inform targeted health interventions.
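The subgroup claim above depends on computing AUC-ROC separately within each demographic stratum. A minimal stdlib sketch follows. Two choices are assumptions for illustration and are not the paper's stated procedure: AUC is computed in its Mann-Whitney form, and strata that lack one outcome class are reported as `None`.

```python
from collections import defaultdict


def auc(y_true, y_score):
    """AUC-ROC as the probability a positive case outranks a negative one (ties = 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc(y_true, y_score, groups):
    """AUC-ROC per subgroup; None where a stratum contains only one class."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    result = {}
    for g, idx in sorted(members.items()):
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < len(ys):
            result[g] = auc(ys, [y_score[i] for i in idx])
        else:
            result[g] = None  # AUC undefined without both anemic and non-anemic children
    return result


y = [0, 1, 0, 1, 1, 1]
score = [0.2, 0.8, 0.6, 0.4, 0.9, 0.7]
residence = ["rural", "rural", "rural", "rural", "urban", "urban"]
print(subgroup_auc(y, score, residence))  # {'rural': 0.75, 'urban': None}
```

Small strata give noisy AUC estimates, so apparent subgroup differences should be read alongside their confidence intervals.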

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar advantages for foundation models may appear in predictions of other childhood health conditions with scarce data.
Validation on data from countries outside the original 16 would test the robustness of the observed cross-country generalization.
Adding features beyond the current DHS set might reduce the dominance of population variation in performance.

Load-bearing premise

The leave-one-country-out and few-shot protocols capture the distribution shifts that arise when a model is deployed to a new country, and no additional unmeasured confounding factors (such as differences in survey implementation) distort the comparison.

What would settle it

A new country dataset where TabPFN fails to maintain its calibration and discrimination advantages would falsify the claim of superior low-data generalization.

Figures

Figures reproduced from arXiv: 2605.26589 by Antoine Vacavant, David Niyukuri, Ding-Geng Chen, Lansana Hassim Kallon, Marcellin Atemkeng, Samuel Saidu, Yusuf Brima.

**Figure 1.** Few-shot performance curves (AUC-ROC) by country and model as a function of within-country training sample size. Each panel shows AUC-ROC on the held-out test set as a function of the number of within-country labeled training samples (n-shot) for all models. The x-axis scale differs across panels, reflecting variation in country-specific analytic sample sizes. TabPFN leverages in-context learning and does …

**Figure 2.** Country-stratified calibration curves for all four predictive models. Each panel shows observed anemia frequency (y-axis) against mean predicted probability (x-axis) across ten equal-width probability bins for each model. The dashed diagonal represents perfect calibration. Curves above the diagonal indicate underestimation of risk; curves below indicate overestimation. Calibration was assessed where each m…

**Figure 3.** Within-country discriminative performance (AUC-ROC) of four predictive models across 16 study populations. AUC-ROC from stratified five-fold cross-validation within each country for the models, with error bars denoting 95% bootstrap confidence intervals. Differences between models within any single country are modest and confidence intervals overlap substantially; variation across countries exceeds variati…
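Figures 3 and 4 report 95% bootstrap confidence intervals for AUC-ROC. A percentile bootstrap over cases is one common way to obtain such intervals, and the sketch below assumes that variant. The paper's resampling details, such as whether resampling is stratified by outcome, are not given here. The pairwise AUC is O(n²), which is fine for illustration but slow for tens of thousands of records.

```python
import random


def auc(y_true, y_score):
    """AUC-ROC via pairwise comparison of positives and negatives (ties = 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC-ROC, resampling cases with replacement."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes to compute AUC")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip single-class resamples, where AUC is undefined
            stats.append(auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


y = [0] * 10 + [1] * 10
score = [0.1 * i for i in range(20)]  # perfectly separating toy scores
print(bootstrap_auc_ci(y, score, n_boot=200))
```

Overlapping intervals, as described for Figure 3, are consistent with small between-model differences, but a paired comparison on the same resamples is the more direct test of whether two models differ.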

**Figure 4.** External discriminative performance (AUC-ROC) under LOCO validation across 16 study populations. Each country was iteratively held out as an external test set while models were trained on the pooled data from the remaining 15 countries. AUC-ROC is shown for each model with error bars denoting 95% bootstrap confidence intervals. Performance declines relative to within-country cross-validation across all mo…

**Figure 5.** Cross-country transferability of predictive models under reverse LOCO validation. Each panel shows AUC-ROC for a given model (A: Logistic Regression; B: LightGBM; C: XGBoost; D: TabPFN v2.6) across all 240 directed train–test country pairs. Rows denote the training country; columns denote the held-out test country; diagonal cells are blank (same-country pairs excluded). Color intensity reflects AUC-ROC on …

**Figure 6.** Subgroup discriminative performance (AUC-ROC) across demographic strata, countries, and models. Each panel shows AUC-ROC for a given model across available demographic subgroups (columns) and countries (rows), evaluated under leave-one-country-out validation. Subgroups include child age group in months, maternal education level (no education, primary, secondary, higher), residence type (rural, urban), chil…

**Figure 7.** Country-stratified decision curve analysis for four predictive models of childhood anemia. Net benefit is shown as a function of threshold probability for all models alongside reference strategies of treating all children (dashed grey) and treating none (dotted black). Each panel represents one of the 16 study countries, ordered by continental group. All models outperform the treat-none strategy across a w…
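Decision curve analysis (Figure 7) compares models by net benefit at each threshold probability p_t. The standard definition weighs true positives against false positives by the threshold odds: NB = TP/n − (FP/n) × p_t / (1 − p_t). The sketch below implements that definition together with the treat-all reference strategy. Treat-none has a net benefit of 0 by construction. Function names are illustrative.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging children whose predicted risk is >= threshold."""
    n = len(y_true)
    flagged = [y for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(flagged)                # anemic children correctly flagged
    fp = len(flagged) - tp           # non-anemic children flagged
    odds = threshold / (1 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every child regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.1]
for t in (0.2, 0.5):
    print(t, net_benefit(y, p, t), net_benefit_treat_all(y, t))
```

A model is useful at a given threshold only where its curve lies above both reference strategies.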

**Figure 8.** Aggregated feature importance across the four predictive models. Mean SHAP importance values (with standard deviation) are shown for all models, averaged across all 16 study countries. Features are ordered by descending mean importance within each model panel. Variable codes correspond to DHS predictors described in …

**Figure 9.** Country-level feature importance for TabPFN v2.6 across 16 study populations. Each panel shows importance for each predictor within a single country, estimated by measuring the decline in AUC-ROC when each feature column is randomly permuted on the held-out test set under within-country evaluation. Features are ordered consistently across panels to facilitate comparison. Child age (hw1) was the dominant pr…
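Figure 9 measures importance as the drop in held-out AUC-ROC when one feature column is shuffled. The sketch below implements that permutation scheme for any scoring function. Here `predict` is a hypothetical stand-in for a fitted model, and the repeat count and seed are arbitrary. scikit-learn provides the same idea as `sklearn.inspection.permutation_importance` with `scoring="roc_auc"`.

```python
import random


def auc(y_true, y_score):
    """AUC-ROC via pairwise comparison of positives and negatives (ties = 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def permutation_auc_drop(predict, X, y, n_repeats=5, seed=0):
    """Mean decrease in AUC-ROC when each feature column of held-out X is permuted."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; every other feature keeps its original values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - auc(y, [predict(row) for row in X_perm])
        drops.append(total / n_repeats)
    return drops


# Toy held-out set: feature 0 separates the classes; the toy model ignores feature 1.
X = [[i, (3 * i) % 8] for i in range(8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_auc_drop(lambda row: row[0], X, y))
```

Permuting correlated features one at a time can understate their joint importance, which is one reason permutation and SHAP rankings (Figures 8 and 9) need not agree exactly.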

Original abstract

Childhood anemia affects around 40% of children aged 6-59 months globally and arises from heterogeneous factors, limiting model generalizability. We evaluate a transformer-based tabular foundation model against classical supervised methods under cross-country and data-scarce settings. We used DHS data from 16 countries across Africa, Asia, Latin America, the Caucasus, and the Middle East (n=68,856). We compared Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. Performance was assessed using AUC-ROC, Brier score, and ECE. Generalization was evaluated using leave-one-country-out (LOCO), reverse-LOCO, and few-shot settings. Subgroup analyses included sex, age, residence, maternal education, and wealth. Feature importance was estimated using SHAP. TabPFN outperformed classical models in low-data regimes (<200 samples), showing higher discrimination and better calibration. Across countries, it achieved the lowest Brier score (0.042) and ECE (0.203). Under full-data settings, AUC-ROC ranged from 0.59 to 0.76 with small between-model differences (≤0.05). LOCO performance was stable (0.58-0.69), driven by country context. Reverse-LOCO showed asymmetric transferability. Subgroup performance was consistent with no systematic demographic bias. SHAP identified child age, altitude, and height-for-age z-score as dominant predictors, followed by wealth and maternal education. Performance in childhood anemia prediction is driven more by population variation than model choice. TabPFN provides advantages in low-resource settings through improved discrimination and calibration, highlighting foundation models as promising tools for data-scarce global health prediction.
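Reverse-LOCO, as described for Figure 5, trains on a single country and tests on each other country, giving 16 × 15 = 240 directed pairs. Because the pairs are directed, the A→B and B→A scores can differ, which is what "asymmetric transferability" refers to. Below is a minimal sketch of the pair enumeration and an asymmetry summary. `evaluate` is a hypothetical callable that would fit on one country and return AUC-ROC on another, and the toy evaluator is invented for illustration.

```python
from itertools import permutations


def reverse_loco_pairs(countries):
    """All directed (train_country, test_country) pairs, excluding same-country pairs."""
    return list(permutations(sorted(set(countries)), 2))


def transfer_asymmetry(countries, evaluate):
    """AUC(a -> b) - AUC(b -> a) for each unordered pair; evaluate is hypothetical."""
    scores = {(a, b): evaluate(a, b) for a, b in reverse_loco_pairs(countries)}
    return {(a, b): scores[(a, b)] - scores[(b, a)] for a, b in scores if a < b}


def toy_evaluate(train_country, test_country):
    """Toy evaluator: training on "A" yields a higher AUC than training elsewhere."""
    return 0.70 if train_country == "A" else 0.62


print(len(reverse_loco_pairs([f"country_{i}" for i in range(16)])))  # 240
print(transfer_asymmetry(["A", "B", "C"], toy_evaluate))
```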

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TabPFN shows a modest low-data edge on this anemia task, but the claim that population variation drives results more than model choice is undercut by unaddressed DHS measurement differences across countries.


TabPFN beats the classical models when fewer than 200 samples are available, with better discrimination and calibration on childhood anemia. The paper's main takeaway, that performance gaps trace more to country-level population differences than to model choice, sits on thin evidence, however, because the leave-one-country-out design does not separate those differences from survey implementation artifacts.

The concrete work is an empirical head-to-head: DHS records from 16 countries, TabPFN v2.6 versus logistic regression, XGBoost, and LightGBM, evaluated on AUC, Brier score, and ECE under full-data, LOCO, reverse-LOCO, and few-shot regimes. It also runs subgroup checks and SHAP. The numbers show stable LOCO AUCs around 0.58-0.69, small full-data gaps of at most 0.05, and TabPFN posting the lowest Brier (0.042) and ECE (0.203). That is the incremental extension worth noting.

The setup is reasonable for the domain and the calibration metrics are a plus. Subgroup consistency is also useful to see.

The soft spots are not minor. The abstract supplies no hyperparameter protocol, no statistical tests, and no preprocessing steps. More critically, the stress-test concern lands: hemoglobin assay methods, altitude adjustments, and sampling frames can differ systematically by country and may correlate with the top SHAP features (age, height-for-age z-score). If they do, the attribution of stable performance to population variation rather than data artifacts does not hold. The paper would need to address or bound that possibility.

This is for researchers applying tabular foundation models to global health data. A reader wanting concrete numbers on few-shot transfer will find something to look at; anyone planning to cite or extend the work will need tighter methods.

It deserves a serious referee because the question is practical and the data are real, even though the current draft requires more rigor on reproducibility and confounds before the central claim can be trusted.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates TabPFN v2.6 against Logistic Regression, XGBoost, and LightGBM for childhood anemia prediction on DHS data from 16 countries (n=68,856). It reports TabPFN advantages in few-shot regimes (<200 samples) with superior discrimination and calibration, lowest overall Brier score (0.042) and ECE (0.203), full-data AUC-ROC of 0.59-0.76 with small model differences (≤0.05), and stable LOCO performance (0.58-0.69) driven primarily by country-level population variation rather than model choice. Subgroup and SHAP analyses are also presented.

Significance. If the empirical comparisons hold after methodological clarification, the work provides useful evidence on the relative importance of data heterogeneity versus model architecture in tabular health prediction under distribution shift. The multi-country scale, use of calibration metrics alongside AUC, and explicit few-shot/LOCO protocols are strengths that could inform foundation-model deployment in low-resource global health settings.

major comments (2)

[Abstract] Abstract and methods description: performance numbers (Brier 0.042, ECE 0.203, AUC ranges) and claims of outperformance are presented without any information on hyperparameter search, statistical testing, preprocessing pipelines, or class-imbalance handling. These omissions make the central numerical claims unverifiable and load-bearing for the reported superiority of TabPFN.
[Abstract] Abstract: the claim that 'performance is driven more by population variation than model choice' rests on LOCO results showing stable AUC (0.58-0.69) and small between-model gaps (≤0.05). The experiments do not address or control for potential cross-country differences in DHS implementation (hemoglobin assay methods, altitude adjustments, sampling frames, or anemia threshold application) that may confound the intended population shifts and correlate with dominant SHAP features (child age, height-for-age z-score).
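The disputed claim compares two spreads: how much AUC-ROC varies across countries versus across models. One simple way to make that comparison explicit, assuming a complete country × model table of AUCs, is to compare the variance of country means with the variance of model means, as sketched below. The numbers are hypothetical and are not the paper's results. This decomposition cannot by itself distinguish population differences from the survey-implementation differences the referee raises, because both load onto the country term.

```python
from statistics import mean, pvariance


def country_vs_model_variance(auc_table):
    """auc_table[country][model] -> AUC-ROC (complete table assumed).

    Returns (variance of country means, variance of model means).
    """
    countries = sorted(auc_table)
    models = sorted(auc_table[countries[0]])
    country_means = [mean(auc_table[c][m] for m in models) for c in countries]
    model_means = [mean(auc_table[c][m] for c in countries) for m in models]
    return pvariance(country_means), pvariance(model_means)


# Hypothetical numbers for illustration only (not the paper's results).
table = {
    "A": {"LR": 0.60, "TabPFN": 0.62},
    "B": {"LR": 0.70, "TabPFN": 0.72},
}
between_country, between_model = country_vs_model_variance(table)
print(between_country > between_model)  # True
```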

minor comments (2)

[Abstract] The abstract would benefit from explicit reporting of per-country sample sizes and anemia prevalence to contextualize the LOCO stability claim.
Consider adding a brief statement on how the few-shot subsets (<200 samples) were constructed (random, stratified, or otherwise) to allow replication of the low-data regime results.
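The second minor comment asks how the sub-200 few-shot subsets were drawn. One replicable option, assumed here rather than taken from the paper, is a seeded stratified draw that preserves the anemia prevalence of the full country sample. The sketch uses largest-remainder rounding to hit the exact shot count, and the function name is hypothetical.

```python
import random


def stratified_few_shot(y, n_shot, seed=0):
    """Seeded draw of n_shot indices whose class mix matches the full sample."""
    if not 0 < n_shot <= len(y):
        raise ValueError("n_shot must be between 1 and len(y)")
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    quotas = {c: n_shot * len(idx) / len(y) for c, idx in by_class.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    # Largest-remainder rounding: leftover slots go to the largest fractional parts.
    leftover = n_shot - sum(alloc.values())
    for c in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[c] += 1
    chosen = []
    for c, idx in by_class.items():
        chosen.extend(rng.sample(idx, alloc[c]))
    return sorted(chosen)


labels = [1] * 30 + [0] * 70          # 30% anemia prevalence
subset = stratified_few_shot(labels, n_shot=20)
print(len(subset), sum(labels[i] for i in subset))  # 20 6
```

At very small shot counts a rare class can receive zero slots, and logistic regression cannot be fit on a single class, so a minimum-per-class floor may be worth adding.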

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on verifiability and potential confounders. We address each major comment below and indicate planned revisions.


Referee: [Abstract] Abstract and methods description: performance numbers (Brier 0.042, ECE 0.203, AUC ranges) and claims of outperformance are presented without any information on hyperparameter search, statistical testing, preprocessing pipelines, or class-imbalance handling. These omissions make the central numerical claims unverifiable and load-bearing for the reported superiority of TabPFN.

Authors: We agree that additional methodological details are needed for full verifiability. The full manuscript methods section describes the models and metrics but does not explicitly detail hyperparameter procedures, statistical tests, or imbalance handling. In revision we will expand the methods to specify: (i) hyperparameter search (TabPFN used defaults per v2.6; classical models used scikit-learn/XGBoost defaults with limited grid search on learning rate and depth); (ii) statistical testing (bootstrap 95% CIs on AUC/Brier and DeLong tests for pairwise comparisons); (iii) preprocessing (median imputation for missing values, z-score standardization for LR, one-hot encoding for categoricals); and (iv) imbalance handling (class weights in LR/XGBoost/LightGBM; TabPFN's built-in handling). A concise summary will be added to the abstract if space allows. These additions will make the reported numbers transparent without altering results. revision: yes
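The preprocessing promised in this response (median imputation, z-score standardization, one-hot encoding, and class weights) can be sketched in plain Python as below. The key design choice, assumed here, is that all statistics are learned on the training fold only and then applied to the held-out country, so nothing about the test country leaks into training. Column names are hypothetical. The class-weight formula is the n_samples / (n_classes × count) heuristic that scikit-learn calls "balanced". Per the response, z-scoring is needed only for logistic regression, since tree ensembles are largely insensitive to monotone rescaling.

```python
from statistics import mean, median, pstdev


def fit_preprocessor(rows, numeric, categorical):
    """Learn imputation medians, z-score parameters, and category levels from training rows."""
    params = {}
    for col in numeric:
        med = median(r[col] for r in rows if r[col] is not None)
        filled = [med if r[col] is None else r[col] for r in rows]
        params[col] = ("num", med, mean(filled), pstdev(filled) or 1.0)  # guard: constant column
    for col in categorical:
        params[col] = ("cat", sorted({r[col] for r in rows if r[col] is not None}))
    return params


def transform(rows, params):
    """Apply fitted parameters; unseen or missing categories become all-zero indicators."""
    matrix = []
    for r in rows:
        vec = []
        for col, spec in params.items():
            if spec[0] == "num":
                _, med, mu, sd = spec
                x = med if r[col] is None else r[col]
                vec.append((x - mu) / sd)
            else:
                vec.extend(1.0 if r[col] == level else 0.0 for level in spec[1])
        matrix.append(vec)
    return matrix


def balanced_class_weights(y):
    """n_samples / (n_classes * class_count) for each class."""
    counts = {c: y.count(c) for c in set(y)}
    return {c: len(y) / (len(counts) * k) for c, k in counts.items()}


train = [  # hypothetical training-fold records
    {"age_months": 12, "residence": "rural"},
    {"age_months": None, "residence": "urban"},
    {"age_months": 36, "residence": "rural"},
]
params = fit_preprocessor(train, numeric=["age_months"], categorical=["residence"])
print(transform(train, params))
print(balanced_class_weights([0, 0, 0, 1]))
```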
Referee: [Abstract] Abstract: the claim that 'performance is driven more by population variation than model choice' rests on LOCO results showing stable AUC (0.58-0.69) and small between-model gaps (≤0.05). The experiments do not address or control for potential cross-country differences in DHS implementation (hemoglobin assay methods, altitude adjustments, sampling frames, or anemia threshold application) that may confound the intended population shifts and correlate with dominant SHAP features (child age, height-for-age z-score).

Authors: The LOCO design intentionally captures the net effect of all country-level factors (including any unmeasured DHS implementation differences) on performance. The observed pattern—larger AUC variation across countries (0.58-0.69) than across models (≤0.05)—still indicates that population context dominates model architecture. We acknowledge that the paper does not explicitly control for assay methods, sampling frames, or threshold variations, as these metadata are not uniformly available in the public DHS files. Altitude is included as a covariate and appears in SHAP rankings, partially addressing one listed factor. In revision we will add an explicit limitations paragraph discussing these potential confounders and noting that the small model gaps persist even under the observed heterogeneity. No new experiments are feasible without external data sources. revision: partial

Circularity Check

0 steps flagged

Purely empirical comparison with no derivation chain

full rationale

This is a standard empirical ML benchmarking paper that trains and evaluates models (LR, XGBoost, LightGBM, TabPFN) on DHS tabular data under LOCO, reverse-LOCO, and few-shot protocols, reporting AUC, Brier, ECE, and SHAP values. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear anywhere in the provided text. All performance claims reduce directly to measured outcomes on held-out country subsets rather than to any internal definition or prior author result, so the analysis is self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The study rests on standard supervised learning assumptions and the representativeness of DHS survey data. No new entities are introduced.

free parameters (1)

model hyperparameters
Hyperparameters for XGBoost, LightGBM, and TabPFN are set before training (tuned or left at defaults) rather than learned from the data; exact values and the search procedure are not stated in the abstract.

axioms (1)

domain assumption: DHS survey responses provide accurate labels for anemia status and risk factors across the sampled countries
Used as ground truth for all training and evaluation.

pith-pipeline@v0.9.1-grok · 5884 in / 1342 out tokens · 33044 ms · 2026-06-29T19:51:00.130439+00:00 · methodology


Reference graph

Works this paper leans on

30 extracted references · 2 canonical work pages

[1] World Health Organization. Anaemia; 2025. Fact sheet, updated 10 February 2025. https://www.who.int/news-room/fact-sheets/detail/anaemia

[2] Gardner WM, Razo C, McHugh TA, Hagins H, Vilchis-Tella VM, Hennessy C, et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: findings from the Global Burden of Disease Study 2021. The Lancet Haematology. 2023;10(9):e713-34.

[3] Tesfaye SH, Seboka BT, Sisay D. Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016. PLoS One. 2024;19(4):e0300172.

[4] The DHS Program. Dataset Types; 2026. Accessed February 19, 2026. https://dhsprogram.com/data/dataset-types.cfm

[5] Khan JR, Chowdhury S, Islam H, Raheem E. Machine learning algorithms to predict the childhood anemia in Bangladesh. Journal of Data Science. 2019;17(1):195-218.

[6] Das B, Barman MP, Kotoky MJ. Optimizing Predictive Analytics for Childhood Anaemia: A Machine Learning Model Approach. Clinical Epidemiology and Global Health. 2025:102275.

[7] Ja’afar IK, Uthman OA. Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors. Public Health Challenges. 2025;4(4):e70135.

[8] Hassan YSA, Omar MA, Karikari JK, Ali AS, Ahmed MM. Predicting childhood anaemia in Ghana with explainable machine learning: A national survey analysis. Digital Health. 2026;12:20552076261437179.

[9] Hollmann N, Müller S, Purucker L, Krishnakumar A, Körfer M, Hoo SB, et al. Accurate predictions on small data with a tabular foundation model. Nature. 2025;637(8045):319-26.

[10] Ye HJ, Liu SY, Chao WL. A closer look at TabPFN v2: Understanding its strengths and extending its capabilities. arXiv preprint arXiv:2502.17361. 2025.

[11] Brima Y, Atemkeng M. Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care. arXiv preprint arXiv:2512.21602. 2025.

[12] Collins GS, Moons KG, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385.

[13] Demographic and Health Surveys (various) [Datasets]. Funded by USAID. 2017:2014-8.

[14] The DHS Program. Using Datasets for Analysis; 2026. Accessed February 19, 2026. https://dhsprogram.com/data/using-datasets-for-analysis.cfm

[15] World Health Organization. Anaemia, children aged 6–59 months with haemoglobin concentration less than 110 g/L, adjusted for altitude; n.d. Accessed 2026-02-25. Available from: https://www.who.int/data/gho/indicator-metadata-registry/imr-details/number-of-children-aged-6-59-months-with-anaemia

[16] World Health Organization. Guideline on Haemoglobin Cutoffs to Define Anaemia in Individuals and Populations; 2024. Accessed February 25, 2026. https://www.who.int/publications/i/item/9789240088542

[17] World Health Organization. Guideline on haemoglobin cutoffs to define anaemia in individuals and populations. World Health Organization; 2024.

[18] Yimer A, Yesuf HA, Ahmed S, Zemariam AB, Mussa E, Sirage N, et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatrics. 2025;25(1):311.

[19] Said RO, Tunga M. Hybrid Machine Learning Model for the Prediction of Anaemia. Machine Learning with Applications. 2025:100741.

[20] Nunez E, Steyerberg EW, Nunez J. Regression modeling strategies. Revista Española de Cardiología (English Edition). 2011;64(6):501-7.

[21] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785-94.

[22] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems. 2017;30.

[23] Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. p. 2623-31.

[24] Breiman L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science. 2001;16(3):199-231.

[25] Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.

[26] Muschalik M, Baniecki H, Fumagalli F, Kolpaczki P, Hammer B, Hüllermeier E. shapiq: Shapley Interactions for Machine Learning. In: Advances in Neural Information Processing Systems. vol. 37; 2024. p. 130324-57. Available from: https://openreview.net/forum?id=knxGmi6SJi

[27] Rundel D, Kobialka J, von Crailsheim C, Feurer M, Nagler T, Rügamer D. Interpretable Machine Learning for TabPFN. In: Explainable Artificial Intelligence; 2024. p. 465-76. doi:10.1007/978-3-031-63797-1_23. Available from: https://link.springer.com/chapter/10.1007/978-3-031-63797-1_23

[28] Riley RD, Ensor J, Snell KI, Debray TP, Altman DG, Moons KG, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353.

[29] Geirhos R, Jacobsen JH, Michaelis C, Zemel R, Brendel W, Bethge M, et al. Shortcut learning in deep neural networks. Nature Machine Intelligence. 2020;2(11):665-73.

[30] Ong Ly C, Unnikrishnan B, Tadic T, Patel T, Duhamel J, Kandel S, et al. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data. npj Digital Medicine. 2024;7(1):124.

[1] [1]

Anaemia; 2025

World Health Organization. Anaemia; 2025. Fact sheet, updated 10 February 2025.https://www.who. int/news-room/fact-sheets/detail/anaemia

2025

[2] [2]

Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: findings from the Global Burden of Disease Study 2021

Gardner WM, Razo C, McHugh TA, Hagins H, Vilchis-Tella VM, Hennessy C, et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: findings from the Global Burden of Disease Study 2021. The Lancet Haematology. 2023;10(9):e713-34

1990

[3] [3]

Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016

Tesfaye SH, Seboka BT, Sisay D. Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016. Plos one. 2024;19(4):e0300172

2016

[4] [4]

Dataset Types; 2026

The DHS Program. Dataset Types; 2026. Accessed February 19, 2026.https://dhsprogram.com/ data/dataset-types.cfm

2026

[5] [5]

Machine learning algorithms to predict the childhood anemia in Bangladesh

Khan JR, Chowdhury S, Islam H, Raheem E. Machine learning algorithms to predict the childhood anemia in Bangladesh. Journal of Data Science. 2019;17(1):195-218

2019

[6] [6]

Optimizing Predictive Analytics for Childhood Anaemia: A Machine Learning Model Approach

Das B, Barman MP, Kotoky MJ. Optimizing Predictive Analytics for Childhood Anaemia: A Machine Learning Model Approach. Clinical Epidemiology and Global Health. 2025:102275

2025

[7] [7]

Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors

Ja’afar IK, Uthman OA. Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors. Public Health Challenges. 2025;4(4):e70135

2025

[8] [8]

Predicting childhood anaemia in Ghana with ex- plainable machine learning: A national survey analysis

Hassan YSA, Omar MA, Karikari JK, Ali AS, Ahmed MM. Predicting childhood anaemia in Ghana with ex- plainable machine learning: A national survey analysis. Digital Health. 2026;12:20552076261437179

2026

[9] [9]

Accurate predictions on small data with a tabular foundation model

Hollmann N, Müller S, Purucker L, Krishnakumar A, Körfer M, Hoo SB, et al. Accurate predictions on small data with a tabular foundation model. Nature. 2025;637(8045):319-26

2025

[10] [10]

A closer look at TabPFN v2: Understanding its strengths and extending its capabilities

Ye HJ, Liu SY , Chao WL. A closer look at TabPFN v2: Understanding its strengths and extending its capabilities. arXiv preprint arXiv:250217361. 2025. 19

2025

[11] [11]

Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emer- gency and Critical Care

Brima Y , Atemkeng M. Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emer- gency and Critical Care. arXiv preprint arXiv:251221602. 2025

2025

[12] [12]

TRIPOD+ AI statement: up- dated guidance for reporting clinical prediction models that use regression or machine learning methods

Collins GS, Moons KG, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+ AI statement: up- dated guidance for reporting clinical prediction models that use regression or machine learning methods. bmj. 2024;385

2024

[13] [13]

health surveys (various)[Datasets]

Demographic I. health surveys (various)[Datasets]. Funded by USAID. 2017:2014-8

2017

[14] [14]

Using Datasets for Analysis; 2026

The DHS Program. Using Datasets for Analysis; 2026. Accessed February 19, 2026.https:// dhsprogram.com/data/using-datasets-for-analysis.cfm

2026

[15] [15]

Anaemia, children aged 6–59 months with haemoglobin concen- tration less than 110 g/L, adjusted for altitude; n.d

World Health Organization. Anaemia, children aged 6–59 months with haemoglobin concen- tration less than 110 g/L, adjusted for altitude; n.d. Accessed 2026-02-25. Available from: https://www.who.int/data/gho/indicator-metadata-registry/imr-details/ number-of-children-aged-6-59-months-with-anaemia

2026

[16] World Health Organization. Guideline on Haemoglobin Cutoffs to Define Anaemia in Individuals and Populations; 2024. Accessed February 25, 2026. Available from: https://www.who.int/publications/i/item/9789240088542

[17] World Health Organization. Guideline on haemoglobin cutoffs to define anaemia in individuals and populations. World Health Organization; 2024.

[18] Yimer A, Yesuf HA, Ahmed S, Zemariam AB, Mussa E, Sirage N, et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatrics. 2025;25(1):311.

[19] Said RO, Tunga M. Hybrid Machine Learning Model for the Prediction of Anaemia. Machine Learning with Applications. 2025:100741.

[20] Nunez E, Steyerberg EW, Nunez J. Regression modeling strategies. Revista Española de Cardiología (English Edition). 2011;64(6):501-7.

[21] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785-94.

[22] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems. 2017;30.

[23] Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. p. 2623-31.

[24] Breiman L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science. 2001;16(3):199-231.

[25] Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.

[26] Muschalik M, Baniecki H, Fumagalli F, Kolpaczki P, Hammer B, Hüllermeier E. shapiq: Shapley Interactions for Machine Learning. In: Advances in Neural Information Processing Systems. vol. 37; 2024. p. 130324-57. Available from: https://openreview.net/forum?id=knxGmi6SJi

[27] Rundel D, Kobialka J, von Crailsheim C, Feurer M, Nagler T, Rügamer D. Interpretable Machine Learning for TabPFN. In: Explainable Artificial Intelligence; 2024. p. 465-76. doi:10.1007/978-3-031-63797-1_23. Available from: https://link.springer.com/chapter/10.1007/978-3-031-63797-1_23

[28] Riley RD, Ensor J, Snell KI, Debray TP, Altman DG, Moons KG, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353.

[29] Geirhos R, Jacobsen JH, Michaelis C, Zemel R, Brendel W, Bethge M, et al. Shortcut learning in deep neural networks. Nature Machine Intelligence. 2020;2(11):665-73.

[30] Ong Ly C, Unnikrishnan B, Tadic T, Patel T, Duhamel J, Kandel S, et al. Shortcut learning in medical AI hinders generalization: method for estimating AI model generalization without external data. NPJ Digital Medicine. 2024;7(1):124.

Appendix A. Preprocessing, modelling and evaluation

Appendix A.1. Preprocessing pipeline

Cohort derivation

Records extra...