Context-Conditioned Generative Models Enable Subnational Refinement of Sparse Humanitarian Surveys

Daniela Paolotti; Duccio Piovani; Federica Sibilla; Kyriacos Koupparis; Kyriaki Kalimeri; Rossano Schifanella; Vasiliki Voukelatou

arXiv: 2605.31489 · v2 · pith:V6T7OMKX · submitted 2026-05-29 · 💻 cs.CY


Federica Sibilla, Vasiliki Voukelatou, Duccio Piovani, Kyriacos Koupparis, Daniela Paolotti, Rossano Schifanella, Kyriaki Kalimeri

Pith reviewed 2026-06-28 20:29 UTC · model grok-4.3

classification 💻 cs.CY

keywords normalizing flows · generative models · humanitarian surveys · data scarcity · sub-national estimates · context conditioning · survey augmentation


The pith

Context-conditioned normalizing flows refine sub-national distributions from sparse humanitarian surveys as conditioning information grows richer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests normalizing flows, a type of generative model, conditioned on exogenous contextual features to improve sub-national estimates drawn from very limited household survey samples. Experiments across eight datasets from six low- and middle-income countries show that the models produce more accurate fine-scale distributions under severe data scarcity, with accuracy rising as more contextual covariates are supplied. A reader would care because humanitarian decisions on aid and resource allocation often depend on sub-national detail that sparse surveys alone cannot supply. The central principle is that such augmentation succeeds when the original sample still covers the population and the covariates reflect genuine local differences. By modeling full conditional distributions instead of single-point predictions, the approach yields richer evidence than standard imputation methods.

Core claim

Across eight household survey datasets spanning six low-income or middle-income countries, context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.

What carries the argument

Context-conditioned normalizing flows that learn the full conditional distribution of survey variables given exogenous contextual features.
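To make the mechanism concrete, here is a deliberately minimal sketch of a context-conditioned normalizing flow, assuming a single scalar survey outcome x and a single scalar covariate c. It uses one affine bijection whose shift and log-scale are linear functions of c, and fits both by maximum likelihood using the change-of-variables formula. The flow architectures the paper cites, such as Real-NVP and neural spline flows, stack many bijections with neural conditioners. Nothing here reproduces the authors' model, and the function names and synthetic data are hypothetical.

```python
import math
import random


def fit_conditional_affine_flow(xs, cs, lr=0.05, steps=2000):
    """Maximum-likelihood fit of a one-layer conditional affine flow.

    Forward map (base -> data): x = mu(c) + exp(s(c)) * z with z ~ N(0, 1).
    The conditioners are linear in a scalar context c, a hypothetical
    simplification of the neural conditioners used in practice:
        mu(c) = a0 + a1 * c,   s(c) = b0 + b1 * c
    """
    a0 = a1 = b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g_a0 = g_a1 = g_b0 = g_b1 = 0.0
        for x, c in zip(xs, cs):
            inv_scale = math.exp(-(b0 + b1 * c))
            z = (x - (a0 + a1 * c)) * inv_scale  # inverse map: data -> base
            # Per-sample NLL = 0.5 * z**2 + s(c) + const, where s(c) is the
            # log-determinant of the forward map.
            d_mu = -z * inv_scale   # dNLL/dmu
            d_s = 1.0 - z * z       # dNLL/ds
            g_a0 += d_mu
            g_a1 += d_mu * c
            g_b0 += d_s
            g_b1 += d_s * c
        # Plain full-batch gradient descent on the mean NLL.
        a0 -= lr * g_a0 / n
        a1 -= lr * g_a1 / n
        b0 -= lr * g_b0 / n
        b1 -= lr * g_b1 / n
    return a0, a1, b0, b1


def conditional_log_prob(x, c, params):
    """log p(x | c) by the change-of-variables formula."""
    a0, a1, b0, b1 = params
    s = b0 + b1 * c
    z = (x - (a0 + a1 * c)) * math.exp(-s)
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - s


def sample(c, n, params, rng):
    """Draw n values from p(x | c) by pushing Gaussian noise through the flow."""
    a0, a1, b0, b1 = params
    mu, scale = a0 + a1 * c, math.exp(b0 + b1 * c)
    return [mu + scale * rng.gauss(0.0, 1.0) for _ in range(n)]


# Synthetic stand-in for a survey variable whose mean and spread both
# shift with one contextual covariate c (all numbers are made up).
rng = random.Random(0)
cs = [rng.uniform(-1.0, 1.0) for _ in range(400)]
xs = [2.0 + 1.5 * c + math.exp(0.3 * c) * rng.gauss(0.0, 1.0) for c in cs]

params = fit_conditional_affine_flow(xs, cs)
draws = sample(1.0, 2000, params, rng)
print("fitted (a0, a1, b0, b1):", [round(p, 2) for p in params])
print("mean of draws at c = 1:", round(sum(draws) / len(draws), 2))
```

Because the fitted object is a density rather than a regression line, it yields a full conditional distribution for any context value: `sample` draws households for a region's covariates, and `conditional_log_prob` scores held-out observations.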

If this is right

Sub-national estimates become usable for targeted humanitarian resource allocation even when raw samples are sparse.
Accuracy gains scale directly with the amount and relevance of supplied contextual information.
Full conditional distributions, rather than point estimates, become available for downstream decision models.
Survey augmentation is feasible only when the original sample still covers the population and covariates track local variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conditioning strategy could be tested on sparse spatial data outside humanitarian surveys, such as public-health or environmental indicators.
If chosen covariates fail to capture heterogeneity, the models risk producing plausible but inaccurate conditional distributions.

Load-bearing premise

The sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity.

What would settle it

The central claim would fail if, on a held-out portion of one of the eight datasets, adding richer contextual conditioning to the normalizing flow produced no reduction in sub-national distribution error, as measured by a metric such as the Wasserstein distance between the generated and true distributions.
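This test can be scripted. The sketch below assumes the error metric is the one-dimensional Wasserstein-1 distance computed per region and averaged over regions; the region names, data, and the two "model outputs" are hypothetical placeholders for held-out survey values and flow samples. For unweighted samples, `wasserstein_1d` should agree with `scipy.stats.wasserstein_distance`.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """W1 distance between two 1-D empirical samples: the integral of |F_u - F_v|."""
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_u = bisect_right(u, left) / len(u)
        f_v = bisect_right(v, left) / len(v)
        total += abs(f_u - f_v) * (right - left)
    return total


def mean_regional_error(generated, observed):
    """Average W1 over regions present in both dicts (region -> list of values)."""
    regions = sorted(set(generated) & set(observed))
    return sum(wasserstein_1d(generated[r], observed[r]) for r in regions) / len(regions)


# Hypothetical held-out data and two hypothetical model outputs.
observed = {"north": [1.0, 2.0, 3.0], "south": [5.0, 6.0, 7.0]}
coarse = {"north": [4.0, 4.0, 4.0], "south": [4.0, 4.0, 4.0]}  # ignores region
rich = {"north": [1.5, 2.0, 2.5], "south": [5.5, 6.0, 6.5]}    # tracks region

err_coarse = mean_regional_error(coarse, observed)
err_rich = mean_regional_error(rich, observed)
# The falsification check: richer conditioning should lower the error.
print(f"coarse: {err_coarse:.3f}  rich: {err_rich:.3f}  improved: {err_rich < err_coarse}")
```

If `err_rich` were not below `err_coarse` on real held-out data, that would count against the claim.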

Figures

Figures reproduced from arXiv: 2605.31489 by Daniela Paolotti, Duccio Piovani, Federica Sibilla, Kyriacos Koupparis, Kyriaki Kalimeri, Rossano Schifanella, Vasiliki Voukelatou.

**Figure 7.** Feature attribution via Shapley values. To assess the contribution of individual contextual variables to the improvement achieved by the fully context-informed cNF model, we compute Shapley values (44) for the performance gain Δ. In this analysis, Δ measures the added value of exogenous geospatial context relative to the NF+sector model, rather than relative to the oversampling baseline. We define Δ = Err_NF+…
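For readers unfamiliar with the attribution step, the sketch below computes exact Shapley values by enumerating coalitions. Each contextual variable is a player, and a value function v(S) gives the gain obtained when conditioning on the subset S. The value function and its numbers are invented for illustration and stand in for the caption's Δ, whose full definition is truncated here. With many features, an approximation such as those in (44) would replace exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (feasible for few features)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability that, in a random feature ordering, exactly a given
            # size-k coalition precedes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical gains (not from the paper) for conditioning on each feature,
# plus an interaction when NDVI and market access are used together.
GAINS = {"ndvi": 0.10, "market_access": 0.05, "population": 0.02}


def toy_gain(coalition):
    gain = sum(GAINS[f] for f in coalition)
    if {"ndvi", "market_access"} <= coalition:
        gain += 0.03
    return gain


phi = shapley_values(list(GAINS), toy_gain)
print({f: round(v, 3) for f, v in phi.items()})
```

The interaction term is split evenly between NDVI and market access, and the values sum to v(N) - v(∅). This efficiency property makes Shapley values a natural way to apportion a single performance gain across features.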

Original abstract

Data scarcity limits inference in many scientific and policy domains. Survey data are essential for decision-making, but sparse samples often fail to capture fine spatial granularities. We evaluate normalizing flows, a generative model that learns complex data distributions and can be conditioned on exogenous contextual features, in controlled data scarcity scenarios. Across eight household survey datasets spanning six low-income or middle-income countries in the humanitarian domain, we show that context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and that performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.
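The abstract does not say how its controlled data scarcity scenarios are built. One simple possibility, assumed here purely for illustration, is to keep a uniformly random fraction of household records and then check which regions lose all their observations. That is exactly the situation in which the "sufficient support" condition fails. The function names, seed, and synthetic survey are hypothetical.

```python
import random


def scarcity_subsample(records, fraction, seed=0):
    """Keep a uniformly random `fraction` of household records (hypothetical protocol)."""
    rng = random.Random(seed)
    return rng.sample(records, round(fraction * len(records)))


def regions_without_support(full, sparse, region_key="region"):
    """Regions in the full survey with no household left in the sparse sample."""
    return sorted({r[region_key] for r in full} - {r[region_key] for r in sparse})


# Hypothetical survey: 200 households spread evenly over 10 regions.
households = [{"id": i, "region": f"R{i % 10}"} for i in range(200)]
for fraction in (0.5, 0.1, 0.02):
    sparse = scarcity_subsample(households, fraction, seed=42)
    lost = regions_without_support(households, sparse)
    print(f"fraction={fraction}: {len(sparse)} households kept, regions lost: {lost}")
```

A stratified variant that samples within each region would guarantee support by construction. The uniform version makes support loss visible instead.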

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper evaluates context-conditioned normalizing flows on eight humanitarian datasets and shows gains from richer context, but leaves the key 'sufficient support' condition unquantified.


The new piece here is an empirical check of normalizing flows for sub-national refinement of sparse household surveys in six low- and middle-income countries. They run controlled scarcity experiments across eight real datasets and report that performance scales with how much contextual information is supplied. That matches the practical need in humanitarian work where samples are thin but auxiliary data like satellite or census layers are available.

The choice to model full conditional distributions rather than point estimates is the right one for downstream allocation decisions. It avoids the usual problem of over-confident means when the underlying distribution is multimodal or skewed.
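The point about over-confident means can be illustrated with a toy bimodal region. Assume, hypothetically, an outcome score where households below a cutoff are counted as in need. A point estimate of the regional mean sits above the cutoff, while the full distribution shows half the households below it. The numbers are made up and are not from the paper.

```python
from statistics import mean


def share_below(samples, cutoff):
    """Fraction of households whose outcome falls below `cutoff`."""
    return sum(1 for x in samples if x < cutoff) / len(samples)


# Hypothetical bimodal region: half the households badly off, half well off.
region = [10.0] * 50 + [50.0] * 50
cutoff = 28.0

point_estimate = mean(region)                          # 30.0, above the cutoff
flag_from_mean = point_estimate < cutoff               # False: looks fine on average
share_from_distribution = share_below(region, cutoff)  # 0.5: half are below the cutoff
print(point_estimate, flag_from_mean, share_from_distribution)
```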

The main gap is that the central claim still rests on the sparse sample having 'sufficient support' without any operational definition or test: no minimum number of observations per stratum, no coverage metric, and no sensitivity runs that deliberately push the sample below that threshold. As a result, the reported systematic improvement with conditioning richness could be limited to cases where the original data already roughly match the conditional distribution. The abstract also gives no numbers, baselines, or error bars, so the size of the gains cannot be judged from the summary alone.

This is useful reading for applied researchers who already work with DHS-style surveys and want to try generative augmentation. It is less useful for methodologists looking for a crisp test of when these models actually extrapolate versus interpolate. The experiments are on external data and the framing is straightforward, so the paper clears the bar for a serious referee even though the support condition needs tightening and the results section needs quantitative detail.

Referee Report

1 major / 1 minor

Summary. The manuscript evaluates context-conditioned normalizing flows as a generative approach to refine sub-national distributions from sparse humanitarian household surveys. Across eight datasets spanning six low- and middle-income countries, it reports that performance improves systematically with richer conditioning information and claims this supports a general principle for survey augmentation when the sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity.

Significance. If the empirical results hold under the required sensitivity checks, the work would provide a practical method for increasing spatial granularity in data-scarce humanitarian settings, moving beyond point estimates to full conditional distributions. The multi-country evaluation on real survey data is a positive feature that grounds the general principle in diverse contexts.

major comments (1)

[Abstract] The central claim is explicitly conditioned on the sparse sample 'retain[ing] sufficient support,' yet no quantitative criterion is supplied (minimum observations per stratum, effective sample size after conditioning, or coverage of the target support) and the controlled scarcity experiments contain no sensitivity analysis that varies this support level. This renders the reported systematic gains with conditioning richness non-operational and prevents distinguishing true extrapolation from reproduction of already-captured conditional structure.
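One hypothetical way to make "sufficient support" operational, along the lines the referee suggests, is to compute a per-region effective sample size and report the share of target regions that reach a minimum. The sketch below applies Kish's approximation (squared sum of weights over the sum of squared weights) to survey weights. The threshold of 2, the field names, and the data are assumptions. The sketch also measures support per region from design weights rather than after conditioning on covariates.

```python
from collections import defaultdict


def kish_ess(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def support_report(records, all_regions, min_ess):
    """Per-region ESS and the share of target regions whose ESS reaches `min_ess`.

    Regions with no sampled household get ESS 0. `min_ess` is a hypothetical
    threshold; the paper does not specify one.
    """
    weights = defaultdict(list)
    for rec in records:
        weights[rec["region"]].append(rec["weight"])
    ess = {region: 0.0 for region in all_regions}
    for region, w in weights.items():
        ess[region] = kish_ess(w)
    coverage = sum(1 for value in ess.values() if value >= min_ess) / len(ess)
    return ess, coverage


# Hypothetical sparse sample: region A has four equal-weight households,
# region B two unequally weighted ones, and region C none at all.
sample = [{"region": "A", "weight": 1.0}] * 4 + [
    {"region": "B", "weight": 1.0},
    {"region": "B", "weight": 3.0},
]
ess, coverage = support_report(sample, ["A", "B", "C"], min_ess=2.0)
print(ess, round(coverage, 3))
```

Sweeping `min_ess` and reporting model error against coverage would give the kind of sensitivity analysis the comment asks for.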

minor comments (1)

[Abstract] No quantitative metrics, baselines, error bars, or effect sizes are reported despite the claim of results 'across eight datasets,' making the magnitude and robustness of the improvements impossible to assess from the summary alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below.


Referee: [Abstract] The central claim is explicitly conditioned on the sparse sample 'retain[ing] sufficient support,' yet no quantitative criterion is supplied (minimum observations per stratum, effective sample size after conditioning, or coverage of the target support) and the controlled scarcity experiments contain no sensitivity analysis that varies this support level. This renders the reported systematic gains with conditioning richness non-operational and prevents distinguishing true extrapolation from reproduction of already-captured conditional structure.

Authors: We agree that the absence of an explicit quantitative criterion for 'sufficient support' limits the operational value of the central claim and that sensitivity analyses varying support levels would strengthen the distinction between extrapolation and reproduction of captured structure. The current manuscript motivates the condition qualitatively in the methods and discussion but does not supply a numeric threshold or perform the requested sensitivity checks. In the revision we will define 'sufficient support' via a concrete metric (e.g., minimum effective sample size per sub-national stratum after conditioning) and add controlled-scarcity experiments that systematically vary this threshold, reporting performance as a function of support level.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external datasets with independent performance metrics


The paper reports an empirical application of normalizing flows conditioned on contextual covariates, evaluated across eight real household survey datasets from six countries. Claims about refinement under scarcity and systematic improvement with conditioning richness are presented as outcomes of controlled experiments on held-out data, not as derivations that reduce to fitted parameters or self-referential definitions. The stated caveat ('when the sparse sample retains sufficient support') functions as a scope condition rather than a load-bearing premise that is itself derived from the model. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results appear in the provided text; the central results remain falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the work relies on standard normalizing flows and survey covariates whose relevance is assumed.

pith-pipeline@v0.9.1-grok · 5704 in / 974 out tokens · 23526 ms · 2026-06-28T20:29:33.711251+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 20 canonical work pages · 3 internal anchors

[1]

C. Elbers, J. O. Lanjouw, P. Lanjouw, Micro-Level Estimation of Poverty and Inequality. Econometrica 71(1), 355–364 (2003), doi:10.1111/1468-0262.00399
[2]

J. Wakefield, et al., Estimating under-five mortality in space and time in a developing world context. Statistical Methods in Medical Research 28(9), 2614–2634 (2019), doi:10.1177/0962280218767988
[3]

R. E. Fay III, R. A. Herriot, Estimates of income for small places: an application of James-Stein procedures to census data. Journal of the American Statistical Association 74(366a), 269–277 (1979)
[4]

G. E. Battese, R. M. Harter, W. A. Fuller, An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association 83(401), 28–36 (1988)
[5]

World Food Programme, Food Security Assessments (2025), https://vamresources.manuals.wfp.org/docs/food-security-assessments, published 2025-08-12, updated 2025-10-10, accessed 2026-04-01
[6]

National Bureau of Statistics (Nigeria), United Nations Children’s Fund (UNICEF), Nigeria Multiple Indicator Cluster Survey and National Immunization Coverage Survey 2021, Survey findings report, United Nations Children’s Fund (UNICEF), New York, USA (2022), https://l1nq.com/f9hy44k
[7]

S. Bourou, A. El Saer, T. H. Velivassaki, A. Voulkidis, T. Zahariadis, A Review of Tabular Data Synthesis Using GANs on an IDS Dataset. Information 12(9), 375 (2021), doi:10.3390/info12090375, https://www.mdpi.com/2078-2489/12/9/375
[8]

Z. Wang, et al., A Comprehensive Survey on Data Augmentation. arXiv preprint arXiv:2405.09591 (2024), doi:10.48550/arXiv.2405.09591, https://arxiv.org/abs/2405.09591
[9]

L. Xu, M. Skoularidou, A. Cuesta-Infante, K. Veeramachaneni, Modeling Tabular Data using Conditional GAN. arXiv preprint arXiv:1907.00503 (2019), doi:10.48550/arXiv.1907.00503, https://arxiv.org/abs/1907.00503
[10]

A. X. Wang, B. P. Nguyen, TTVAE: Transformer-based generative modeling for tabular data generation. Artificial Intelligence 340(C), 104292 (2025), doi:10.1016/j.artint.2025.104292, https://doi.org/10.1016/j.artint.2025.104292
[11]

D. J. Rezende, S. Mohamed, Variational Inference with Normalizing Flows, in Proceedings of the 32nd International Conference on Machine Learning (ICML), F. Bach, D. Blei, Eds. (PMLR), vol. 37 of Proceedings of Machine Learning Research (2015), pp. 1530–1538, https://proceedings.mlr.press/v37/rezende15.html
[12]

L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using Real-NVP, in International Conference on Learning Representations (ICLR) (2017)
[13]

C. Durkan, A. Bekasov, I. Murray, G. Papamakarios, Neural spline flows. Advances in Neural Information Processing Systems 32 (2019)
[14]

Y. Jiang, S. Liang, J. Choi, Synthetic Survey Data Generation and Evaluation, in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’25 (ACM) (2025), pp. 2292–2302, doi:10.1145/3690624.3709421, https://dl.acm.org/doi/10.1145/3690624.3709421
[15]

T. Liu, Z. Qian, J. Berrevoets, M. van der Schaar, GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure, in Proceedings of the International Conference on Learning Representations (ICLR) (2023), https://openreview.net/forum?id=goggle-iclr2023, poster presentation
[16]

S. Y. Lim, H. Yun, P. Bansal, D.-K. Kim, E.-J. Kim, A Large Language Model for Feasible and Diverse Population Synthesis. arXiv preprint arXiv:2505.04196 (2025), submitted to Transportation Research Part C: Emerging Technologies, doi:10.48550/arXiv.2505.04196, https://doi.org/10.48550/arXiv.2505.04196
[17]

M. Johnsen, O. Brandt, S. Garrido, F. Pereira, Population synthesis for urban resident modeling using deep generative models. Neural Computing and Applications 34, 4677–4692 (2022), doi:10.1007/s00521-021-06634-7, https://doi.org/10.1007/s00521-021-06634-7
[18]

R. Tanton, K. Edwards, Spatial Microsimulation: A Reference Guide for Users, vol. 6 (Springer Science & Business Media) (2012)
[19]

D. Liu, et al., Synthetic Data Generation for Augmenting Small Samples. arXiv preprint arXiv:2501.18741 (2025)
[20]

D. Manousakas, S. Aydöre, On the Usefulness of Synthetic Tabular Data Generation. arXiv preprint arXiv:2306.15636 (2023), Data-centric Machine Learning Research (DMLR) Workshop at the 40th International Conference on Machine Learning (ICML), https://doi.org/10.48550/arXiv.2306.15636
[21]

I. Shumailov, et al., AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024), doi:10.1038/s41586-024-07566-y
[22]

J. Jordon, et al., Synthetic Data – what, why and how?, Tech. rep., The Royal Society (2022), https://arxiv.org/abs/2205.03257, arXiv:2205.03257
[23]

Materials and methods are available as supplementary material
[24]

N. Jean, et al., Combining satellite imagery and machine learning to predict poverty. Science 353(6301), 790–794 (2016), doi:10.1126/science.aaf7894
[25]

C. Yeh, et al., Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nature Communications 11(1), 2583 (2020)
[26]

G. Chi, H. Fang, S. Chatterjee, J. E. Blumenstock, Microestimates of wealth for all low- and middle-income countries. Proceedings of the National Academy of Sciences 119(3), e2113658119 (2022)
[27]

V. Voukelatou, et al., Predicting risk of inadequate micronutrient intake with transferable machine learning models. Scientific Reports 16, 4104 (2026), doi:10.1038/s41598-025-26179-7
[28]

R. Benassai-Dalmau, et al., Unequal journeys to food markets: Continental-scale evidence from open data in Africa. arXiv preprint arXiv:2505.07913 (2025)
[29]

S. Huang, L. Tang, J. P. Hupy, Y. Wang, G. Shao, A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. Journal of Forestry Research 32(1), 1–6 (2021)
[30]

C. Winkler, D. E. Worrall, E. Hoogeboom, M. Welling, Learning Likelihoods with Conditional Normalizing Flows. arXiv preprint arXiv:1912.00042 (2019), doi:10.48550/arXiv.1912.00042, https://arxiv.org/abs/1912.00042
[31]

M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, S. Gelly, Assessing Generative Models via Precision and Recall. arXiv preprint arXiv:1806.00035 (2018), NeurIPS 2018, https://doi.org/10.48550/arXiv.1806.00035
[32]

H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90(2), 227–244 (2000), doi:10.1016/S0378-3758(00)00115-4
[33]

M. Agarwal, et al., General Geospatial Inference with a Population Dynamics Foundation Model. arXiv preprint arXiv:2411.07207 (2024), https://arxiv.org/abs/2411.07207
[34]

World Food Programme, WFP Document WFP-0000168485, Tech. rep., World Food Programme (WFP) (n.d.), https://docs.wfp.org/api/documents/WFP-0000168485/download/, accessed 2026-02-05
[35]

Central Statistics Agency of Ethiopia, Ethiopia - Socioeconomic Survey 2018-2019 (ESS 2018/19) (2020), https://microdata.worldbank.org/index.php/catalog/3823, World Bank Microdata Library
[36]

K. Tang, et al., Modeling food fortification contributions to micronutrient requirements in Malawi using Household Consumption and Expenditure Surveys. Annals of the New York Academy of Sciences 1508(1), 105–122 (2022)
[37]

National Bureau of Statistics (Nigeria), Nigeria - Living Standards Survey 2018-2019 (NLSS 2018/19) (2021), https://microdata.worldbank.org/index.php/catalog/3827, World Bank Microdata Library
[38]

Department of Census and Statistics (Sri Lanka), Sri Lanka - Household Income and Expenditure Survey 2019 (HIES 2019) (2023), https://catalog.ihsn.org/catalog/11323, IHSN Survey Catalog (World Bank Microdata ecosystem)
[39]

World Food Programme, WFP Vulnerability Analysis and Mapping (VAM) – Sri Lanka (data portal) (2021), https://dataviz.vam.wfp.org/asia-and-the-pacific/sri-lanka/overview
[40]

World Food Programme, The World Food Programme’s Real-Time Monitoring Systems: Approaches and Methodologies, https://executiveboard.wfp.org/document_download/WFP-135070 (2021), accessed 2026-04-01
[41]

World Food Programme, Annual country reports – Mozambique – 2023, MZ02 (2023), https://www.wfp.org/publications/annual-country-reports-mozambique, access the 2023 Mozambique Annual Country Report (operation MZ02) via the “View” entry on this page
[42]

Zimbabwe National Statistics Agency (ZIMSTAT), United Nations Children’s Fund (UNICEF), Zimbabwe 2019 Multiple Indicator Cluster Survey: Survey Findings Report, Survey findings report, Zimbabwe National Statistics Agency and UNICEF, Harare, Zimbabwe (2020), https://www.unicef.org/zimbabwe/reports/zimbabwe-2019-mics-survey-findings-report
[43]

M. Hushchyn, probaforms: Synthetic data generation for tables (2023), https://pypi.org/project/probaforms/, MIT License. Release date: 2023-07-26
[44]

S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017)
[45]

F. Sibilla, Code and data for: Context-conditioned generative models enable sub-national refinement of sparse humanitarian survey data, https://github.com/federicasibilla/cNF_HS (2025), accessed 2025
[46]

N. Patki, R. Wedge, K. Veeramachaneni, The Synthetic Data Vault, in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (IEEE) (2016), pp. 399–410, doi:10.1109/DSAA.2016.49
[47]

SDV Developers, SDV: Synthetic Data Vault, https://github.com/sdv-dev/SDV (2023), accessed 2025-10-01
[48]

M.-D. Stoian, E. Giunchiglia, T. Lukasiewicz, A Survey on Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, and Beyond. arXiv preprint arXiv:2503.05954 (2025), https://arxiv.org/abs/2503.05954

Acknowledgments: We would like to thank Frances Knight for the support and feedback, Jonas De Meyer for the support on extracting and interpreting the m...

[1] [1]

Elbers, J

C. Elbers, J. O. Lanjouw, P. Lanjouw, Micro-Level Estimation of Poverty and Inequality. Econometrica71(1), 355–364 (2003), doi:10.1111/1468-0262.00399

work page doi:10.1111/1468-0262.00399 2003

[2] [2]

J. Wakefield,et al., Estimating under-five mortality in space and time in a developing world context.Statistical Methods in Medical Research28(9), 2614–2634 (2019), doi:10.1177/ 0962280218767988

2019

[3] [3]

R. E. Fay III, R. A. Herriot, Estimates of income for small places: an application of James-Stein procedures to census data.Journal of the American Statistical Association74(366a), 269–277 (1979)

1979

[4] [4]

G. E. Battese, R. M. Harter, W. A. Fuller, An error-components model for prediction of county crop areas using survey and satellite data.Journal of the American Statistical Association 83(401), 28–36 (1988)

1988

[5] [5]

manuals.wfp.org/docs/food-security-assessments, published 2025-08-12, updated 2025-10-10, accessed 2026-04-01

World Food Programme, Food Security Assessments (2025),https://vamresources. manuals.wfp.org/docs/food-security-assessments, published 2025-08-12, updated 2025-10-10, accessed 2026-04-01

2025

[6] [6]

National Bureau of Statistics (Nigeria), United Nations Children’s Fund (UNICEF),Nigeria Multiple Indicator Cluster Survey and National Immunization Coverage Survey 2021, Survey findings report, United Nations Children’s Fund (UNICEF), New York, USA (2022),https: //l1nq.com/f9hy44k

2021

[7] [7]

Bourou, A

S. Bourou, A. El Saer, T. H. Velivassaki, A. Voulkidis, T. Zahariadis, A Review of Tabular Data Synthesis Using GANs on an IDS Dataset.Information12(9), 375 (2021), doi:10.3390/ info12090375,https://www.mdpi.com/2078-2489/12/9/375

2021

[8] [8]

Wang,et al., A Comprehensive Survey on Data Augmentation.arXiv preprint arXiv:2405.09591(2024), doi:10.48550/arXiv.2405.09591,https://arxiv.org/abs/ 2405.09591

Z. Wang,et al., A Comprehensive Survey on Data Augmentation.arXiv preprint arXiv:2405.09591(2024), doi:10.48550/arXiv.2405.09591,https://arxiv.org/abs/ 2405.09591. 31

work page doi:10.48550/arxiv.2405.09591 2024

[9] [9]

L. Xu, M. Skoularidou, A. Cuesta-Infante, K. Veeramachaneni, Modeling Tabular Data using Conditional GAN.arXiv preprint arXiv:1907.00503(2019), doi:10.48550/arXiv.1907.00503, https://arxiv.org/abs/1907.00503

work page doi:10.48550/arxiv.1907.00503 1907

[10] [10]

A. X. Wang, B. P. Nguyen, TTV AE: Transformer-based generative modeling for tabular data generation.Artificial Intelligence340(C), 104292 (2025), doi:10.1016/j.artint.2025.104292, https://doi.org/10.1016/j.artint.2025.104292

work page doi:10.1016/j.artint.2025.104292 2025

[11] [11]

D. J. Rezende, S. Mohamed, Variational Inference with Normalizing Flows, inProceedings of the 32nd International Conference on Machine Learning (ICML), F. Bach, D. Blei, Eds. (PMLR), vol. 37 ofProceedings of Machine Learning Research(2015), pp. 1530–1538,https: //proceedings.mlr.press/v37/rezende15.html

2015

[12] [12]

L. Dinh, J. Sohl-Dickstein, S. Bengio, Density estimation using Real-NVP, inInternational Conference on Learning Representations (ICLR)(2017)

2017

[13] [13]

Durkan, A

C. Durkan, A. Bekasov, I. Murray, G. Papamakarios, Neural spline flows.Advances in Neural Information Processing Systems32(2019)

2019

[14] [14]

Jiang, S

Y. Jiang, S. Liang, J. Choi, Synthetic Survey Data Generation and Evaluation, inProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’25 (ACM) (2025), pp. 2292–2302, doi:10.1145/3690624.3709421,https://dl.acm.org/doi/ 10.1145/3690624.3709421

work page doi:10.1145/3690624.3709421 2025

[15] [15]

T. Liu, Z. Qian, J. Berrevoets, M. van der Schaar, GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure, inProceedings of the International Confer- ence on Learning Representations (ICLR)(2023),https://openreview.net/forum?id= goggle-iclr2023, poster Presentation

2023

[16] [16]

S. Y. Lim, H. Yun, P. Bansal, D.-K. Kim, E.-J. Kim, A Large Language Model for Feasi- ble and Diverse Population Synthesis.arXiv preprint arXiv:2505.04196(2025), submitted to Transportation Research Part C: Emerging Technologies, doi:10.48550/arXiv.2505.04196, https://doi.org/10.48550/arXiv.2505.04196. 32

work page doi:10.48550/arxiv.2505.04196 2025

[17] [17]

Johnsen, O

M. Johnsen, O. Brandt, S. Garrido, F. Pereira, Population synthesis for urban resident modeling using deep generative models.Neural Computing and Applications34, 4677–4692 (2022), doi: 10.1007/s00521-021-06634-7,https://doi.org/10.1007/s00521-021-06634-7

work page doi:10.1007/s00521-021-06634-7 2022

[18] [18]

Tanton, K

R. Tanton, K. Edwards,Spatial Microsimulation: A Reference Guide for Users, vol. 6 (Springer Science & Business Media) (2012)

2012

[19] [19]

Liu,et al., Synthetic Data Generation for Augmenting Small Samples.arXiv preprint arXiv:2501.18741(2025)

D. Liu,et al., Synthetic Data Generation for Augmenting Small Samples.arXiv preprint arXiv:2501.18741(2025)

work page arXiv 2025

[20] [20]

Manousakas, S

D. Manousakas, S. Ayd ¨ore, On the Usefulness of Synthetic Tabular Data Generation.arXiv preprint arXiv:2306.15636(2023), data-centric Machine Learning Research (DMLR) Work- shop at the 40th International Conference on Machine Learning (ICML),https://doi.org/ 10.48550/arXiv.2306.15636

work page doi:10.48550/arxiv.2306.15636 2023

[21] [21]

URL https: //doi.org/10.1038/s41586-024-07566-y

I. Shumailov,et al., AI models collapse when trained on recursively generated data.Nature 631, 755–759 (2024), doi:10.1038/s41586-024-07566-y

work page doi:10.1038/s41586-024-07566-y 2024

[22] [22]

arXiv preprint arXiv:2205.03257 , year=

J. Jordon,et al.,Synthetic Data – what, why and how?, Tech. rep., The Royal Society (2022), https://arxiv.org/abs/2205.03257, arXiv:2205.03257

work page arXiv 2022

[23] [23]

Materials and methods are available as supplementary material

[24] [24]

Jean,et al., Combining satellite imagery and machine learning to predict poverty.Science 353(6301), 790–794 (2016), doi:10.1126/science.aaf7894

N. Jean,et al., Combining satellite imagery and machine learning to predict poverty.Science 353(6301), 790–794 (2016), doi:10.1126/science.aaf7894

work page doi:10.1126/science.aaf7894 2016

[25] C. Yeh, et al., Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nature Communications 11(1), 2583 (2020)

[26] G. Chi, H. Fang, S. Chatterjee, J. E. Blumenstock, Microestimates of wealth for all low- and middle-income countries. Proceedings of the National Academy of Sciences 119(3), e2113658119 (2022)

[27] V. Voukelatou, et al., Predicting risk of inadequate micronutrient intake with transferable machine learning models. Scientific Reports 16, 4104 (2026), doi:10.1038/s41598-025-26179-7

[28] R. Benassai-Dalmau, et al., Unequal journeys to food markets: Continental-scale evidence from open data in Africa. arXiv preprint arXiv:2505.07913 (2025)

[29] S. Huang, L. Tang, J. P. Hupy, Y. Wang, G. Shao, A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. Journal of Forestry Research 32(1), 1–6 (2021)

[30] C. Winkler, D. E. Worrall, E. Hoogeboom, M. Welling, Learning Likelihoods with Conditional Normalizing Flows. arXiv preprint arXiv:1912.00042 (2019), doi:10.48550/arXiv.1912.00042, https://arxiv.org/abs/1912.00042

[31] M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, S. Gelly, Assessing Generative Models via Precision and Recall. arXiv preprint arXiv:1806.00035 (2018), NeurIPS 2018, https://doi.org/10.48550/arXiv.1806.00035

[32] H. Shimodaira, Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90(2), 227–244 (2000), doi:10.1016/S0378-3758(00)00115-4

[33] M. Agarwal, et al., General Geospatial Inference with a Population Dynamics Foundation Model. arXiv preprint arXiv:2411.07207 (2024), https://arxiv.org/abs/2411.07207

[34] World Food Programme, WFP Document WFP-0000168485, Tech. rep., World Food Programme (WFP) (n.d.), https://docs.wfp.org/api/documents/WFP-0000168485/download/, accessed: 2026-02-05

[35] Central Statistics Agency of Ethiopia, Ethiopia - Socioeconomic Survey 2018-2019 (ESS 2018/19) (2020), https://microdata.worldbank.org/index.php/catalog/3823, World Bank Microdata Library

[36] K. Tang, et al., Modeling food fortification contributions to micronutrient requirements in Malawi using Household Consumption and Expenditure Surveys. Annals of the New York Academy of Sciences 1508(1), 105–122 (2022)

[37] National Bureau of Statistics (Nigeria), Nigeria - Living Standards Survey 2018-2019 (NLSS 2018/19) (2021), https://microdata.worldbank.org/index.php/catalog/3827, World Bank Microdata Library

[38] Department of Census and Statistics (Sri Lanka), Sri Lanka - Household Income and Expenditure Survey 2019 (HIES 2019) (2023), https://catalog.ihsn.org/catalog/11323, IHSN Survey Catalog (World Bank Microdata ecosystem)

[39] World Food Programme, WFP Vulnerability Analysis and Mapping (VAM) – Sri Lanka (data portal) (2021), https://dataviz.vam.wfp.org/asia-and-the-pacific/sri-lanka/overview

[40] World Food Programme, The World Food Programme's Real-Time Monitoring Systems: Approaches and Methodologies, https://executiveboard.wfp.org/document_download/WFP-135070 (2021), accessed 2026-04-01

[41] World Food Programme, Annual country reports – Mozambique – 2023, MZ02 (2023), https://www.wfp.org/publications/annual-country-reports-mozambique. Access the 2023 Mozambique Annual Country Report (operation MZ02) via the "View" entry on this page.

[42] Zimbabwe National Statistics Agency (ZIMSTAT), United Nations Children's Fund (UNICEF), Zimbabwe 2019 Multiple Indicator Cluster Survey: Survey Findings Report, Survey findings report, Zimbabwe National Statistics Agency and UNICEF, Harare, Zimbabwe (2020), https://www.unicef.org/zimbabwe/reports/zimbabwe-2019-mics-survey-findings-report

[43] M. Hushchyn, probaforms: Synthetic data generation for tables (2023), https://pypi.org/project/probaforms/, MIT License. Release date: 2023-07-26

[44] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017)

[45] F. Sibilla, Code and data for: Context-conditioned generative models enable sub-national refinement of sparse humanitarian survey data, https://github.com/federicasibilla/cNF_HS (2025), accessed: 2025

[46] N. Patki, R. Wedge, K. Veeramachaneni, The Synthetic Data Vault, in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (IEEE) (2016), pp. 399–410, doi:10.1109/DSAA.2016.49

[47] SDV Developers, SDV: Synthetic Data Vault, https://github.com/sdv-dev/SDV (2023), accessed: 2025-10-01

[48] M.-D. Stoian, E. Giunchiglia, T. Lukasiewicz, A Survey on Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, and Beyond. arXiv preprint arXiv:2503.05954 (2025), https://arxiv.org/abs/2503.05954

Acknowledgments

We would like to thank Frances Knight for the support and feedback, Jonas De Meyer for the support on extracting and interpreting the m...