Sampling distributions for complex design variance estimators in a Fay-Herriot model

Alana McGovern; Geir-Arne Fuglstad; Jon Wakefield

arxiv: 2604.23029 · v1 · submitted 2026-04-24 · 📊 stat.ME



Pith reviewed 2026-05-08 10:41 UTC · model grok-4.3

classification 📊 stat.ME

keywords Fay-Herriot model · variance smoothing · sampling distribution · complex survey design · DHS · small area estimation · credible intervals


The pith

For complex survey designs, Fay-Herriot models with variance smoothing based on sampling distributions derived for the design, rather than the usual chi-squared, yield better credible intervals than the standard Fay-Herriot model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fay-Herriot models typically assume chi-squared sampling distributions for design variance estimators, but this is valid only under strong assumptions that do not hold for complex designs like the stratified two-stage clustering in DHS surveys. The paper derives two alternative sampling distributions for these estimators, specifying the necessary superpopulation and design assumptions. Simulations reveal that the standard model undercovers while variance smoothing with the new distributions produces superior credible intervals according to proper scoring rules. The simpler distribution performs as well as the more complex one, and the approach is demonstrated on Kenya DHS data for height-for-age z-scores.
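The chi-squared assumption can be checked numerically in the one setting where it is exact. The sketch below is a toy that assumes an iid normal sample of size n drawn by simple random sampling, none of which holds for DHS: the variance estimator of the sample mean, V̂ = s²/n, should then follow (V/df) × χ²_df with V = σ²/n and df = n - 1, so its mean is V and its variance is 2V²/df. All function and parameter names are hypothetical, not taken from the paper.

```python
import random

def sample_var(xs):
    """Unbiased sample variance (n - 1 divisor)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_v_hat(n, sigma2, reps, seed=1):
    """Replicates of V_hat = s^2 / n for iid N(0, sigma2) samples of size n (toy SRS setting)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    return [sample_var([rng.gauss(0.0, sd) for _ in range(n)]) / n for _ in range(reps)]

n, sigma2 = 10, 4.0
V = sigma2 / n          # true variance of the sample mean
df = n - 1              # chi-squared degrees of freedom under these assumptions
draws = simulate_v_hat(n, sigma2, reps=20000)

# Moments implied by V_hat ~ (V / df) * chi2_df: mean V, variance 2 V^2 / df
theory_mean, theory_var = V, 2 * V**2 / df
emp_mean, emp_var = sum(draws) / len(draws), sample_var(draws)
print(f"mean of V_hat:     empirical {emp_mean:.4f}, theory {theory_mean:.4f}")
print(f"variance of V_hat: empirical {emp_var:.5f}, theory {theory_var:.5f}")
```

Under clustering, unequal weights, or non-normal outcomes these moments no longer line up, which is the gap the paper addresses.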

Core claim

We derive two sampling distributions under the DHS design while specifying the required superpopulation and design assumptions. The variance smoothing models produce better credible intervals according to proper scoring rules than the standard Fay-Herriot model, which exhibits undercoverage. The simple sampling distribution performs as well as the more complex one.

What carries the argument

Two derived sampling distributions (simple and complex) for design variance estimators in stratified two-stage cluster sampling, incorporated into variance smoothing for Fay-Herriot models.
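As a point of reference, a generic Fay-Herriot model with variance smoothing can be sketched as follows. This is a textbook-style form, not the paper's exact specification: the intercept η, area covariates X_i, coefficients β, and the independent Gaussian area effect b_i with variance σ_b² are illustrative choices. The last line is the conventional scaled chi-squared choice for the design variance estimator; the standard Fay-Herriot model instead fixes V_i at V̂_i and drops that line.

```latex
\begin{aligned}
\hat\theta_i \mid \theta_i, V_i &\sim \mathcal{N}(\theta_i,\, V_i), \\
\theta_i &= \eta + \boldsymbol{X}_i^{\top}\boldsymbol{\beta} + b_i, \qquad b_i \sim \mathcal{N}(0,\, \sigma_b^2), \\
\hat V_i \mid V_i &\sim \frac{V_i}{\mathrm{df}_i}\,\chi^2_{\mathrm{df}_i}.
\end{aligned}
```

Under the last line, E(V̂_i | V_i) = V_i and Var(V̂_i | V_i) = 2V_i²/df_i. In variance smoothing, V_i is also given a model so that information is shared across areas; the paper's two derived sampling distributions take the place of this chi-squared line for the DHS design.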

If this is right

Standard Fay-Herriot models show undercoverage of credible intervals under complex designs.
Variance smoothing with the derived distributions improves credible interval performance by proper scoring rules.
The simple derived distribution matches the complex one in effectiveness while being easier to implement.
The method applies to estimating domain-level health indicators such as height-for-age z-scores using DHS data.
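Why treating an estimated variance as known produces undercoverage can be seen in a single-area toy, which is an assumption-laden analogue and not the paper's Fay-Herriot model. Assume θ̂ ~ N(θ, V) and, independently, V̂ ~ (V/4) × χ²_4. A nominal 90% interval θ̂ ± 1.645·√V̂ then has theoretical coverage of only about 0.82, while the t interval, which accounts for V̂'s chi-squared sampling distribution, has exact 90% coverage. The degrees of freedom and names below are hypothetical.

```python
import math
import random

DF = 4                            # toy degrees of freedom for V_hat
Z_95 = 1.6448536269514722         # standard normal 0.95 quantile
T_95_DF4 = 2.131846786326649      # Student-t 0.95 quantile with 4 df

def plug_in_vs_t_coverage(reps, V=1.0, seed=2):
    """Coverage of nominal 90% intervals theta_hat +/- q * sqrt(V_hat).

    Toy single-area setting (an assumption, not the paper's model):
    theta_hat ~ N(theta, V) with theta = 0, and independently
    V_hat ~ (V / DF) * chi2_DF.
    """
    rng = random.Random(seed)
    hit_z = hit_t = 0
    for _ in range(reps):
        theta_hat = rng.gauss(0.0, math.sqrt(V))
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(DF))
        se_hat = math.sqrt(V / DF * chi2)
        hit_z += abs(theta_hat) <= Z_95 * se_hat       # treats V_hat as the known V
        hit_t += abs(theta_hat) <= T_95_DF4 * se_hat   # accounts for V_hat's chi-squared law
    return hit_z / reps, hit_t / reps

cov_plug_in, cov_t = plug_in_vs_t_coverage(reps=20000)
print(f"plug-in normal interval coverage: {cov_plug_in:.3f}")
print(f"t interval coverage:              {cov_t:.3f}")
```

In a Bayesian variance-smoothing model the corresponding correction comes from integrating over the posterior of V_i rather than from a t quantile.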

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Other complex survey designs may require analogous derivations to avoid relying on invalid chi-squared assumptions.
Adopting variance smoothing could become standard practice in small area estimation for national health surveys with small domains.
The equivalence in performance suggests prioritizing implementability when choosing between the two distributions.
This work underscores the importance of matching statistical assumptions to the actual sampling design in survey data analysis.

Load-bearing premise

The sampling distributions are derived under strong assumptions about the superpopulation and the specific stratified two-stage clustering design used by DHS surveys.

What would settle it

Simulations under the exact DHS sampling design assumptions in which the empirical distribution of the design variance estimator fails to match the derived simple or complex sampling distribution.
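A check of this kind can be sketched in a much simpler design than DHS. The example below assumes a hypothetical balanced two-stage design with one stratum, m clusters of k units each, equal weights, and normal cluster and unit effects. It simulates the between-cluster variance estimator of the area mean and compares its relative dispersion, Var(V̂)/V², with what a scaled chi-squared would imply with df = m - 1 (clusters minus one) and with df = n - 1 (sample size minus one). It does not reproduce the paper's derived distributions, which handle stratification and unequal DHS weights.

```python
import random

def sample_var(xs):
    """Unbiased sample variance (n - 1 divisor)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate_cluster_v_hat(m, k, tau2, sigma2, reps, seed=3):
    """Replicates of the between-cluster variance estimator in a toy design.

    Hypothetical setup: one stratum, m sampled clusters of k units each,
    equal weights, y = u_c + e with u_c ~ N(0, tau2) and e ~ N(0, sigma2).
    The area mean is the average of cluster means; V_hat = s^2(cluster means) / m.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        cluster_means = []
        for _ in range(m):
            u = rng.gauss(0.0, tau2 ** 0.5)
            ys = [u + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(k)]
            cluster_means.append(sum(ys) / k)
        draws.append(sample_var(cluster_means) / m)
    return draws

m, k, tau2, sigma2 = 8, 10, 1.0, 4.0
V = (tau2 + sigma2 / k) / m                    # true variance of the area mean
draws = simulate_cluster_v_hat(m, k, tau2, sigma2, reps=8000)

emp_mean = sum(draws) / len(draws)
emp_relvar = sample_var(draws) / V**2          # empirical Var(V_hat) / V^2
relvar_clusters = 2 / (m - 1)                  # implied by df = clusters - 1
relvar_units = 2 / (m * k - 1)                 # implied by df = sample size - 1
print(f"Var(V_hat)/V^2: empirical {emp_relvar:.3f}, "
      f"df=m-1 gives {relvar_clusters:.3f}, df=n-1 gives {relvar_units:.3f}")
```

In this balanced toy the empirical dispersion tracks the cluster-based degrees of freedom, and the sample-size-based chi-squared badly understates how noisy V̂ is.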

Figures

Figures reproduced from arXiv: 2604.23029 by Alana McGovern, Geir-Arne Fuglstad, Jon Wakefield.

**Figure 1.** Number of sampled clusters in each Admin-1 and Admin-2 area in the 2022 Kenya DHS. Admin-2 areas with fewer than 5 sampled clusters are marked by hatching, and areas with no sampled clusters are filled in gray.

**Figure 2.** Design-based mean and standard deviation estimates for HAZ at the Admin-1 and Admin-2 levels using the 2022 Kenya DHS.

**Figure 3.** Average theoretical-to-truth design variance ratio (19) for each model and area, across 100 simulations for each of 5 settings. The gray lines indicate the average ratio across areas, and the red line indicates the point where the theoretical variance equals the true variance. Values below the red line indicate underestimation.

**Figure 4.** Estimate-to-truth ratios for within-stratum superpopulation variance (20) and design variance (21) for each area 𝑖 and model across 100 simulations, for each of 5 settings. The gray lines indicate the average ratio across areas, and the red line indicates the point where the estimate equals the truth. Values above the red line indicate overestimation.

**Figure 5.** RMSE of area-level point estimates and coverage, average width, and average interval score of 90% credible intervals, for each area and model, across 100 simulations, for each of 5 settings. The gray lines indicate the mean across areas.

**Figure 6.** Mean estimates and width of 90% credible intervals for HAZ using the 2022 Kenya DHS, under a standard Fay-Herriot model and two types of variance smoothing Fay-Herriot models.

**Figure 7.** Mean estimates with 90% credible intervals for HAZ using the 2022 Kenya DHS, under a standard Fay-Herriot model and two types of variance smoothing Fay-Herriot models. Admin-2 areas that have extremely narrow or wide credible intervals under the standard model are indicated in purple and orange, respectively.

**Figure 8.** Posterior probabilities of membership in the lowest decile and quartile for HAZ using the 2022 Kenya DHS, under a standard Fay-Herriot model and two types of variance smoothing Fay-Herriot models.

**Figure 9.** Quantile-quantile plots comparing the survey-weighted sampling distribution (x-axis) to its Satterthwaite approximation (y-axis) when γ = 1 and σ² = 1, using the sampling weights and sample sizes from 8 Admin-2 areas in the 2022 Kenya DHS.

**Figure 10.** Quantile-quantile plots comparing the survey-weighted sampling distribution (x-axis) to its Satterthwaite approximation (y-axis) when γ = 2 and σ² = 4, using the sampling weights and sample sizes from 8 Admin-2 areas in the 2022 Kenya DHS.

**Figure 11.** RMSE of area-level point estimates and coverage, average width, and average interval score of 90% credible intervals, for each area and model, across 100 simulations. The gray lines indicate the mean across areas, and the red line in panel B indicates the nominal rate of the credible intervals.

**Figure 12.** Comparison of simple and SASW sampling distributions to the empirical distribution of the design variance estimator. Averages are computed for each area across 100 simulations for each of 5 settings. The red line in panel B indicates the point where the estimate equals the truth, and values below the red line indicate underestimation.

**Figure 13.** Distribution of urban and rural cluster sample sizes in the 2022 Kenya DHS.

**Figure 14.** Maps of standardized Admin-2 area-level auxiliary variables in Kenya.

**Figure 15.** Scatter plots comparing mean and standard deviation estimates of HAZ under each model against the design-based estimates.
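The Satterthwaite approximation compared in the quantile-quantile figures for 8 Admin-2 areas rests on moment matching. In its generic form, a weighted sum Q = Σ a_j W_j of independent χ²₁ variables is replaced by a single c × χ²_ν with the same mean and variance. The sketch below implements only that generic idea; the weights `a` are hypothetical stand-ins, and the paper's survey-weighted distribution, which depends on DHS sampling weights, sample sizes, γ and σ², is not modeled here.

```python
import random

def satterthwaite(a):
    """Match Q = sum_j a_j * W_j, W_j iid chi2_1, to c * chi2_nu by its first two moments.

    E[Q] = sum(a) and Var[Q] = 2 * sum(a^2); solving c * nu = E[Q] and
    2 * c^2 * nu = Var[Q] gives the values returned here.
    """
    s1 = sum(a)
    s2 = sum(x * x for x in a)
    return s2 / s1, s1 * s1 / s2      # (scale c, effective degrees of freedom nu)

def qq_pairs(a, reps, probs, seed=4):
    """Simulated quantiles of Q (exact) paired with those of its approximation."""
    rng = random.Random(seed)
    c, nu = satterthwaite(a)
    exact = sorted(sum(w * rng.gauss(0.0, 1.0) ** 2 for w in a) for _ in range(reps))
    # c * chi2_nu is a Gamma(shape=nu/2, scale=2c) variable; nu need not be an integer
    approx = sorted(rng.gammavariate(nu / 2, 2 * c) for _ in range(reps))
    idx = [min(reps - 1, int(p * reps)) for p in probs]
    return [(exact[i], approx[i]) for i in idx]

a = [0.5, 1.0, 1.0, 2.0, 3.5]                  # hypothetical weights
c, nu = satterthwaite(a)
for p, (q_exact, q_approx) in zip([0.1, 0.5, 0.9], qq_pairs(a, 20000, [0.1, 0.5, 0.9])):
    print(f"p={p:.1f}: exact {q_exact:.3f}, Satterthwaite {q_approx:.3f}")
```

With equal weights the approximation is exact (ν equals the number of terms); unequal weights lower the effective degrees of freedom.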

Original abstract

Fay-Herriot (FH) models with variance smoothing typically use chi-squared sampling distributions for the design variance estimators. This choice is only valid under strong assumptions on the population and the sampling design, and the choice of sampling distribution is understudied for complex survey designs such as the stratified two-stage clustering design used by the Demographic and Health Surveys (DHS). DHS conducts surveys in low- and middle-income countries, and these surveys result in low sample sizes for unplanned domains of interest. Thus, accounting for the uncertainty in the estimated design variances is important. We derive two sampling distributions under the DHS design, a simple and a more complex, while clearly specifying and discussing the required superpopulation and design assumptions. In a simulation study, we compare the two sampling distributions to the empirical sampling distributions, and the resulting FH models with variance smoothing to the standard FH model. We find that the standard model exhibits undercoverage, while the variance smoothing models produce better credible intervals according to proper scoring rules. Interestingly, the simple sampling distribution, which is easiest to implement, performs as well as the more complex sampling distribution. We illustrate the proposed models by estimating height-for-age z-scores using the 2022 Kenya DHS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives two sampling distributions for variance estimators under the DHS stratified two-stage design and shows that variance smoothing with either improves FH credible intervals over the standard FH model, with the simpler distribution performing as well as the more complex one.


The main point is that the usual chi-squared sampling distribution for design variance estimators in Fay-Herriot models does not hold for complex surveys like DHS, so the authors derive two alternatives. They state the superpopulation and design assumptions explicitly, derive a simple and a more complex distribution for the stratified two-stage clustering design, and check both in simulation against empirical sampling distributions generated under the same design. The variance-smoothed models built on either distribution score better on proper scoring rules than the standard Fay-Herriot model, which undercovers, and the simple distribution performs as well as the complex one, so the extra work brings no gain. They close with an application to height-for-age z-scores from the 2022 Kenya DHS.

This is useful work because it gives a concrete, implementable fix for a known problem in small-area estimation with low domain sample sizes. The simulation tests the modeling choice directly rather than leaving it to theory, and the finding that the easy version suffices is the kind of result practitioners can actually use. The soft spot is the dependence on strong assumptions about the population and the exact sampling design. The paper notes that the chi-squared choice is valid only under strong assumptions, but the new distributions also rest on stated assumptions, so they are an improvement only to the extent that those assumptions match reality. Real DHS data can deviate in ways the simulation may not fully capture, though the simulation setup is at least consistent with the design.

This paper is for survey statisticians and applied researchers doing small-area work with complex health surveys. A reader who needs to produce intervals for unplanned domains in DHS-style data will find the derivations and the practical recommendation worth the time. It deserves a serious referee because the derivations are new for this design, the simulation provides an external check, and the claims are narrow enough to be evaluated on their own terms.
I would send it to peer review.

Referee Report

1 major / 3 minor

Summary. The paper derives two sampling distributions (simple and complex) for design-based variance estimators under the stratified two-stage cluster sampling of DHS surveys, with explicit superpopulation and design assumptions stated. A simulation study compares these distributions to empirical sampling distributions generated under the same design and evaluates the resulting variance-smoothed Fay-Herriot models against the standard Fay-Herriot model using proper scoring rules for credible intervals. The standard model shows undercoverage while the proposed models improve interval performance, with the simple distribution performing as well as the complex one. The approach is illustrated by estimating height-for-age z-scores from the 2022 Kenya DHS.

Significance. If the derivations hold under the stated assumptions, the work addresses a genuine gap in small-area estimation for complex surveys by providing alternatives to the chi-squared sampling distribution, which the abstract notes is valid only under strong assumptions. The explicit discussion of assumptions, the direct simulation check against empirical distributions, and the finding that the simpler distribution suffices are strengths that enhance practical utility. The use of proper scoring rules for evaluating credible intervals adds rigor, and the DHS application demonstrates relevance for low-sample domains in low- and middle-income country surveys.

major comments (1)

[Simulation study] The central claim that the simple sampling distribution performs as well as the more complex one (and that both outperform the standard model) rests on proper scoring rule comparisons. The manuscript should report the actual numerical scores, coverage probabilities, and any differences in a table, so that readers can assess whether equivalence holds across metrics or whether small but systematic differences exist.
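The per-area numbers requested here are simple to tabulate. The sketch below computes coverage, mean width, and the mean interval score (Gneiting and Raftery, 2007), a proper scoring rule for central intervals, for 90% intervals. The function name and the three-area example are hypothetical; how the paper computes its reported interval scores is not shown on this page.

```python
def interval_metrics(lower, upper, truth, alpha=0.10):
    """Coverage, mean width and mean interval score of central (1 - alpha) intervals.

    Interval score for one interval [l, u] and truth y (lower is better):
    (u - l) + (2/alpha)(l - y) if y < l, or + (2/alpha)(y - u) if y > u.
    """
    n = len(truth)
    covered = width = score = 0.0
    for l, u, y in zip(lower, upper, truth):
        covered += l <= y <= u
        width += u - l
        penalty = 0.0
        if y < l:
            penalty = (2 / alpha) * (l - y)
        elif y > u:
            penalty = (2 / alpha) * (y - u)
        score += (u - l) + penalty
    return covered / n, width / n, score / n

# Three hypothetical areas: one covered, one above and one below its 90% interval
cov, wid, isc = interval_metrics([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 1.5, -0.5])
print(cov, wid, isc)
```

Running this per model and per simulation setting, then averaging across areas, would give the rows of the requested table.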

minor comments (3)

The abstract refers to 'proper scoring rules' without naming them (e.g., CRPS or logarithmic score); specify the exact rules and their implementation in the methods or simulation section for reproducibility.
[Assumptions discussion] A summary table or dedicated paragraph listing the superpopulation and design assumptions required for each of the two derived distributions would improve clarity and help readers evaluate applicability to other surveys.
Consider adding a short comparison to existing literature on variance smoothing or sampling distributions in Fay-Herriot models to better position the novelty of the two new distributions.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive and constructive review, which recommends minor revision. We address the single major comment below and will incorporate the suggested changes into the revised manuscript.


Referee: [Simulation study] The central claim that the simple sampling distribution performs as well as the more complex one (and that both outperform the standard model) rests on proper scoring rule comparisons. The manuscript should report the actual numerical scores, coverage probabilities, and any differences in a table, so that readers can assess whether equivalence holds across metrics or whether small but systematic differences exist.

Authors: We agree that including the numerical results would strengthen the presentation and allow readers to directly assess the claimed equivalence. In the revised manuscript we will add a table (or expand an existing one) in the simulation study section that reports the actual proper scoring rule values (e.g., the average interval score), empirical coverage probabilities, and the observed differences between the standard Fay-Herriot model and the variance smoothing models using the simple and the complex sampling distributions. This will make the performance comparisons fully transparent. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper explicitly derives two sampling distributions (simple and complex) for design-based variance estimators under the DHS stratified two-stage clustering design, stating the required superpopulation and design assumptions in the abstract and full text. These derivations are presented as new contributions, with the standard chi-squared choice identified as valid only under strong assumptions that the authors relax. The simulation directly generates empirical sampling distributions under the same design to compare against the derived forms, and evaluates the resulting variance-smoothed Fay-Herriot models via proper scoring rules against the standard model. No step reduces a prediction or central claim to a fitted input, self-citation, or definitional equivalence; the chain is externally validated by simulation and remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about the superpopulation model and the exact stratified two-stage clustering design; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: strong assumptions on the population and sampling design are required for the derived sampling distributions to be valid.
Explicitly stated in the abstract as necessary for the chi-squared choice and the new derivations.

pith-pipeline@v0.9.0 · 5518 in / 1233 out tokens · 54154 ms · 2026-05-08T10:41:03.823282+00:00 · methodology

