Covariate Balancing Based on Kernel Density Estimates for Controlled Experiments

Lulu Kang; Xiao Huang; Yiou Li

arxiv: 2008.05578 · v2 · submitted 2020-08-12 · 📊 stat.ME

Yiou Li, Lulu Kang, Xiao Huang

Pith reviewed 2026-05-24 14:07 UTC · model grok-4.3

classification 📊 stat.ME

keywords covariate balancing · kernel density estimation · controlled experiments · randomization design · difference-in-mean estimator · experimental design · treatment effect estimation · partition approach

The pith

Partitioning units by minimizing kernel density differences in covariates before randomization improves accuracy of the difference-in-mean estimator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes partitioning experimental units to minimize differences between kernel density estimates of their covariates across treatment groups, then randomly assigning treatments within those partitions. This addresses covariate imbalance that arises in a single draw from complete randomization, which is especially problematic for small or moderate sample sizes. The authors show through numerical examples that the resulting design yields more accurate estimates of treatment effects. A sympathetic reader cares because better pre-assignment balance reduces confounding and improves reliability of causal conclusions without needing post-experiment adjustments.

Core claim

The authors introduce a new covariate balancing criterion that measures differences between kernel density estimates of covariates across treatment groups. Experimental units are partitioned by minimizing this criterion before treatments are randomly assigned within partitions. Numerical examples demonstrate that this partition approach improves the accuracy of the difference-in-mean estimator and outperforms both complete randomization and rerandomization.

What carries the argument

The kernel density estimate difference criterion for partitioning units to achieve covariate balance before treatment randomization.
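As a rough illustration of what such a criterion could look like, the sketch below computes the integrated squared difference between two Gaussian kernel density estimates. The Gaussian kernel, the shared scalar bandwidth `h`, the squared-L2 form of the discrepancy, and the function names are all assumptions made for this example; the paper's exact criterion is not given in the text above.

```python
import numpy as np

def gaussian_cross_term(a, b, h):
    """Integral of the product of two Gaussian KDEs with shared bandwidth h.

    For Gaussian kernels this integral has a closed form: the average over
    all pairs (a_i, b_j) of the N(0, 2 h^2 I) density evaluated at a_i - b_j.
    Inputs are arrays of shape (n_units, n_covariates).
    """
    d = a.shape[1]
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    norm = (4.0 * np.pi * h ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq_dist / (4.0 * h ** 2)).mean()

def kde_discrepancy(x1, x2, h):
    """Integrated squared difference between the KDEs of two groups.

    Expands the integral of (f1 - f2)^2 into three closed-form terms.
    """
    return (gaussian_cross_term(x1, x1, h)
            + gaussian_cross_term(x2, x2, h)
            - 2.0 * gaussian_cross_term(x1, x2, h))
```

A value near zero means the two smoothed covariate distributions nearly coincide; a partition search would look for the split of units that makes this number as small as possible.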

Load-bearing premise

Minimizing differences between kernel density estimates of covariates across partitions will produce better finite-sample properties for the difference-in-mean estimator without introducing new biases or power loss.

What would settle it

A Monte Carlo simulation study in which the proposed partition method produces higher mean squared error or bias for the treatment effect estimator than rerandomization under the same covariate distributions.
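A minimal sketch of such a study is given below, comparing complete randomization with a KDE-guided partition. Everything specific here is an assumption made for illustration: the data-generating process, the bandwidth `h`, the constant treatment effect `tau`, and the best-of-`n_candidates` random search, which stands in for whatever optimizer the paper actually uses. Rerandomization could be added as a third arm in the same loop.

```python
import numpy as np

def kde_discrepancy(x1, x2, h):
    """Integrated squared difference of two Gaussian KDEs (shared bandwidth h)."""
    def cross(a, b):
        d = a.shape[1]
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return (4.0 * np.pi * h ** 2) ** (-d / 2.0) * np.exp(-sq / (4.0 * h ** 2)).mean()
    return cross(x1, x1) + cross(x2, x2) - 2.0 * cross(x1, x2)

def random_split(n, rng):
    """Boolean mask selecting a random half of the units."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[: n // 2]] = True
    return mask

def kde_partition(x, h, n_candidates, rng):
    """Best of n_candidates random balanced splits under the KDE criterion,
    followed by a fair coin flip deciding which part is treated."""
    best = min((random_split(len(x), rng) for _ in range(n_candidates)),
               key=lambda m: kde_discrepancy(x[m], x[~m], h))
    return best if rng.random() < 0.5 else ~best

def simulate_mse(n=20, tau=1.0, h=0.5, n_candidates=50, reps=200, seed=0):
    """Estimated MSE of the difference-in-mean estimator under both designs."""
    rng = np.random.default_rng(seed)
    err_cr, err_kde = [], []
    for _ in range(reps):
        # Hypothetical data-generating process: one covariate, nonlinear outcome.
        x = rng.normal(size=(n, 1))
        y0 = x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=n)
        y1 = y0 + tau
        designs = ((err_cr, random_split(n, rng)),
                   (err_kde, kde_partition(x, h, n_candidates, rng)))
        for errs, treated in designs:
            estimate = y1[treated].mean() - y0[~treated].mean()
            errs.append(estimate - tau)
    return float(np.mean(np.square(err_cr))), float(np.mean(np.square(err_kde)))
```

Calling `simulate_mse()` returns the two estimated mean squared errors, complete randomization first. The values depend on the seed and on every assumed setting above, so they say nothing about the paper's own results.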

Figures

Figures reproduced from arXiv: 2008.05578 by Lulu Kang, Xiao Huang, Yiou Li.

**Figure 2.** Comparison of the estimated mean squared error of differ...

Original abstract

Controlled experiments are widely used in many applications to investigate the causal relationship between input factors and experimental outcomes. A completely randomized design is usually used to randomly assign treatment levels to experimental units. When covariates of the experimental units are available, the experimental design should achieve covariate balancing among the treatment groups, such that the statistical inference of the treatment effects is not confounded with any possible effects of covariates. However, covariate imbalance often exists, because the experiment is carried out based on a single realization of the complete randomization. It is more likely to occur and worsen when the size of the experimental units is small or moderate. In this paper, we introduce a new covariate balancing criterion, which measures the differences between kernel density estimates of the covariates of treatment groups. To achieve covariate balance before the treatments are randomly assigned, we partition the experimental units by minimizing the criterion, then randomly assign the treatment levels to the partitioned groups. Through numerical examples, we show that the proposed partition approach can improve the accuracy of the difference-in-mean estimator and outperforms the complete randomization and rerandomization approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper proposes partitioning units by minimizing a KDE discrepancy on covariates before within-group randomization, and claims via examples that this beats complete randomization and rerandomization on difference-in-means accuracy.

The core move is to partition experimental units ahead of time so that the kernel density estimates of the covariates look similar across partitions, then randomize treatment inside those partitions. This keeps the difference-in-means estimator unbiased while targeting fuller distributional balance rather than just first moments or Mahalanobis distance. That specific KDE criterion for the partitioning step is the main novelty relative to the rerandomization literature the abstract cites.

The design itself is coherent: it does not introduce new bias and the numerical claim is presented as an empirical improvement rather than a theorem. The stress-test note is right that no obvious internal contradiction appears in the argument as stated.

The examples are the weak point. The abstract gives no information on data-generating processes, bandwidth choice, number of partitions, or how many replications were run, so it is impossible to judge whether the reported gains are robust or setup-dependent. That leaves the outperformance claim hard to evaluate from the given text.

This is a narrow but practical idea aimed at people running moderate-sized controlled experiments who already have covariate data and want a lightweight way to reduce imbalance. It is not foundational, but the method is simple enough that a referee could check the code and examples directly. I would send it to peer review so the details can be filled in and tested.
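For contrast with the KDE criterion, the rerandomization baseline mentioned in the letter can be sketched as follows. This follows the usual textbook form of rerandomization with a Mahalanobis distance on covariate means; the acceptance `threshold`, the equal group sizes, and the function names are assumptions for this example rather than settings taken from the paper.

```python
import numpy as np

def mahalanobis_imbalance(x, treated):
    """Mahalanobis distance between treated and control covariate means.

    x has shape (n_units, n_covariates); treated is a boolean mask.
    """
    n_t, n_c = treated.sum(), (~treated).sum()
    diff = x[treated].mean(axis=0) - x[~treated].mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    return (n_t * n_c / (n_t + n_c)) * diff @ np.linalg.solve(cov, diff)

def rerandomize(x, threshold, rng, max_draws=10_000):
    """Redraw balanced assignments until the imbalance falls below threshold."""
    n = len(x)
    for _ in range(max_draws):
        treated = np.zeros(n, dtype=bool)
        treated[rng.permutation(n)[: n // 2]] = True
        if mahalanobis_imbalance(x, treated) <= threshold:
            return treated
    raise RuntimeError("no acceptable assignment found; raise the threshold")
```

This acceptance rule only looks at differences in covariate means, which is the limitation a density-based criterion is meant to address.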

Referee Report

1 major / 0 minor

Summary. The manuscript proposes a covariate balancing criterion based on differences between kernel density estimates (KDEs) of covariates across treatment groups. Experimental units are partitioned by minimizing this criterion, after which treatments are randomly assigned within the resulting partitions. The central claim, supported by numerical examples, is that this approach improves the accuracy of the difference-in-mean estimator relative to complete randomization and rerandomization while preserving unbiasedness under the randomization distribution.

Significance. If the empirical results hold under more detailed scrutiny, the method provides a direct way to target finite-sample distributional balance in randomized experiments via a KDE discrepancy, which could improve precision of treatment effect estimates in small-to-moderate samples without altering the randomization-based inference framework. The approach is conceptually straightforward and avoids introducing bias by construction.
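The unbiasedness remark can be checked with a small worked example. The sketch below assumes two equal-sized parts and a fair coin deciding which part is treated; that reading of "randomly assign the treatment levels to the partitioned groups" is an assumption, and the function name is hypothetical.

```python
import numpy as np

def label_flip_expectation(y0, y1, part_a):
    """Exact expectation of the difference-in-mean estimator over a fair
    coin flip that decides whether part A or its complement is treated.

    y0, y1 are the potential outcomes under control and treatment;
    part_a is a boolean mask selecting half of the units.
    """
    est_a_treated = y1[part_a].mean() - y0[~part_a].mean()
    est_b_treated = y1[~part_a].mean() - y0[part_a].mean()
    return 0.5 * (est_a_treated + est_b_treated)
```

With equal-sized parts the two group means of `y1` average to the overall mean of `y1`, and likewise for `y0`, so the expectation equals the average treatment effect whatever partition the optimizer picked.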

major comments (1)

[Numerical examples] The simulation setup is described without specifying the data-generating process, the bandwidth selection procedure for the KDEs, the number of partitions, the number of Monte Carlo replications, or any variability measures (standard errors or error bars) on the reported accuracy improvements. These omissions make it impossible to verify or reproduce the claimed outperformance, which is load-bearing for the paper's central empirical claim.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the numerical examples require substantially more detail to support the central empirical claims and will revise the manuscript accordingly.

Referee: Numerical examples: the simulation setup is described without specifying the data-generating process, bandwidth selection procedure for the KDEs, number of partitions, number of Monte Carlo replications, or any variability measures (standard errors or error bars) on the reported accuracy improvements. These omissions make it impossible to verify or reproduce the claimed outperformance, which is load-bearing for the paper's empirical central claim.

Authors: We agree that the current description of the numerical examples is insufficient for reproducibility. In the revised manuscript we will add: (i) explicit data-generating processes for all simulation scenarios, (ii) the precise bandwidth selection rule employed for the KDEs, (iii) the number of partitions used, (iv) the number of Monte Carlo replications, and (v) standard errors or error bars on all reported accuracy measures. These additions will be placed in a dedicated simulation section with accompanying tables or figures.

Revision: yes
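Items (ii) and (v) of the response could be reported along the following lines. This is only a sketch: Silverman's rule of thumb is one common univariate bandwidth rule and is used here as an assumed example, not as the rule the authors employ, and both function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a univariate Gaussian KDE."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(sd, iqr / 1.34) * len(x) ** (-0.2)

def mse_with_standard_error(errors):
    """Monte Carlo estimate of the MSE together with its standard error.

    errors holds one estimation error (estimate minus truth) per replication.
    """
    sq = np.square(np.asarray(errors, dtype=float))
    return sq.mean(), sq.std(ddof=1) / np.sqrt(len(sq))
```

Reporting the second return value next to each MSE is what lets a reader judge whether a gap between two designs exceeds simulation noise.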

Circularity Check

0 steps flagged

No significant circularity; empirical claim stands on numerical tests

The paper proposes a KDE-based discrepancy as a new partitioning criterion for covariate balance, then demonstrates via numerical examples that the resulting design improves finite-sample accuracy of the difference-in-mean estimator relative to complete randomization and rerandomization. No derivation chain exists that reduces a claimed prediction or uniqueness result to a fitted input, self-citation, or definitional tautology. The central claim is explicitly empirical and does not rely on any load-bearing self-citation or ansatz smuggled from prior work by the same authors. The design preserves the unbiasedness property of randomization within partitions, and the reported outperformance is presented as a numerical finding rather than an algebraic identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; the method relies on standard kernel density estimation but introduces no explicit free parameters, axioms, or invented entities beyond the new balancing criterion itself.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] Anderson, N. H., Hall, P., and Titterington, D. M. (1994), “Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates,” Journal of Multivariate Analysis, 50, 41–54.

[2] Bernstein, S. (1927), “Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes,” Mathematische Annalen, 97, 1–59.

[3] Bertsimas, D., Johnson, M., and Kallus, N. (2015), “The power of optimization over randomization in designing experiments involving small samples,” Operations Research, 63, 868–876.

[4] Blackwell, M., Iacus, S., King, G., and Porro, G. (2009), “cem: Coarsened exact matching in Stata,” The Stata Journal, 9, 524–546.

de Lima, M. S. and Atuncar, G. S. (2011), “A Bayesian method to estimate the optimal bandwidth for multivariate kernel estimator,” Journal of Nonparametric Statistics, 23, 137–148.

[5] Duong, T. and Hazelton, M. (2003), “Plug-in bandwidth matrices for bivariate kernel density estimation,” Journal of Nonparametric Statistics, 15, 17–30.

[6] Duong, T. and Hazelton, M. L. (2005), “Cross-validation Bandwidth Matrices for Multivariate Kernel Density Estimation,” Scandinavian Journal of Statistics, 32, 485–506.

[7] Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004), “Least angle regression,” The Annals of Statistics, 32, 407–499.

[8] Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., and Davidian, M. (2011), “Doubly robust estimation of causal effects,” American Journal of Epidemiology, 173, 761–767.

Gurobi Optimization, L. (2020), “Gurobi Optimizer Reference Manual.”

Härdle, W. K., Müller, M., Sperlich, S., and Werwatz, A. (2012), Nonparametric and semipa...

[9] Imbens, G. W. and Rubin, D. B. (2015), Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Cambridge University Press.

[10] Jones, M. C., Marron, J. S., and Sheather, S. J. (1996), “Progress in data-based bandwidth selection for kernel density estimation,” Computational Statistics, 11, 337–381.

[11] Kallus, N. (2018), “Optimal a priori balance in the design of controlled experiments,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80, 85–112.

[12] McHugh, R. and Matts, J. (1983), “Post-stratification in the randomized clinical trial,” Biometrics, 217–225.

[13] Miller, B. L., Goldberg, D. E., et al. (1995), “Genetic algorithms, tournament selection, and the effects of noise,” Complex Systems, 9, 193–212.

[14] Morgan, K. L. and Rubin, D. B. (2015), “Rerandomization to balance tiers of covariates,” Journal of the American Statistical Association, 110, 1412–1421.

[15] Morgan, K. L., Rubin, D. B., et al. (2012), “Rerandomization to improve covariate balance in experiments,” The Annals of Statistics, 40, 1263–1282.

[16] Pearl, J. (2000), Causality: Models, Reasoning, and Inference, Cambridge University Press.

[17] Rosenbaum, P. (2017), Observation and Experiment: An Introduction to Causal Inference, Harvard University Press.

[18] Rosenbaum, P. R. and Rubin, D. B. (1983), “The central role of the propensity score in observational studies for causal effects,” Biometrika, 70, 41–55.

[19] Rubin, D. B. (1980), “Randomization analysis of experimental data: The Fisher randomization test comment,” Journal of the American Statistical Association, 75, 591–593.

Rubin, D. B. (2005), “Causal inference using potential outcomes: Design, modeling, decisions,” Journal of the American Statistical Association, 100, 322–331.

[20] Sain, S. R., Baggerly, K. A., and Scott, D. W. (1994), “Cross-validation of multivariate densities,” Journal of the American Statistical Association, 89, 807–817.

[21] Scott, D. W. (2015), Multivariate density estimation: theory, practice, and visualization.

[22] Sheather, S. J. and Jones, M. C. (1991), “A reliable data-based bandwidth selection method for kernel density estimation,” Journal of the Royal Statistical Society. Series B (Methodological), 53, 683–690.

[23] “Simulated annealing” (1987).

Silverman, B. W. (1986a), Density estimation for statistics and data analysis, vol. 26, CRC Press.

Silverman, B. W. (1986b), Density estimation for statistics and data analysis, vol. 26, Boca Raton, FL: CRC Press.

Simonoff, J. S. (2012a), Smoothing methods in statistics, Springer Science & Business Media.

Simonoff, J. S. (2012b), Smoothing methods in statistics, New York, NY:...

[24] Wand, M. P. and Jones, M. C. (1993), “Comparison of smoothing parameterizations in bivariate kernel density estimation,” Journal of the American Statistical Association, 88, 520–528.

Wand, M. P. and Jones, M. C. (1994), “Multivariate plug-in bandwidth selection,” Computational Statistics, 9, 97–116.

[25] Wu, C. J. and Hamada, M. S. (2011), Experiments: planning, analysis, and optimization, vol. 552, Hoboken, New Jersey: John Wiley & Sons.

[26] Xie, H. and Aurisset, J. (2016), “Improving the sensitivity of online controlled experiments: Case studies at Netflix,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 645–654.

[27] Zhang, X., King, M. L., and Hyndman, R. J. (2006), “A Bayesian approach to bandwidth selection for multivariate kernel density estimation,” Computational Statistics & Data Analysis, 50, 3009–3031.
