Covariate Balancing Based on Kernel Density Estimates for Controlled Experiments
Pith reviewed 2026-05-24 14:07 UTC · model grok-4.3
The pith
Partitioning units by minimizing kernel density differences in covariates before randomization improves accuracy of the difference-in-mean estimator.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a new covariate balancing criterion that measures differences between kernel density estimates of covariates across treatment groups. Experimental units are partitioned into groups by minimizing this criterion, and the treatment levels are then randomly assigned to the resulting groups. Numerical examples indicate that this partition approach improves the accuracy of the difference-in-mean estimator and outperforms both complete randomization and rerandomization.
What carries the argument
The kernel density estimate difference criterion for partitioning units to achieve covariate balance before treatment randomization.
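A minimal sketch of how such a criterion and partition search could look. Everything here is an assumption made for illustration: the integrated squared difference between two Gaussian KDEs as the discrepancy, a single scalar bandwidth `h`, two equal-size groups, and a pairwise-swap local search. The function names are hypothetical, and the paper's actual criterion, bandwidth rule, and optimizer may differ.

```python
import numpy as np

def gaussian_cross_term(a, b, h):
    """Mean over all pairs (i, j) of the N(0, 2 h^2 I) density at a_i - b_j.

    For Gaussian kernels with bandwidth h, this equals the integral of the
    product of the two kernel density estimates built from a and b.
    """
    d = a.shape[1]
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    norm = (4.0 * np.pi * h ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq / (4.0 * h ** 2)).mean()

def kde_discrepancy(x1, x0, h):
    """Integrated squared difference between the two Gaussian KDEs (assumed criterion)."""
    return (gaussian_cross_term(x1, x1, h)
            - 2.0 * gaussian_cross_term(x1, x0, h)
            + gaussian_cross_term(x0, x0, h))

def balanced_partition(x, h, n_sweeps=20, rng=None):
    """Pairwise-swap local search from a random equal-size split (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    labels = np.zeros(n, dtype=int)
    labels[rng.permutation(n)[: n // 2]] = 1
    best = kde_discrepancy(x[labels == 1], x[labels == 0], h)
    for _ in range(n_sweeps):
        improved = False
        for i in np.flatnonzero(labels == 1):
            for j in np.flatnonzero(labels == 0):
                labels[i], labels[j] = 0, 1          # tentatively swap i and j
                cand = kde_discrepancy(x[labels == 1], x[labels == 0], h)
                if cand < best - 1e-12:
                    best, improved = cand, True
                    break                            # keep the swap, move to next i
                labels[i], labels[j] = 1, 0          # undo the swap
        if not improved:
            break
    return labels, best

x = np.random.default_rng(0).normal(size=(30, 2))    # synthetic covariates
labels, best = balanced_partition(x, h=0.5, rng=np.random.default_rng(1))

# Randomly decide which of the two groups receives the treatment.
coin = np.random.default_rng(2).integers(2)
z = labels if coin == 1 else 1 - labels
```

The closed form avoids numerical integration, which keeps each swap evaluation cheap. A local search only finds a local minimum, so an exact or heuristic global optimizer would be a natural replacement.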
Load-bearing premise
Minimizing differences between kernel density estimates of covariates across partitions will produce better finite-sample properties for the difference-in-mean estimator without introducing new biases or power loss.
What would settle it
A Monte Carlo simulation study in which the proposed partition method produces higher mean squared error or bias for the treatment effect estimator than rerandomization under the same covariate distributions.
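A sketch of what such a study could look like as a harness. It compares only complete randomization with a Mahalanobis-threshold rerandomization on one fixed synthetic data set; the sample size, coefficients, threshold, and replication count are all assumptions chosen for illustration, not values from the paper. The proposed partition method would be plugged in as one more `design` callable with the same signature.

```python
import numpy as np

def diff_in_means(y, z):
    return y[z == 1].mean() - y[z == 0].mean()

def complete_randomization(x, rng):
    """Assign exactly half of the units to treatment, uniformly at random."""
    n = x.shape[0]
    z = np.zeros(n, dtype=int)
    z[rng.choice(n, size=n // 2, replace=False)] = 1
    return z

def mahalanobis_imbalance(x, z):
    """Mahalanobis distance between the covariate means of the two groups."""
    delta = x[z == 1].mean(axis=0) - x[z == 0].mean(axis=0)
    n1, n0 = (z == 1).sum(), (z == 0).sum()
    cov = np.atleast_2d(np.cov(x, rowvar=False)) * (1.0 / n1 + 1.0 / n0)
    return float(delta @ np.linalg.solve(cov, delta))

def rerandomization(x, rng, threshold=0.5, max_draws=10_000):
    """Redraw until the imbalance is below the (assumed) threshold."""
    for _ in range(max_draws):
        z = complete_randomization(x, rng)
        if mahalanobis_imbalance(x, z) <= threshold:
            return z
    return z  # fall back to the last draw if none was accepted

def randomization_mse(design, x, y0, y1, reps, rng):
    """MSE and bias of the difference-in-mean estimator over repeated assignments."""
    tau = (y1 - y0).mean()
    errs = np.empty(reps)
    for r in range(reps):
        z = design(x, rng)
        y_obs = np.where(z == 1, y1, y0)
        errs[r] = diff_in_means(y_obs, z) - tau
    return float(np.mean(errs ** 2)), float(np.mean(errs))

rng = np.random.default_rng(0)
n, d = 40, 2
x = rng.normal(size=(n, d))                        # fixed covariates
y0 = x @ np.array([2.0, 2.0]) + rng.normal(scale=0.5, size=n)
y1 = y0 + 1.0                                      # constant treatment effect

mse_cr, bias_cr = randomization_mse(complete_randomization, x, y0, y1, 400, rng)
mse_rr, bias_rr = randomization_mse(rerandomization, x, y0, y1, 400, rng)
print(f"complete randomization: MSE={mse_cr:.3f}, bias={bias_cr:.3f}")
print(f"rerandomization:        MSE={mse_rr:.3f}, bias={bias_rr:.3f}")
```

Covariates and potential outcomes are held fixed and only the assignment is redrawn, so the MSE is taken over the randomization distribution. A linear outcome model favors mean-based rerandomization; a nonlinear outcome would be the more interesting test for a density-based criterion.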
Original abstract
Controlled experiments are widely used in many applications to investigate the causal relationship between input factors and experimental outcomes. A completely randomized design is usually used to randomly assign treatment levels to experimental units. When covariates of the experimental units are available, the experimental design should achieve covariate balancing among the treatment groups, such that the statistical inference of the treatment effects is not confounded with any possible effects of covariates. However, covariate imbalance often exists, because the experiment is carried out based on a single realization of the complete randomization. It is more likely to occur and worsen when the size of the experimental units is small or moderate. In this paper, we introduce a new covariate balancing criterion, which measures the differences between kernel density estimates of the covariates of treatment groups. To achieve covariate balance before the treatments are randomly assigned, we partition the experimental units by minimizing the criterion, then randomly assign the treatment levels to the partitioned groups. Through numerical examples, we show that the proposed partition approach can improve the accuracy of the difference-in-mean estimator and outperforms the complete randomization and rerandomization approaches.
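The quantities named in the abstract can be written down as follows. The notation is an assumption introduced here for illustration: $Z_i \in \{0,1\}$ is the group label of unit $i$, $x_i$ its covariate vector, $Y_i$ its observed outcome, $K_H$ a kernel with bandwidth matrix $H$, and $n_g$ the size of group $g$. The abstract does not state which discrepancy between the two density estimates is used, so the integrated squared difference below is one plausible choice, not necessarily the authors'.

```latex
\hat f_g(x) = \frac{1}{n_g} \sum_{i:\, Z_i = g} K_H(x - x_i), \qquad g \in \{0, 1\},

D(Z) = \int \bigl( \hat f_1(x) - \hat f_0(x) \bigr)^2 \, dx,

\hat\tau = \frac{1}{n_1} \sum_{i:\, Z_i = 1} Y_i \;-\; \frac{1}{n_0} \sum_{i:\, Z_i = 0} Y_i .
```

Under this reading, the design step chooses the partition $Z$ that minimizes $D(Z)$, and a random draw then decides which group receives which treatment level.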
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a covariate balancing criterion based on differences between kernel density estimates (KDEs) of covariates across treatment groups. Experimental units are partitioned by minimizing this criterion, after which the treatment levels are randomly assigned to the resulting groups. The central claim, supported by numerical examples, is that this approach improves the accuracy of the difference-in-mean estimator relative to complete randomization and rerandomization while preserving unbiasedness under the randomization distribution.
Significance. If the empirical results hold under more detailed scrutiny, the method provides a direct way to target finite-sample distributional balance in randomized experiments via a KDE discrepancy, which could improve precision of treatment effect estimates in small-to-moderate samples without altering the randomization-based inference framework. The approach is conceptually straightforward and avoids introducing bias by construction.
major comments (1)
- [Numerical examples] The simulation setup is described without specifying the data-generating process, bandwidth selection procedure for the KDEs, number of partitions, number of Monte Carlo replications, or any variability measures (standard errors or error bars) on the reported accuracy improvements. These omissions make it impossible to verify or reproduce the claimed outperformance, which is load-bearing for the paper's empirical central claim.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We agree that the numerical examples require substantially more detail to support the central empirical claims and will revise the manuscript accordingly.
- Referee: Numerical examples: the simulation setup is described without specifying the data-generating process, bandwidth selection procedure for the KDEs, number of partitions, number of Monte Carlo replications, or any variability measures (standard errors or error bars) on the reported accuracy improvements. These omissions make it impossible to verify or reproduce the claimed outperformance, which is load-bearing for the paper's empirical central claim.
  Authors: We agree that the current description of the numerical examples is insufficient for reproducibility. In the revised manuscript we will add: (i) explicit data-generating processes for all simulation scenarios, (ii) the precise bandwidth selection rule employed for the KDEs, (iii) the number of partitions used, (iv) the number of Monte Carlo replications, and (v) standard errors or error bars on all reported accuracy measures. These additions will be placed in a dedicated simulation section with accompanying tables or figures.
  Revision: yes
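Items (iv) and (v) of the response amount to reporting a Monte Carlo standard error next to each accuracy figure. A minimal sketch of that calculation, assuming the replications are independent and the accuracy measure is the mean squared error; the function name and the sample error values are hypothetical.

```python
import numpy as np

def mse_with_standard_error(errors):
    """Return the Monte Carlo MSE and its standard error.

    `errors` holds one estimation error (estimate minus truth) per replication.
    The standard error is the sample standard deviation of the squared errors
    divided by the square root of the number of replications.
    """
    errors = np.asarray(errors, dtype=float)
    sq = errors ** 2
    mse = sq.mean()
    se = sq.std(ddof=1) / np.sqrt(sq.size)
    return mse, se

# Hypothetical errors from five replications, for illustration only.
errors = np.array([0.1, -0.2, 0.05, 0.3, -0.15])
mse, se = mse_with_standard_error(errors)
print(f"MSE = {mse:.4f} (Monte Carlo SE {se:.4f})")
```

With the standard error in hand, a difference in MSE between two designs can be judged against the simulation noise rather than taken at face value.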
Circularity Check
No significant circularity; empirical claim stands on numerical tests
The paper proposes a KDE-based discrepancy as a new partitioning criterion for covariate balance, then demonstrates via numerical examples that the resulting design improves finite-sample accuracy of the difference-in-mean estimator relative to complete randomization and rerandomization. No derivation chain exists that reduces a claimed prediction or uniqueness result to a fitted input, self-citation, or definitional tautology. The central claim is explicitly empirical and does not rely on any load-bearing self-citation or ansatz smuggled from prior work by the same authors. The design preserves unbiasedness because the treatment levels are still assigned at random to the partitioned groups, and the reported outperformance is presented as a numerical finding rather than an algebraic identity.
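The unbiasedness remark can be checked exactly in the simplest case. The sketch below assumes two groups of equal size and a fair coin deciding which group is treated; the data are synthetic and the partition is arbitrary, so this illustrates the argument rather than reproducing the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
y0 = rng.normal(size=n)                      # potential outcomes under control
y1 = y0 + rng.normal(loc=1.0, size=n)        # potential outcomes under treatment
tau = (y1 - y0).mean()                       # true average treatment effect

# Any fixed partition into two equal-size groups (here: alternating units).
group = np.array([0, 1] * (n // 2))

def estimate(treated_group):
    """Difference-in-mean estimate when `treated_group` receives the treatment."""
    treated = group == treated_group
    return y1[treated].mean() - y0[~treated].mean()

# Expectation over the fair coin flip that picks the treated group.
expected = 0.5 * estimate(0) + 0.5 * estimate(1)
print(abs(expected - tau))
```

Averaging the two possible estimates recovers the mean of `y1` minus the mean of `y0` because each unit is treated with probability one half. With unequal group sizes this identity no longer holds exactly.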