On the use of cross-fitting in causal machine learning with correlated units
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
In causal machine learning, the fitting and evaluation of nuisance models are often performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators achieve the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to eliminate correlation between folds.
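The two fold-assignment strategies contrasted in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the AIPW estimator, the simple parametric nuisance models (the paper concerns black-box learners), the simulated clustered data with a true effect of 2, and the helper names `make_folds`, `fit_ols`, `fit_logistic`, and `cross_fit_aipw`. A single simulated draw like this one cannot confirm or refute the paper's claim; it only shows where the two approaches differ in code.

```python
import numpy as np

def make_folds(n, k, rng, clusters=None):
    """Return a fold label in {0, ..., k-1} for each of n units."""
    if clusters is None:
        # Unit-level split: treats units as independent.
        return rng.permutation(n) % k
    # Cluster-level split: all units in a cluster share one fold.
    ids, inverse = np.unique(clusters, return_inverse=True)
    cluster_fold = rng.permutation(len(ids)) % k
    return cluster_fold[inverse]

def fit_ols(X, y):
    """Least-squares coefficients for the outcome regression."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_logistic(X, a, steps=25, ridge=1e-6):
    """Logistic regression via Newton steps with a tiny ridge penalty."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        grad = X.T @ (a - p) - ridge * beta
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def cross_fit_aipw(y, a, x, folds):
    """AIPW estimate of the average treatment effect, with each unit's
    nuisance predictions coming from models fit on the other folds."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    m1, m0, e = np.empty(n), np.empty(n), np.empty(n)
    for f in np.unique(folds):
        test = folds == f
        train = ~test
        b1 = fit_ols(X[train & (a == 1)], y[train & (a == 1)])
        b0 = fit_ols(X[train & (a == 0)], y[train & (a == 0)])
        g = fit_logistic(X[train], a[train])
        m1[test] = X[test] @ b1
        m0[test] = X[test] @ b0
        e[test] = 1.0 / (1.0 + np.exp(-X[test] @ g))
    e = np.clip(e, 0.01, 0.99)  # guard against extreme weights
    phi = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)
    return phi.mean()

# Hypothetical clustered data: 200 clusters of 10 units, true effect = 2.
rng = np.random.default_rng(0)
n_clusters, size = 200, 10
clusters = np.repeat(np.arange(n_clusters), size)
n = n_clusters * size
x = rng.normal(size=n)
v = rng.normal(size=n_clusters)[clusters]  # shared cluster effect in y
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 2.0 * a + x + v + rng.normal(size=n)

folds_unit = make_folds(n, 5, rng)                        # ignores clusters
folds_cluster = make_folds(n, 5, rng, clusters=clusters)  # keeps clusters whole
est_unit = cross_fit_aipw(y, a, x, folds_unit)
est_cluster = cross_fit_aipw(y, a, x, folds_cluster)
print(f"unit-level folds:    {est_unit:.3f}")
print(f"cluster-level folds: {est_cluster:.3f}")
```

The only difference between the two estimates is the fold vector passed to `cross_fit_aipw`, which is the design choice the abstract argues about.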
fields
stat.ME (1)
years
2026 (1)
verdicts
UNVERDICTED (1)
representative citing papers
Cross-Fitted Survey-Weighted TMLE with Design-Based Variance for Causal Machine Learning
Cluster-level cross-fitting restores valid coverage for survey-weighted TMLE with flexible learners under stratified multistage designs, while single-fit and internal cross-validation versions under-cover.