Bootstrap consistency for general double/debiased machine learning estimators

Fang Han; Ziming Lin

arXiv: 2604.17239 · v1 · submitted 2026-04-19 · 🧮 math.ST · econ.EM · stat.TH


Pith reviewed 2026-05-10 06:09 UTC · model grok-4.3

classification 🧮 math.ST · econ.EM · stat.TH

keywords: bootstrap consistency · double machine learning · debiased machine learning · Neyman orthogonality · cross-fitting · asymptotic normality · resampling


The pith

Bootstrap methods are valid for double/debiased machine learning estimators under exactly the conditions already required for their asymptotic normality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that resampling schemes such as Efron's bootstrap produce a bootstrap distribution that asymptotically matches the sampling distribution of a DML estimator. This holds without any assumptions beyond those already needed for the estimator itself to be asymptotically normal. The result matters for two reasons. The bootstrap is often used in practice for DML inference, yet it can fail for other root-n consistent, asymptotically normal estimators. It has also lacked general justification in the DML setting. A reader who accepts the claim gains a theoretical basis for computing bootstrap standard errors and intervals directly from the orthogonal scores and cross-fitting already in place.
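Once bootstrap replicates θ̂* of a DML estimate θ̂ are in hand, the standard errors and intervals mentioned above are simple functions of them. The sketch below is minimal and assumes the replicates have already been computed. `bootstrap_summaries` is a hypothetical helper. The three interval types are common textbook choices and are not necessarily the ones studied in the paper. The synthetic normal draws in the usage lines stand in for real bootstrap output.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_summaries(theta_hat, theta_star, alpha=0.05):
    """Bootstrap SE and three (1 - alpha) intervals from replicates theta_star of theta_hat."""
    theta_star = np.asarray(theta_star, dtype=float)
    se = float(theta_star.std(ddof=1))                   # bootstrap standard error
    q_lo, q_hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
    z = NormalDist().inv_cdf(1 - alpha / 2)              # standard normal quantile
    return {
        "se": se,
        # Percentile interval: quantiles of theta* used directly
        "percentile": (float(q_lo), float(q_hi)),
        # Basic (pivotal) interval: reflect the quantiles of theta* - theta_hat around theta_hat
        "basic": (float(2 * theta_hat - q_hi), float(2 * theta_hat - q_lo)),
        # Normal interval centred at theta_hat with the bootstrap SE
        "normal": (theta_hat - z * se, theta_hat + z * se),
    }

# Usage with synthetic replicates (stand-ins for real bootstrap output)
rng = np.random.default_rng(0)
out = bootstrap_summaries(1.0, rng.normal(1.0, 0.1, size=20_000))
```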

Core claim

Under exactly the same conditions required for the validity of DML itself, the bootstrap law converges conditionally weakly to the sampling law of the original estimator, and this holds for general exchangeably weighted resampling schemes with Efron's bootstrap as a special case.
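The empirical-process literature usually formalizes "the bootstrap law converges conditionally weakly" with the bounded-Lipschitz metric. Below is a sketch of that standard formulation for a scalar parameter. It assumes exchangeable bootstrap weights with a scale constant c, where c = 1 for Efron's bootstrap. The paper's exact statement and notation may differ.

```latex
\sqrt{n}\,(\hat\theta - \theta_0) \rightsquigarrow Z,
\qquad
\sup_{h \in \mathrm{BL}_1}
\Bigl| \mathbb{E}_W\, h\bigl(\sqrt{n}\,(\hat\theta^{*} - \hat\theta)/c\bigr) - \mathbb{E}\, h(Z) \Bigr|
\xrightarrow{\;P^{*}\;} 0
```

Here the symbols are as follows:
- BL_1 is the set of functions bounded by 1 with Lipschitz constant at most 1.
- E_W is expectation over the bootstrap weights with the data held fixed.
- P* denotes outer probability.
- θ̂* is the estimator recomputed under the bootstrap weights.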

What carries the argument

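Two ingredients carry the argument. Neyman-orthogonal scores make the estimating equation insensitive to first-order errors in the nuisance estimates. Cross-fitting removes the need for Donsker-type conditions. Together they allow the bootstrap to track the estimator's limiting distribution.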
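The following is a minimal sketch of cross-fitting with a Neyman-orthogonal score. It assumes the partially linear model Y = θD + g(X) + ε and the partialling-out score. The model, the cubic-polynomial stand-in for a machine-learning learner, and the function names `dml_plr` and `fit_predict_poly` are illustrative choices, not taken from the paper.

```python
import numpy as np

def fit_predict_poly(x_train, y_train, x_test, degree=3):
    """Hypothetical nuisance learner: a cubic polynomial fit standing in for any ML regressor."""
    return np.polyval(np.polyfit(x_train, y_train, degree), x_test)

def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta in Y = theta*D + g(X) + eps (partialling-out score)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # balanced random fold labels
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances l(X) = E[Y|X] and m(X) = E[D|X] are fit off-fold, evaluated on fold k only
        y_res[test] = y[test] - fit_predict_poly(x[train], y[train], x[test])
        d_res[test] = d[test] - fit_predict_poly(x[train], d[train], x[test])
    # Solve the pooled orthogonal moment mean((y_res - theta * d_res) * d_res) = 0
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Plug-in standard error from the score psi and its derivative J
    psi = (y_res - theta * d_res) * d_res
    J = np.mean(d_res ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return theta, se

# Usage on simulated data with true theta = 0.5
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)               # D = m(X) + V
y = 0.5 * d + np.cos(x) + rng.normal(size=n)     # Y = theta*D + g(X) + eps
theta_hat, se_hat = dml_plr(y, d, x)
```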

Load-bearing premise

The DML estimator must satisfy the Neyman-orthogonality and cross-fitting rate conditions that already make it asymptotically normal.

What would settle it

A data-generating process in which the DML estimator is asymptotically normal yet the conditional distribution of the bootstrap version fails to converge to the same limit.
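A check of this kind needs a distance between the bootstrap law and the sampling law. The sketch below uses the Kolmogorov (sup-CDF) distance between two empirical distributions. The function name is hypothetical, and the Kolmogorov distance is only one convenient choice among several metrics for weak convergence.

```python
import numpy as np

def kolmogorov_distance(a, b):
    """sup_t |F_a(t) - F_b(t)| between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])   # both CDFs are step functions jumping only at sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Toy usage: equal laws give a small distance, shifted laws a larger one
rng = np.random.default_rng(3)
small = kolmogorov_distance(rng.normal(size=5000), rng.normal(size=5000))
shifted = kolmogorov_distance(rng.normal(size=5000), rng.normal(0.5, 1.0, size=5000))
```

A simulation study would pass two samples:
- Monte Carlo draws of √n(θ̂ − θ0) over many simulated datasets.
- Draws of √n(θ̂* − θ̂) from a single dataset.

Suppose the design keeps the DML estimator asymptotically normal, yet this distance fails to shrink as n grows. That would be the kind of counterexample described above.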

Original abstract

Double/debiased machine learning (DML) provides a general framework for inference with high-dimensional or otherwise complex nuisance parameters by combining Neyman-orthogonal scores with cross-fitting, thereby circumventing classical Donsker-type conditions in many modern machine-learning settings. Despite its strong empirical performance, bootstrap inference for DML estimators has received little theoretical justification. This is particularly noteworthy since bootstrap methods are suggested and used for inference on DML estimators, even though bootstrap procedures can fail for estimators that are root-$n$ consistent and asymptotically normal. This paper fills this gap by establishing bootstrap validity for DML estimators under general exchangeably weighted resampling schemes, with Efron's bootstrap as a special case. Under exactly the same conditions required for the validity of DML itself, we prove that the bootstrap law converges conditionally weakly to the sampling law of the original estimator.
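Efron's bootstrap and other exchangeable schemes can all be written as random weights W_1, ..., W_n on the observations. The minimal NumPy sketch below shows two standard examples. The first is Efron's multinomial resampling counts. The second is Rubin's Bayesian bootstrap, which is n times a flat Dirichlet vector. The helper names are hypothetical. The sample mean is used only because it is the simplest statistic to reweight.

```python
import numpy as np

def efron_weights(n, rng):
    """Efron's bootstrap as weights: Multinomial(n; 1/n, ..., 1/n) resampling counts, sum = n."""
    return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)

def bayesian_bootstrap_weights(n, rng):
    """Rubin's Bayesian bootstrap: n * Dirichlet(1, ..., 1), built from normalized Exp(1) draws."""
    e = rng.exponential(1.0, size=n)
    return n * e / e.sum()

def weighted_mean_replicates(x, weight_fn, n_boot, rng):
    """Bootstrap replicates of the sample mean, sum_i W_i x_i / n, with fresh weights each time."""
    n = len(x)
    return np.array([np.dot(weight_fn(n, rng), x) / n for _ in range(n_boot)])

# Usage: both schemes give a bootstrap SE close to sd(x) / sqrt(n)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
efron_draws = weighted_mean_replicates(x, efron_weights, 2000, rng)
bayes_draws = weighted_mean_replicates(x, bayesian_bootstrap_weights, 2000, rng)
```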

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows bootstrap works for DML under exactly the same conditions already needed for its own asymptotic normality.


The main point is that this paper proves bootstrap consistency for double/debiased machine learning estimators under exactly the Neyman-orthogonality, cross-fitting, and rate conditions that already justify the DML estimator's root-n normality. No stronger assumptions are added. It covers general exchangeably weighted resampling, with Efron's bootstrap as a special case, and shows that the bootstrap law converges conditionally weakly to the true sampling law. Prior DML work noted that practitioners were already using the bootstrap in applications. That work supplied no matching theory under those weak conditions, so this paper closes the gap directly.

The argument builds on the existing DML framework without introducing circularity or extra restrictions, which keeps the result clean and usable. The abstract states the claim plainly, and the stress test found no internal contradictions or hidden assumptions. The main limitation is that the full proof and lemmas are not visible in the provided text. As a result, the technical steps showing how cross-fitting carries over to the bootstrap world cannot be checked here. If those steps hold, the result is solid; if not, the claim would need adjustment.

This is aimed at researchers in high-dimensional statistics and econometrics who rely on DML and want bootstrap-based inference with theoretical support. A reader who already knows the DML literature will see the value immediately and can apply the result without learning new conditions. The paper deserves a serious referee because it fills a practical gap in a widely used method with a direct extension. I would recommend sending it to peer review so the details can be verified.
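A score-level weighted bootstrap is one way to see the step the letter flags: how cross-fitting interacts with resampling. The sketch below is hypothetical and makes a simplifying assumption that may differ from the scheme analyzed in the paper. The cross-fitted nuisance residuals are computed once and held fixed. Only the orthogonal moment equation of the partially linear model is re-solved under Efron (multinomial) weights. The model and the function names are illustrative.

```python
import numpy as np

def cross_fit_residuals(y, d, x, n_folds=5, degree=3, seed=0):
    """Cross-fitted residuals Y - l_hat(X) and D - m_hat(X).

    Cubic polynomial fits are a hypothetical stand-in for ML nuisance learners.
    """
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisances are fit off-fold and evaluated on fold k only
        y_res[test] = y[test] - np.polyval(np.polyfit(x[train], y[train], degree), x[test])
        d_res[test] = d[test] - np.polyval(np.polyfit(x[train], d[train], degree), x[test])
    return y_res, d_res

def weighted_bootstrap_dml(y_res, d_res, n_boot=999, seed=1):
    """Efron-weighted bootstrap of the partialling-out DML estimator.

    ASSUMPTION: nuisance residuals stay fixed; only the moment equation
    sum_i W_i * (y_res_i - theta * d_res_i) * d_res_i = 0 is re-solved.
    """
    n = len(y_res)
    rng = np.random.default_rng(seed)
    W = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)  # (n_boot, n) resampling counts
    theta_hat = np.sum(d_res * y_res) / np.sum(d_res ** 2)     # original estimate
    theta_star = (W @ (d_res * y_res)) / (W @ (d_res ** 2))    # one solution per weight vector
    return theta_hat, theta_star

# Usage on simulated data with true theta = 0.5
rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(-2, 2, n)
d = np.sin(x) + rng.normal(size=n)
y = 0.5 * d + np.cos(x) + rng.normal(size=n)
y_res, d_res = cross_fit_residuals(y, d, x)
theta_hat, theta_star = weighted_bootstrap_dml(y_res, d_res)
ci = np.quantile(theta_star, [0.025, 0.975])   # percentile interval
```

Holding the nuisances fixed has two practical benefits. It avoids refitting the learners B times. It also sidesteps the question of how duplicated observations in a resample should be assigned to folds. The alternative is to re-run the full cross-fitting pipeline on each resample. Whether the two approaches agree asymptotically is the kind of detail the paper's proofs would settle.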

Referee Report

0 major / 1 minor

Summary. The paper establishes bootstrap consistency for general double/debiased machine learning (DML) estimators. It proves that, for exchangeably weighted resampling schemes (with Efron's bootstrap as a special case), the bootstrap law converges conditionally weakly to the sampling distribution of the DML estimator, under precisely the same Neyman-orthogonality, cross-fitting, and rate conditions already required for the asymptotic normality of the DML estimator itself.

Significance. If the result holds, this supplies the missing theoretical justification for bootstrap inference with DML estimators, which are widely used in practice for high-dimensional and complex nuisance settings. The paper merits credit for deriving the result as a direct extension without introducing extra assumptions beyond those for DML validity, and for covering general exchangeably weighted schemes rather than a single bootstrap variant.

minor comments (1)

[Abstract] The phrase 'suggested ad used' is a typographical error and should read 'suggested and used'.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our manuscript on bootstrap consistency for general DML estimators. We appreciate the recommendation for minor revision and the recognition that the result holds under precisely the conditions already required for DML validity itself.

Circularity Check

0 steps flagged

No significant circularity


The paper presents a direct mathematical proof that the bootstrap law converges conditionally weakly to the sampling law of the DML estimator, under precisely the same Neyman-orthogonality, cross-fitting, and rate conditions already required for DML asymptotic normality. No load-bearing step reduces the target bootstrap consistency result to a fitted parameter, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The central claim is an independent theorem establishing validity for general exchangeably weighted resampling schemes, with no evidence that any equation or prediction is equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard DML assumptions (Neyman orthogonality, cross-fitting, nuisance rate conditions) plus standard weak-convergence arguments for exchangeable bootstrap weights; no new free parameters or invented entities are introduced.
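The exchangeable-weight conditions typically imposed in this literature can be summarized as follows. This is a simplified paraphrase: the extra moment or tail condition on W_{n1} is stated only loosely, and the paper's own assumptions may be phrased differently. The last line checks that Efron's bootstrap fits the scheme with scale constant c = 1.

```latex
\begin{aligned}
&W_n = (W_{n1},\dots,W_{nn}) \text{ exchangeable}, \qquad W_{ni} \ge 0, \qquad \sum_{i=1}^{n} W_{ni} = n,\\
&\frac{1}{n}\sum_{i=1}^{n} (W_{ni} - 1)^2 \xrightarrow{\;P\;} c^2 > 0,
 \qquad \text{plus a moment or tail condition on } W_{n1};\\
&\text{Efron: } W_n \sim \mathrm{Multinomial}\bigl(n; \tfrac{1}{n},\dots,\tfrac{1}{n}\bigr),\quad
 \operatorname{Var}(W_{n1}) = 1 - \tfrac{1}{n} \to 1, \ \text{so } c = 1.
\end{aligned}
```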

axioms (2)

Domain assumption: Neyman orthogonality of the score function
Invoked to ensure the estimator remains root-n consistent even when the nuisance parameters are estimated at slower-than-root-n rates.
Domain assumption: cross-fitting to break the dependence between nuisance estimation and score evaluation
Required for the asymptotic linearity on which the bootstrap proof builds.
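For concreteness, the two axioms can be written out for a scalar target θ with nuisance η. The partially linear model's partialling-out score serves as an illustration; it is our choice of example and not necessarily one treated in the paper. Its product-rate condition is the sufficient condition from the standard DML analysis of that score.

```latex
% Neyman orthogonality: the Gateaux derivative of the moment in the nuisance direction vanishes
\partial_r\, \mathbb{E}\bigl[\psi\bigl(W;\theta_0,\eta_0 + r(\eta-\eta_0)\bigr)\bigr]\Big|_{r=0} = 0
\quad\text{for all admissible } \eta .

% Cross-fitting with folds I_1,...,I_K: \hat\eta_{-k} is fit on the observations outside I_k
\frac{1}{n}\sum_{k=1}^{K}\sum_{i\in I_k} \psi\bigl(W_i;\hat\theta,\hat\eta_{-k}\bigr) = 0 .

% Example: Y = \theta D + g(X) + \varepsilon, \eta = (\ell, m), \ell(X) = E[Y|X], m(X) = E[D|X]
\psi(W;\theta,\eta) = \bigl(Y - \ell(X) - \theta\,(D - m(X))\bigr)\bigl(D - m(X)\bigr),
\qquad
\|\hat m - m_0\|_{P,2}\bigl(\|\hat m - m_0\|_{P,2} + \|\hat \ell - \ell_0\|_{P,2}\bigr) = o_P(n^{-1/2}).
```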



