pith. machine review for the scientific record.

arxiv: 2510.10245 · v2 · submitted 2025-10-11 · 📊 stat.ML · cs.LG · stat.ME

Kernel Treatment Effects with Adaptively Collected Data

Pith reviewed 2026-05-18 07:43 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME
keywords kernel methods · treatment effects · adaptive experiments · distributional inference · RKHS · causal inference · type-I error · doubly robust

The pith

A kernel method for testing full distributional differences in treatment outcomes remains valid when assignments adapt based on past data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first kernel-based procedure that lets researchers compare entire outcome distributions between treatment groups even though the experiment assigns treatments adaptively. Classical kernel tests assume independent samples, but adaptivity creates dependence that can invalidate p-values and type-I error rates. The authors split the data, learn a witness function on one part, and build a projected scalar statistic on the other part using doubly robust scores in a reproducing kernel Hilbert space. Sequential normalization of this statistic restores valid type-I error control. Experiments confirm the procedure stays calibrated for both mean shifts and differences in higher moments while outperforming methods restricted to average effects.

Core claim

The authors introduce a kernel treatment effect framework for adaptive data collection that combines doubly robust RKHS scores with a witness function learned on one data fold; inference is then performed on a held-out fold via a projected, sequentially normalized scalar statistic that maintains valid type-I error despite the dependence induced by the adaptive policy.

What carries the argument

The projected, sequentially normalized scalar statistic constructed from doubly robust RKHS scores and a learned witness function.
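The Python sketch below shows one way such a statistic could be assembled. It is not the authors' implementation, and its simplifications are assumptions of this sketch: binary treatment, scalar outcomes with a Gaussian kernel, logged propensities `pi` recorded by the adaptive policy, fold 1 taken as the first half of the experiment in time (so everything learned on it is fixed before fold 2 begins), an inverse-propensity-weighted estimate of the witness, and a linear ridge outcome model. Because ⟨f, k(y, ·)⟩ = f(y) for f in the RKHS, projecting a doubly robust RKHS score onto a fixed witness f leaves an ordinary AIPW score with f(Y) in place of the outcome, which is what the code computes. Dividing each score by the standard deviation of earlier scores is a crude, hypothetical stand-in for the paper's variance estimator. The names `gaussian_gram`, `split_projected_statistic` and `simulate` are illustrative.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    # Gaussian kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2)) for 1-D outcomes
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2.0 * bandwidth ** 2))

def split_projected_statistic(X, A, Y, pi, bandwidth=1.0, ridge=1e-2, burn_in=20):
    """Hypothetical split-sample, witness-projected, sequentially normalized DR statistic.

    X: (T, d) covariates, A: (T,) 0/1 treatments, Y: (T,) scalar outcomes,
    pi: (T,) logged P(A_t = 1 | past, X_t) recorded by the adaptive policy.
    """
    T = len(Y)
    h = T // 2  # assumption: fold 1 = first half in time, fold 2 = second half
    # Fold 1: IPW estimate of the witness f = mu_1 - mu_0 (difference of outcome mean embeddings)
    w = A[:h] / pi[:h] - (1 - A[:h]) / (1 - pi[:h])

    def witness(y):
        return gaussian_gram(y, Y[:h], bandwidth) @ w / h

    # Fold 1: per-arm ridge regression of f(Y) on X, an outcome model for the projected score
    fY1 = witness(Y[:h])
    Z1 = np.column_stack([np.ones(h), X[:h]])
    coef = {}
    for a in (0, 1):
        mask = A[:h] == a
        Za = Z1[mask]
        coef[a] = np.linalg.solve(Za.T @ Za + ridge * np.eye(Z1.shape[1]), Za.T @ fY1[mask])
    # Fold 2: AIPW scores with f(Y) as the outcome; witness and outcome models stay fixed
    Z2 = np.column_stack([np.ones(T - h), X[h:]])
    m1, m0 = Z2 @ coef[1], Z2 @ coef[0]
    fY2, A2, p2 = witness(Y[h:]), A[h:], pi[h:]
    psi = m1 - m0 + A2 / p2 * (fY2 - m1) - (1 - A2) / (1 - p2) * (fY2 - m0)
    # Sequential normalization: divide psi_t by the std of psi_0..psi_{t-1} (past data only)
    cs, cs2 = np.cumsum(psi), np.cumsum(psi ** 2)
    t = np.arange(burn_in, len(psi))
    past_mean = cs[t - 1] / t
    past_sd = np.sqrt(np.maximum(cs2[t - 1] / t - past_mean ** 2, 1e-12))
    incs = psi[t] / past_sd
    return incs.sum() / np.sqrt(len(incs))  # intended to be roughly N(0, 1) under the null

def simulate(T=2000, shift=0.0, seed=0):
    # Hypothetical adaptive policy: leans toward the arm with the higher past mean, clipped to [0.1, 0.9]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(T, 1))
    A, Y, pi = np.zeros(T), np.zeros(T), np.zeros(T)
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(T):
        means = sums / np.maximum(counts, 1)
        score = means[1] - means[0] + 0.5 * X[t, 0]
        pi[t] = np.clip(1.0 / (1.0 + np.exp(-score)), 0.1, 0.9)
        A[t] = float(rng.random() < pi[t])
        Y[t] = X[t, 0] + shift * A[t] + rng.normal()  # shift = 0 gives the null
        sums[int(A[t])] += Y[t]
        counts[int(A[t])] += 1
    return X, A, Y, pi

stat_null = split_projected_statistic(*simulate(shift=0.0, seed=1))
stat_alt = split_projected_statistic(*simulate(shift=2.0, seed=2))
```

The returned value is compared with standard normal quantiles. Because the expected projected score is ⟨f̂, μ_1 − μ_0⟩, which is positive when f̂ approximates μ_1 − μ_0, a one-sided rejection region (for example S > 1.645 at the 5% level) is a natural choice in this sketch. The design choice that matters is that every scale factor uses only past data, so each normalized increment remains a martingale difference under the null.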

If this is right

  • Valid type-I error control is achieved for tests of both mean shifts and higher-moment or shape differences in outcome distributions.
  • The procedure remains calibrated when treatment assignment depends on past outcomes.
  • Power gains appear relative to adaptive baselines that only target scalar average effects.
  • The same split-sample structure supports inference on interventional distributions represented in an RKHS.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The sequential normalization step may generalize to other kernel-based causal estimators that must handle policy-induced dependence.
  • The framework could support adaptive experiments in settings such as online recommendation or sequential clinical trials where distributional outcomes matter.
  • Extensions might replace the fixed split with more efficient cross-fitting while preserving the type-I guarantee.

Load-bearing premise

The doubly robust property of the RKHS scores continues to hold even though the adaptive assignment rule creates statistical dependence across observations.
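The premise can be made concrete with the standard AIPW argument, sketched here under assumptions that are ours and not necessarily the paper's: binary treatment; contexts X_t drawn i.i.d. and independently of the past filtration F_{t-1}; A_t independent of the potential outcomes given F_{t-1} and X_t; and the logging propensity π_t, the witness f̂ and the per-arm outcome models m̂_a (regressions of f̂(Y) on X) all F_{t-1}-measurable, which holds if fold 1 precedes fold 2 in time. The symbols ψ_t, m̂_a and μ_a = E[k(Y(a), ·)] are introduced for illustration.

```latex
\begin{align*}
\psi_t &= \hat m_1(X_t) - \hat m_0(X_t)
  + \frac{A_t}{\pi_t(X_t)}\bigl(\hat f(Y_t) - \hat m_1(X_t)\bigr)
  - \frac{1 - A_t}{1 - \pi_t(X_t)}\bigl(\hat f(Y_t) - \hat m_0(X_t)\bigr), \\
\mathbb{E}[\psi_t \mid \mathcal{F}_{t-1}, X_t]
  &= \mathbb{E}[\hat f(Y_t(1)) \mid X_t] - \mathbb{E}[\hat f(Y_t(0)) \mid X_t]
  \qquad \text{(using } \mathbb{E}[A_t \mid \mathcal{F}_{t-1}, X_t] = \pi_t(X_t)\text{)}, \\
\mathbb{E}[\psi_t \mid \mathcal{F}_{t-1}]
  &= \langle \hat f,\ \mu_1 - \mu_0 \rangle_{\mathcal{H}} = 0
  \qquad \text{under } H_0 : P_{Y(1)} = P_{Y(0)}.
\end{align*}
```

In this sketch the conditional mean-zero property rests on the logged propensity being correct, not on m̂_a; the outcome model only affects the variance of ψ_t. Under H_0 the ψ_t therefore form a martingale difference sequence, and dividing each one by an F_{t-1}-measurable scale preserves that property.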

What would settle it

A Monte Carlo study under a known null hypothesis and a specific adaptive policy in which the empirical rejection rate of the new test exceeds the nominal significance level by more than Monte Carlo error would refute the type-I error claim.
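A minimal harness for that kind of check is sketched below in Python. Everything in it is an assumption for illustration: the adaptive policy (which leans toward the arm with the higher past mean and is clipped to [0.1, 0.9]), the simple scalar IPW score used as a stand-in for the kernel statistic, and the decision rule, which flags miscalibration only when the empirical rejection rate exceeds α by more than about two Monte Carlo standard errors, 2·sqrt(α(1 − α)/R) for R replications. To probe the paper's claim, the stand-in score would be replaced by the kernel statistic; the function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def two_sided_pvalue(z):
    # 2 * (1 - Phi(|z|)) written as 1 - erf(|z| / sqrt(2))
    return 1.0 - erf(abs(z) / sqrt(2.0))

def null_adaptive_run(rng, T=500, burn_in=20):
    # One experiment under the null: the outcome ignores treatment, the policy adapts to past arm means
    sums, counts = np.zeros(2), np.zeros(2)
    psi = np.empty(T)
    for t in range(T):
        x = rng.normal()
        means = sums / np.maximum(counts, 1)
        pi = float(np.clip(1.0 / (1.0 + np.exp(-(means[1] - means[0] + 0.5 * x))), 0.1, 0.9))
        a = int(rng.random() < pi)
        y = x + rng.normal()  # no treatment effect
        sums[a] += y
        counts[a] += 1
        # Stand-in scalar IPW score using the logged propensity (not the paper's kernel score)
        psi[t] = y * (a / pi - (1 - a) / (1 - pi))
    # Sequential normalization by the std of past scores only
    cs, cs2 = np.cumsum(psi), np.cumsum(psi ** 2)
    idx = np.arange(burn_in, T)
    past_mean = cs[idx - 1] / idx
    past_sd = np.sqrt(np.maximum(cs2[idx - 1] / idx - past_mean ** 2, 1e-12))
    return float(np.sum(psi[idx] / past_sd) / np.sqrt(len(idx)))

def rejection_rate(reps=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = sum(two_sided_pvalue(null_adaptive_run(rng)) < alpha for _ in range(reps))
    rate = rejections / reps
    # Flag miscalibration only beyond ~2 Monte Carlo standard errors above alpha
    threshold = alpha + 2.0 * sqrt(alpha * (1.0 - alpha) / reps)
    return rate, threshold, rate > threshold

rate, threshold, miscalibrated = rejection_rate()
```

With R = 200 and α = 0.05 the flagging threshold is about 0.081, so a small excess over 5% is not, on its own, evidence against calibration.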

Figures

Figures reproduced from arXiv: 2510.10245 by Arthur Gretton, Bariscan Bozkurt, Houssam Zenati.

Figure 1: Histogram of the miscalibrated DR-xKTE statistic over …
Figure 2: Illustration of 200 simulations of VS-DR-KTE under the null in the adaptive setting with T = 1000: (A) Histogram with KDE and standard normal pdf, (B) Normal Q-Q plot, (C) False positives against sample sizes. The results show approximate Gaussian behaviour and controlled type-I error. Sketch of proof. We first reduce to an oracle setting. Sample splitting fixes the nuisance μ̂^(r) within each evaluation f…
Figure 3: True positive rates (200 simulations, Scenarios II–IV). Mean-focused baselines (CADR/AW-AIPW) achieve matching performance on II; VS-DR-KTE shows markedly higher power on III–IV (higher-moment shifts). 7.2 IHDP dataset. We evaluate our method on the Infant Health and Development Program (IHDP) data [20], following the same design as in [31]: after removing missing rows we retain 908 units with 18 covariates…
Figure 4: Observational samples from the dSprite data in Scenario IV
Figure 5: Counterfactual pairs from the dSprite data in Scenario IV
Figure 6: Calibration of VS-DR-KTE under the null hypothesis (Scenario I) in the adaptive setting for the …
Figure 7: Power comparison (true positive rates) for the linear model across Scenarios II–IV, based on …
Figure 8: Demonstration of the Calibration of VS-DR-KTE in the adaptive setting for the sigmoidal model
Figure 9: Comparative Power results (true positive rates) for the sigmoidal model across Scenarios II–IV, using …
Figure 10: Assessment of the Calibration of VS-DR-KTE under the null hypothesis (Scenario I) in the …
Figure 11: Comparative Power Analysis (true positive rates) for the IHDP dataset across Scenarios II–IV, …
Original abstract

Adaptive experiments improve efficiency by adjusting treatment assignments based on past outcomes, but this adaptivity breaks the i.i.d. assumptions that underpin classical asymptotics. At the same time, many questions of interest are distributional, extending beyond average effects. Kernel treatment effects (KTE) provide a flexible framework by representing interventional outcome distributions in an RKHS and comparing them via kernel distances. We present the first kernel-based framework for distributional inference under adaptive data collection. Our method combines doubly robust RKHS scores with a witness function learned on one fold, and performs inference on a second fold using a projected, sequentially normalized scalar statistic with valid type-I error. Experiments show that the resulting procedure is well calibrated and effective for both mean shifts and higher-moment differences, outperforming adaptive baselines limited to scalar effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the first kernel-based framework for distributional inference on treatment effects under adaptive data collection. It combines doubly robust RKHS scores with a witness function learned on one data fold and performs inference on a held-out fold via a projected, sequentially normalized scalar statistic that is claimed to deliver valid type-I error control. Experiments are reported to show calibration for both mean shifts and higher-moment differences, with outperformance relative to adaptive baselines restricted to scalar effects.

Significance. If the type-I error guarantee is rigorously established, the work would constitute a meaningful extension of kernel treatment effect methods to adaptive experimental settings, enabling flexible nonparametric distributional comparisons where classical i.i.d. asymptotics fail. The combination of cross-fitting, doubly robust scores, and sequential normalization is a natural technical direction, and the empirical demonstration of calibration for non-mean effects is a positive feature.

major comments (2)
  1. [§4 (Inference procedure) or Theorem on type-I error] The validity of type-I error for the sequentially normalized statistic (described in the abstract and presumably formalized in §4 or Theorem 2) rests on the doubly robust RKHS scores remaining conditionally mean-zero under the adaptive filtration. The cross-fit construction learns the witness on one fold and projects on the second, but the manuscript does not explicitly verify that estimation error in the witness remains orthogonal to future treatment probabilities induced by the adaptive policy; without this step the martingale property required for the normalized increments can fail even though double robustness holds in the non-adaptive case.
  2. [Theorem 1 / Proposition on asymptotic validity] The claim that the procedure controls type-I error for arbitrary adaptive assignment rules is load-bearing for the central contribution. The provided description does not state additional assumptions on the adaptive policy (e.g., bounded propensity scores or limited dependence) that would be needed to close the argument; if such conditions are implicit they should be made explicit and the proof adjusted accordingly.
minor comments (2)
  1. [§2 (Background)] Notation for the RKHS embeddings and the witness function projection should be introduced with a short table or diagram in §2 to improve readability for readers unfamiliar with kernel mean embeddings.
  2. [§5 (Experiments)] The experimental section reports calibration and outperformance but omits the precise adaptive policies simulated and the number of Monte Carlo replications; adding these details would strengthen reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our work. The points raised regarding the martingale property and explicit assumptions for type-I error control under adaptivity are well-taken. We address each comment below and will revise the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [§4 (Inference procedure) or Theorem on type-I error] The validity of type-I error for the sequentially normalized statistic (described in the abstract and presumably formalized in §4 or Theorem 2) rests on the doubly robust RKHS scores remaining conditionally mean-zero under the adaptive filtration. The cross-fit construction learns the witness on one fold and projects on the second, but the manuscript does not explicitly verify that estimation error in the witness remains orthogonal to future treatment probabilities induced by the adaptive policy; without this step the martingale property required for the normalized increments can fail even though double robustness holds in the non-adaptive case.

    Authors: We appreciate the referee's identification of this expository gap. The cross-fit design ensures the witness function is estimated solely from the first fold and is therefore fixed (and measurable with respect to the past) when the score is evaluated on the second fold. Double robustness of the RKHS score then guarantees that, under the null hypothesis, its conditional expectation given the adaptive filtration is exactly zero, independent of the estimation error in the witness. This orthogonality preserves the martingale difference property for the normalized increments. We will add an explicit supporting lemma in the appendix that derives the required conditional orthogonality between the witness estimation error and the adaptive treatment probabilities. revision: yes

  2. Referee: [Theorem 1 / Proposition on asymptotic validity] The claim that the procedure controls type-I error for arbitrary adaptive assignment rules is load-bearing for the central contribution. The provided description does not state additional assumptions on the adaptive policy (e.g., bounded propensity scores or limited dependence) that would be needed to close the argument; if such conditions are implicit they should be made explicit and the proof adjusted accordingly.

    Authors: We agree that the assumptions on the adaptive policy must be stated explicitly rather than left implicit. The current proof relies on the propensity scores being uniformly bounded away from zero and one (to ensure the scores remain well-defined and the variance is controlled) and on the policy generating a filtration under which the cross-fit scores form a martingale difference sequence with bounded moments. These conditions are standard for adaptive inference but were not highlighted in the theorem statement. We will revise the statement of Theorem 2 to list these assumptions explicitly and update the proof to reference them at each step where they are used. revision: yes
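The boundedness condition named in the second response can be enforced at design time. A minimal sketch, assuming a hypothetical two-arm Gaussian Thompson-sampling policy (N(0, 1) prior, unit-variance rewards) that is not taken from the paper: mixing the Thompson probability with uniform exploration keeps every logged propensity in [ε, 1 − ε], so both inverse-propensity weights, 1/π_t and 1/(1 − π_t), stay below 1/ε. The function name `clipped_thompson_log` and its parameters are illustrative.

```python
import numpy as np
from math import erf, sqrt

def clipped_thompson_log(T=1000, eps=0.05, arm_means=(0.0, 0.5), seed=0):
    """Hypothetical two-arm Gaussian Thompson sampling with propensities kept in [eps, 1 - eps]."""
    rng = np.random.default_rng(seed)
    sums, counts = np.zeros(2), np.zeros(2)
    logged = np.empty(T)
    for t in range(T):
        # Conjugate posterior of each arm mean: N(0, 1) prior, unit-variance Gaussian rewards
        post_var = 1.0 / (1.0 + counts)
        post_mean = sums * post_var
        # Thompson probability of arm 1: P(theta_1 > theta_0) = Phi(z)
        z = (post_mean[1] - post_mean[0]) / sqrt(post_var[0] + post_var[1])
        p = 0.5 * (1.0 + erf(z / sqrt(2.0)))
        # Mixing with uniform exploration: eps <= pi <= 1 - eps whatever p is
        pi = eps + (1.0 - 2.0 * eps) * p
        a = int(rng.random() < pi)
        sums[a] += arm_means[a] + rng.normal()
        counts[a] += 1
        logged[t] = pi  # the value that would enter the IPW / DR scores
    return logged

logged = clipped_thompson_log()
```

The mixing step costs a fixed fraction of exploration on the worse arm, which is the price paid for uniformly bounded weights and hence controlled score variance.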

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper's claimed framework combines existing doubly robust RKHS scores with cross-fitting (witness function on one fold, inference on the second) and sequential normalization to obtain a scalar statistic whose type-I error is asserted to remain valid under adaptive assignment via martingale properties. No equation or step is shown to define the target distributional inference or the normalized statistic in terms of itself, nor does any central result reduce by construction to a fitted parameter or to a self-citation whose content is unverified. The load-bearing assumption—that the doubly robust scores remain conditionally mean-zero under the adaptive filtration—is presented as a substantive extension rather than a tautology, and the abstract and described procedure build on prior kernel and doubly robust literature without renaming known results or smuggling ansatzes via self-citation. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; limited visibility into specific assumptions or parameters.

axioms (2)
  • domain assumption: Doubly robust property of RKHS scores holds under adaptive sampling
    Invoked to justify the scores used in the inference procedure.
  • domain assumption: Sequential normalization yields valid type-I error under adaptivity
    Central to the claim of valid inference on the second fold.

pith-pipeline@v0.9.0 · 5664 in / 1298 out tokens · 33486 ms · 2026-05-18T07:43:57.443896+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Semiparametric Efficient Test for Interpretable Distributional Treatment Effects

    stat.ML 2026-05 unverdicted novelty 7.0

    DR-ME is the first semiparametrically efficient finite-location kernel test for interpretable distributional treatment effects, using orthogonal doubly robust features derived from observational data.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper

  1. [1] S. Athey, D. Eckles, and G. W. Imbens. Design and analysis of experiments in the digital age. Annual Review of Economics, 14:779–806, 2022. doi: 10.1146/annurev-economics-051520-023803

  2. [2] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011

  3. [3] A. Bibaut and N. Kallus. Demystifying inference after adaptive experiments. Annual Review of Statistics and its Application, 12(1):407–423, 2025

  4. [4] A. Bibaut, M. Dimakopoulou, N. Kallus, A. Chambaz, and M. van Der Laan. Post-contextual-bandit inference. Advances in neural information processing systems, 34:28548–28559, 2021

  5. [5] D. Bosq. Linear processes in function spaces: theory and applications, volume 149. Springer Science & Business Media, 2000

  6. [6] S. Caria, B. Gordon, M. Kasy, et al. Adaptive experiments in economics. Annual Review of Economics, 15:615–647, 2023. doi: 10.1146/annurev-economics-091622-031912

  7. [7] V. Chernozhukov, I. Fernández-Val, and B. Melly. Inference on counterfactual distributions. Econometrica, 81(6):2205–2268, 2013

  8. [8] S.-C. Chow and M. Chang. Adaptive Design Methods in Clinical Trials. Chapman & Hall/CRC, 2nd edition, 2011

  9. [9] K. Colangelo and Y.-Y. Lee. Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments. Technical report, 2020. URL https://arxiv.org/pdf/2004.03036

  10. [10] W. Dabney, G. Ostrovski, D. Silver, and R. Munos. Implicit quantile networks for distributional reinforcement learning. In International conference on machine learning, pages 1096–1105. PMLR, 2018

  11. [11] R. M. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002

  12. [12] J. Fawkes, R. Hu, R. J. Evans, and D. Sejdinovic. Doubly robust kernel statistics for testing distributional treatment effects. Transactions on Machine Learning Research, 2024

  13. [13] A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In Proceedings of the 29th Conference on Learning Theory (COLT), pages 998–1027, 2016

  14. [14] T. Gärtner. A survey of kernels for structured data. ACM SIGKDD explorations newsletter, 5(1):49–58, 2003

  15. [15] A. Gretton. Introduction to RKHS, and some simple kernel algorithms. Adv. Top. Mach. Learn. Lecture Conducted from University College London, 16(5-3):2, 2013

  16. [16] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012

  17. [17] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012

  18. [18] V. Hadad, D. A. Hirshberg, R. Zhan, S. Wager, and S. Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the national academy of sciences, 118(15):e2014602118, 2021

  19. [19] P. Hall and C. C. Heyde. Martingale limit theory and its application. Academic press, 1980

  20. [20] J. L. Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011. doi: 10.1198/jcgs.2010.08162. URL https://doi.org/10.1198/jcgs.2010.08162

  21. [21] K. Hirano and J. R. Porter. Asymptotic representations for sequential decisions, adaptive experiments, and batched bandits. arXiv preprint arXiv:2302.03117, 2023

  22. [22] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform chernoff bounds via nonnegative supermartingales. Annals of Statistics, 49(2):1055–1080, 2021

  23. [23] T. Hsing and R. Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015

  24. [24] A. Huang, L. Leqi, Z. Lipton, and K. Azizzadenesheli. Off-policy risk assessment in contextual bandits. In Advances in Neural Information Processing Systems, volume 34, pages 23714–23726, 2021

  25. [25] M. Kanagawa and K. Fukumizu. Recovering Distributions from Gaussian RKHS Embeddings. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33, 2014

  26. [26] I. Kim and A. Ramdas. Dimension-agnostic inference using cross u-statistics. Bernoulli, 30(1):683–711, 2024

  27. [27] T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2020. doi: 10.1017/9781108571401

  28. [28] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 661–670, 2010

  29. [29] Z. Li, D. Meunier, M. Mollenhauer, and A. Gretton. Optimal rates for regularized conditional mean embedding learning. Advances in Neural Information Processing Systems, 35:4433–4445, 2022

  30. [30] A. Luedtke and I. Chung. One-step estimation of differentiable Hilbert-valued parameters. The Annals of Statistics, 52(4):1534–1563, 2024

  31. [31] D. Martinez Taboada, A. Ramdas, and E. Kennedy. An efficient doubly-robust test for the kernel treatment effect. In Advances in Neural Information Processing Systems, volume 36, pages 59924–59952, 2023

  32. [32] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dsprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/, 2017

  33. [33] K. Muandet, M. Kanagawa, S. Saengkyongam, and S. Marukatat. Counterfactual mean embeddings. Journal of Machine Learning Research, 22(162):1–71, 2021

  34. [34] J. Park and K. Muandet. A measure-theoretic approach to kernel conditional mean embeddings. Advances in Neural Information Processing Systems, 2020

  35. [35] J. Park, U. Shalit, B. Schölkopf, and K. Muandet. Conditional distributional treatment effect with kernel conditional mean embeddings and u-statistic regression. In International conference on machine learning, pages 8401–8412, 2021

  36. [36] V. Perchet, P. Rigollet, S. Chassang, and E. Snowberg. Batched bandit problems. The Annals of Statistics, 44:660–681, 04 2016

  37. [37] I. Pinelis. Optimum bounds for the distributions of martingales in banach spaces. The Annals of Probability, pages 1679–1706, 1994

  38. [38] S. Qiang and M. Bayati. Dynamic pricing with demand learning and strategic consumers: An application to online retail. Operations Research, 64(4):931–944, 2016. doi: 10.1287/opre.2016.1514

  39. [39] R. T. Rockafellar, S. Uryasev, et al. Optimization of conditional value-at-risk. Journal of risk, 2:21–42, 2000

  40. [40] C. Rothe. Nonparametric estimation of distributional policy effects. Journal of Econometrics, 155(1):56–70, 2010

  41. [41] S. Shekhar, I. Kim, and A. Ramdas. A permutation-free kernel independence test. Journal of Machine Learning Research, 24(369):1–68, 2023

  42. [42] B. Simon. Trace Ideals and Their Applications, volume 120 of Mathematical Surveys and Monographs. American Mathematical Society, 2nd edition, 2005

  43. [43] R. Singh, L. Xu, and A. Gretton. Kernel methods for causal functions: dose, heterogeneous and incremental response curves. Biometrika, 111(2):497–516, 2024

  44. [44] A. Smola, A. Gretton, L. Song, and B. Schölkopf. A hilbert space embedding for distributions. In International conference on algorithmic learning theory, pages 13–31. Springer, 2007

  45. [45] L. Song, J. Huang, A. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 961–968, 2009

  46. [46] B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010

  47. [47] A. v. d. Vaart and J. A. Wellner. Weak convergence and empirical processes with applications to statistics. Journal of the Royal Statistical Society-Series A Statistics in Society, 160(3):596–608, 1997

  48. [48] A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256

  49. [49] I. Waudby-Smith and A. Ramdas. Time-uniform central limit theorems and confidence sequences. In International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10663–10672, 2021

  50. [50] L. Xu and A. Gretton. Causal benchmark based on disentangled image dataset. 2023

  51. [51] H. Zenati, E. Diemert, M. Martin, J. Mairal, and P. Gaillard. Sequential counterfactual risk minimization. In International Conference on Machine Learning, pages 40681–40706. PMLR, 2023

  52. [52] H. Zenati, J. Abécassis, J. Josse, and B. Thirion. Double debiased machine learning for mediation analysis with continuous treatments. arXiv preprint arXiv:2503.06156, 2025

  53. [53] H. Zenati, B. Bozkurt, and A. Gretton. Doubly-robust estimation of counterfactual policy mean embeddings, 2025. URL https://arxiv.org/abs/2506.02793

  54. [54] K. Zhang, L. Janson, and S. Murphy. Inference for batched bandits. Advances in neural information processing systems, 33:9818–9829, 2020

  55. [55] K. Zhang, L. Janson, and S. Murphy. Statistical inference with m-estimators on adaptively collected data. Advances in neural information processing systems, 34:7460–7471, 2021. Appendix. This appendix is organized as follows: – Appendix 9: summary of the notations used in the paper and in the analysis. – Appendix 10: a review of reproducing kernel Hilbert…

  56. [56] … (K^(00)_{X,r} + λI)^{−1} K^(00)_{X,r}, (K^(00)_{X,r} + λI)^{−1} K^(01)_{X,r}, 0, 0 …, μ_{1,r} = … are needed. Plain L² nuisance consistency and the mild Cesàro stabilization of the logging policy suffice to deliver the predictable quadratic-variation limit and Bosq's (B2). We now provide an additional lemma on the convergence of the inverse of the average of conditional variance estimators. Lemma 11.3 (Average stabilizer). Let ω̂_t be estimators with rati…