arxiv: 2510.10245 · v2 · submitted 2025-10-11 · 📊 stat.ML · cs.LG · stat.ME

Kernel Treatment Effects with Adaptively Collected Data

Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

Pith reviewed 2026-05-18 07:43 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME

keywords kernel methods · treatment effects · adaptive experiments · distributional inference · RKHS · causal inference · type-I error · doubly robust

The pith

A kernel method for testing full distributional differences in treatment outcomes remains valid when assignments adapt based on past data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first kernel-based procedure that lets researchers compare entire outcome distributions between treatment groups even though the experiment assigns treatments adaptively. Classical kernel tests assume independent samples, but adaptivity creates dependence that can invalidate p-values and type-I error rates. The authors split the data, learn a witness function on one part, and build a projected scalar statistic on the other part using doubly robust scores in a reproducing kernel Hilbert space. Sequential normalization of this statistic restores valid type-I error control. Experiments confirm the procedure stays calibrated for both mean shifts and differences in higher moments while outperforming methods restricted to average effects.
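The split, project, and normalize steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's estimator: the doubly robust scores are taken as given finite-dimensional arrays (for example from random features), the function name `projected_normalized_stat` and the `burn_in` parameter are hypothetical, and the normalizer is a crude running standard deviation of past projected scores.

```python
import math
import numpy as np

def projected_normalized_stat(scores_fit, scores_eval, burn_in=20, eps=1e-8):
    """Return (T, p_value) for a one-sided test of a nonzero mean score.

    scores_fit  : (n1, D) feature-space scores from the first fold
    scores_eval : (n2, D) feature-space scores from the held-out fold, in time order
    """
    if len(scores_eval) <= burn_in:
        raise ValueError("held-out fold must be longer than burn_in")

    # Witness direction: normalized average score on the first fold.
    h = scores_fit.mean(axis=0)
    h = h / (np.linalg.norm(h) + eps)

    # Project every held-out score onto the witness to obtain scalars.
    s = scores_eval @ h

    # Sequential normalization (assumed form): scale s_t by a standard
    # deviation computed from s_1..s_{t-1} only, so the scale is fixed
    # before s_t is observed.
    terms = []
    for t in range(burn_in, len(s)):
        omega = s[:t].std(ddof=1)
        terms.append(s[t] / max(omega, eps))
    terms = np.asarray(terms)

    T = terms.sum() / math.sqrt(len(terms))
    p_value = 0.5 * math.erfc(T / math.sqrt(2.0))  # upper tail of N(0, 1)
    return T, p_value

# Usage on synthetic zero-mean scores (a stand-in for the null hypothesis).
rng = np.random.default_rng(0)
T, p = projected_normalized_stat(rng.normal(size=(300, 5)),
                                 rng.normal(size=(300, 5)))
print(f"T = {T:.3f}, p = {p:.3f}")
```

The test is one-sided because the witness is oriented along the estimated difference, so a real effect pushes the projected mean upward. Learning the witness on a separate fold is what allows the held-out projections to be treated as scalars with a fixed direction.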

Core claim

The authors introduce a kernel treatment effect framework for adaptive data collection that combines doubly robust RKHS scores with a witness function learned on one data fold; inference is then performed on a held-out fold via a projected, sequentially normalized scalar statistic that maintains valid type-I error despite the dependence induced by the adaptive policy.

What carries the argument

The projected, sequentially normalized scalar statistic constructed from doubly robust RKHS scores and a learned witness function.

If this is right

Valid type-I error control is achieved for tests of both mean shifts and higher-moment or shape differences in outcome distributions.
The procedure remains calibrated when treatment assignment depends on past outcomes.
Power gains appear relative to adaptive baselines that only target scalar average effects.
The same split-sample structure supports inference on interventional distributions represented in an RKHS.
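To see why a kernel distance can register a shape difference that an average-effect comparison misses, the following toy example compares two samples with equal means and different variances. It is a sketch on i.i.d. synthetic data with a Gaussian kernel and a biased MMD estimate; the helper names, the bandwidth, and the sample size are illustrative assumptions and have nothing to do with the paper's experiments.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) for 1-D samples."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
n = 1000
control = rng.normal(0.0, 1.0, size=n)
same = rng.normal(0.0, 1.0, size=n)    # same distribution as control
wider = rng.normal(0.0, 2.0, size=n)   # same mean, larger variance

mean_gap = abs(wider.mean() - control.mean())
mmd_null = mmd2_biased(control, same)
mmd_shift = mmd2_biased(control, wider)
print(f"mean gap: {mean_gap:.3f}")
print(f"MMD^2, same distribution: {mmd_null:.4f}")
print(f"MMD^2, variance shift:    {mmd_shift:.4f}")
```

The mean gap stays near zero in the variance-shift case, while the squared MMD is clearly larger than its same-distribution value.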

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The sequential normalization step may generalize to other kernel-based causal estimators that must handle policy-induced dependence.
The framework could support adaptive experiments in settings such as online recommendation or sequential clinical trials where distributional outcomes matter.
Extensions might replace the fixed split with more efficient cross-fitting while preserving the type-I guarantee.

Load-bearing premise

The doubly robust property of the RKHS scores continues to hold even though the adaptive assignment rule creates statistical dependence across observations.

What would settle it

A Monte Carlo study under a known null hypothesis with a specific adaptive policy in which the empirical rejection rate of the new test exceeds the nominal significance level.
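The shape of such a check can be sketched as a small harness. Everything specific here is an assumption made for illustration: the two-arm policy, the clipping level, the burn-in, and above all the test itself, which is a scalar inverse-propensity stand-in rather than the paper's kernel statistic. The point is only the protocol: simulate under a known null with an adaptive policy, then compare the empirical rejection rate with the nominal level.

```python
import numpy as np

def one_run(rng, horizon=500, burn_in=50, clip=0.1):
    """Simulate one adaptive two-arm experiment under the null; return T."""
    sums = np.zeros(2)      # running sum of outcomes per arm
    sq_sums = np.zeros(2)   # running sum of squared outcomes per arm
    counts = np.zeros(2)
    terms = []
    for t in range(horizon):
        n = np.maximum(counts, 1.0)  # guard against an empty arm
        if t < burn_in:
            p1 = 0.5  # uniform randomization while estimates warm up
        else:
            # Adaptive policy: favor the arm with the larger running mean,
            # with propensities kept away from 0 and 1.
            gap = sums[1] / n[1] - sums[0] / n[0]
            p1 = float(np.clip(0.5 + 0.4 * np.tanh(gap), clip, 1.0 - clip))
        a = int(rng.random() < p1)
        y = rng.normal()  # null: both arms share the same outcome law
        if t >= burn_in:
            # Inverse-propensity score; conditional mean zero under the null.
            score = y / p1 if a == 1 else -y / (1.0 - p1)
            # Predictable scale from past second moments and the known p1.
            m2 = np.maximum(sq_sums / n, 1e-8)
            omega = np.sqrt(m2[1] / p1 + m2[0] / (1.0 - p1))
            terms.append(score / omega)
        sums[a] += y
        sq_sums[a] += y * y
        counts[a] += 1.0
    return float(np.sum(terms) / np.sqrt(len(terms)))

def rejection_rate(n_rep=200, z_crit=1.96, seed=0):
    """Fraction of null replications rejected by a two-sided z-test."""
    rng = np.random.default_rng(seed)
    stats = np.array([one_run(rng) for _ in range(n_rep)])
    return float(np.mean(np.abs(stats) > z_crit))

rate = rejection_rate()
print(f"empirical rejection rate at nominal 0.05: {rate:.3f}")
```

A rejection rate well above 0.05 after accounting for Monte Carlo error (roughly 0.015 at 200 replications) would be the kind of evidence described above. Swapping `one_run` for the actual procedure and a more aggressive policy turns the harness into the stress test.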

Figures

Figures reproduced from arXiv: 2510.10245 by Arthur Gretton, Bariscan Bozkurt, Houssam Zenati.

**Figure 2.** Illustration of 200 simulations of VS-DR-KTE under the null in the adaptive setting with T = 1000: (A) Histogram with KDE and standard normal pdf, (B) Normal Q-Q plot, (C) False positives against sample sizes. The results show approximate Gaussian behaviour and controlled type-I error.

**Figure 3.** True positive rates (200 simulations, Scenarios II–IV). Mean-focused baselines (CADR/AW-AIPW) achieve matching performance on II; VS-DR-KTE shows markedly higher power on III–IV (higher-moment shifts).

**Figure 4.** Observational samples from the dSprite data in Scenario IV

**Figure 5.** Counterfactual pairs from the dSprite data in Scenario IV

**Figure 6.** Calibration of VS-DR-KTE under the null hypothesis (Scenario I) in the adaptive setting for the …

**Figure 7.** Power comparison (true positive rates) for the linear model across Scenarios II–IV, based on …

**Figure 8.** Demonstration of the Calibration of VS-DR-KTE in the adaptive setting for the sigmoidal model

**Figure 9.** Comparative Power results (true positive rates) for the sigmoidal model across Scenarios II–IV, using …

**Figure 10.** Assessment of the Calibration of VS-DR-KTE under the null hypothesis (Scenario I) in the …

**Figure 11.** Comparative Power Analysis (true positive rates) for the IHDP dataset across Scenarios II–IV, …

Original abstract

Adaptive experiments improve efficiency by adjusting treatment assignments based on past outcomes, but this adaptivity breaks the i.i.d. assumptions that underpin classical asymptotics. At the same time, many questions of interest are distributional, extending beyond average effects. Kernel treatment effects (KTE) provide a flexible framework by representing interventional outcome distributions in an RKHS and comparing them via kernel distances. We present the first kernel-based framework for distributional inference under adaptive data collection. Our method combines doubly robust RKHS scores with a witness function learned on one fold, and performs inference on a second fold using a projected, sequentially normalized scalar statistic with valid type-I error. Experiments show that the resulting procedure is well calibrated and effective for both mean shifts and higher-moment differences, outperforming adaptive baselines limited to scalar effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper extends kernel treatment effects to adaptive data via cross-fit doubly robust scores and a sequentially normalized martingale statistic, but the type-I guarantee depends on whether the cross-fold fully protects the conditional mean-zero property under arbitrary policies.

The main thing to know is that this work gives a kernel method for comparing full outcome distributions when treatment assignment depends on past data, using a cross-fold split where a witness function is fit on one part and the test runs on the other with sequential normalization for claimed type-I control. It is the first explicit attempt to carry kernel treatment effects into the adaptive setting rather than assuming i.i.d. samples.

They combine existing doubly robust RKHS scores with the witness projection and normalization step, and the experiments indicate the procedure stays calibrated for both location shifts and higher-moment differences while beating scalar-effect baselines. That combination is genuinely new relative to prior KTE papers and to adaptive inference work that stays with averages. The approach is technically clean on paper and the empirical checks are a reasonable start for showing practical behavior.

The soft spot is exactly the one the stress-test flags: the martingale property for the normalized increments requires that the doubly robust scores remain conditionally mean-zero given the filtration induced by the adaptive policy. If the cross-fit does not fully orthogonalize the witness estimation error from future assignment probabilities, that zero-mean property can break even though it holds in the non-adaptive case. The abstract states the claim but does not show the derivation, so the strength of the result turns on whether the full proof closes this gap or only covers milder adaptivity. The rest of the technical setup looks standard and the citation pattern is appropriate.

This is for people working on causal inference for sequential experiments or online platforms who already use kernel embeddings and want to move beyond average effects. A reader comfortable with RKHS methods and martingale arguments will get the most out of it.

It is worth sending to peer review because the problem is real, the proposed fix is concrete, and the empirical results are at least suggestive; a referee can check the martingale argument in detail and ask for more stress tests on strong adaptivity.

Referee Report

2 major / 2 minor

Summary. The paper presents the first kernel-based framework for distributional inference on treatment effects under adaptive data collection. It combines doubly robust RKHS scores with a witness function learned on one data fold and performs inference on a held-out fold via a projected, sequentially normalized scalar statistic that is claimed to deliver valid type-I error control. Experiments are reported to show calibration for both mean shifts and higher-moment differences, with outperformance relative to adaptive baselines restricted to scalar effects.

Significance. If the type-I error guarantee is rigorously established, the work would constitute a meaningful extension of kernel treatment effect methods to adaptive experimental settings, enabling flexible nonparametric distributional comparisons where classical i.i.d. asymptotics fail. The combination of cross-fitting, doubly robust scores, and sequential normalization is a natural technical direction, and the empirical demonstration of calibration for non-mean effects is a positive feature.

major comments (2)

[§4 (Inference procedure) or Theorem on type-I error] The validity of type-I error for the sequentially normalized statistic (described in the abstract and presumably formalized in §4 or Theorem 2) rests on the doubly robust RKHS scores remaining conditionally mean-zero under the adaptive filtration. The cross-fit construction learns the witness on one fold and projects on the second, but the manuscript does not explicitly verify that estimation error in the witness remains orthogonal to future treatment probabilities induced by the adaptive policy; without this step the martingale property required for the normalized increments can fail even though double robustness holds in the non-adaptive case.
[Theorem 1 / Proposition on asymptotic validity] The claim that the procedure controls type-I error for arbitrary adaptive assignment rules is load-bearing for the central contribution. The provided description does not state additional assumptions on the adaptive policy (e.g., bounded propensity scores or limited dependence) that would be needed to close the argument; if such conditions are implicit they should be made explicit and the proof adjusted accordingly.
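
The conditional mean-zero step at issue in both major comments can be written out in a standard AIPW-type form. This is a sketch under assumptions that the abstract does not confirm, and the notation is generic rather than the paper's: the logging propensity \pi_t is known and fixed given the history \mathcal{F}_{t-1}, the nuisance \hat\mu_a and witness h are fixed before round t, contexts X_t are i.i.d., and assignment is randomized given history and context.

```latex
% Assumed notation (not taken from the paper):
%   \mathcal{F}_{t-1} : history before round t
%   \pi_t(a \mid x)   : known logging propensity, \mathcal{F}_{t-1}-measurable
%   \hat\mu_a, h      : nuisance and witness, both fixed before round t
\begin{align*}
\phi_t^{(a)}
  &= \frac{\mathbf{1}\{A_t = a\}}{\pi_t(a \mid X_t)}
     \bigl( k(Y_t, \cdot) - \hat\mu_a(X_t) \bigr) + \hat\mu_a(X_t),
     \qquad a \in \{0, 1\}, \\
\mathbb{E}\bigl[ \phi_t^{(a)} \mid \mathcal{F}_{t-1}, X_t \bigr]
  &= \frac{\mathbb{E}[\mathbf{1}\{A_t = a\} \mid \mathcal{F}_{t-1}, X_t]}{\pi_t(a \mid X_t)}
     \bigl( \mathbb{E}[k(Y_t(a), \cdot) \mid X_t] - \hat\mu_a(X_t) \bigr) + \hat\mu_a(X_t) \\
  &= \mathbb{E}[k(Y_t(a), \cdot) \mid X_t], \\
\mathbb{E}\bigl[ \langle h, \phi_t^{(1)} - \phi_t^{(0)} \rangle \mid \mathcal{F}_{t-1} \bigr]
  &= \langle h, \mu_{Y(1)} - \mu_{Y(0)} \rangle = 0
     \quad \text{under } H_0 : P_{Y(1)} = P_{Y(0)}.
\end{align*}
```

Under these assumptions the nuisance error cancels exactly, whatever \hat\mu_a is. What the argument cannot do without is that h and \hat\mu_a are determined before round t and that \pi_t stays bounded away from zero, which is where the two comments above press.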

minor comments (2)

[§2 (Background)] Notation for the RKHS embeddings and the witness function projection should be introduced with a short table or diagram in §2 to improve readability for readers unfamiliar with kernel mean embeddings.
[§5 (Experiments)] The experimental section reports calibration and outperformance but omits the precise adaptive policies simulated and the number of Monte Carlo replications; adding these details would strengthen reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our work. The points raised regarding the martingale property and explicit assumptions for type-I error control under adaptivity are well-taken. We address each comment below and will revise the manuscript accordingly to improve clarity and rigor.

Referee: [§4 (Inference procedure) or Theorem on type-I error] The validity of type-I error for the sequentially normalized statistic (described in the abstract and presumably formalized in §4 or Theorem 2) rests on the doubly robust RKHS scores remaining conditionally mean-zero under the adaptive filtration. The cross-fit construction learns the witness on one fold and projects on the second, but the manuscript does not explicitly verify that estimation error in the witness remains orthogonal to future treatment probabilities induced by the adaptive policy; without this step the martingale property required for the normalized increments can fail even though double robustness holds in the non-adaptive case.

Authors: We appreciate the referee's identification of this expository gap. The cross-fit design ensures the witness function is estimated solely from the first fold and is therefore fixed (and measurable with respect to the past) when the score is evaluated on the second fold. Double robustness of the RKHS score then guarantees that its conditional expectation given the adaptive filtration is exactly zero, independent of the estimation error in the witness. This orthogonality preserves the martingale difference property for the normalized increments. We will add an explicit supporting lemma in the appendix that derives the required conditional orthogonality between the witness estimation error and the adaptive treatment probabilities. revision: yes
Referee: [Theorem 1 / Proposition on asymptotic validity] The claim that the procedure controls type-I error for arbitrary adaptive assignment rules is load-bearing for the central contribution. The provided description does not state additional assumptions on the adaptive policy (e.g., bounded propensity scores or limited dependence) that would be needed to close the argument; if such conditions are implicit they should be made explicit and the proof adjusted accordingly.

Authors: We agree that the assumptions on the adaptive policy must be stated explicitly rather than left implicit. The current proof relies on the propensity scores being uniformly bounded away from zero and one (to ensure the scores remain well-defined and the variance is controlled) and on the policy generating a filtration under which the cross-fit scores form a martingale difference sequence with bounded moments. These conditions are standard for adaptive inference but were not highlighted in the theorem statement. We will revise the statement of Theorem 2 to list these assumptions explicitly and update the proof to reference them at each step where they are used. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper's claimed framework combines existing doubly robust RKHS scores with cross-fitting (witness function on one fold, inference on the second) and sequential normalization to obtain a scalar statistic whose type-I error is asserted to remain valid under adaptive assignment via martingale properties. No equation or step is shown to define the target distributional inference or the normalized statistic in terms of itself, nor does any central result reduce by construction to a fitted parameter or to a self-citation whose content is unverified. The load-bearing assumption—that the doubly robust scores remain conditionally mean-zero under the adaptive filtration—is presented as a substantive extension rather than a tautology, and the abstract and described procedure build on prior kernel and doubly robust literature without renaming known results or smuggling ansatzes via self-citation. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; limited visibility into specific assumptions or parameters.

axioms (2)

domain assumption: Doubly robust property of RKHS scores holds under adaptive sampling.
Invoked to justify the scores used in the inference procedure.
domain assumption: Sequential normalization yields valid type-I error under adaptivity.
Central to the claim of valid inference on the second fold.

pith-pipeline@v0.9.0 · 5664 in / 1298 out tokens · 33486 ms · 2026-05-18T07:43:57.443896+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We present the first kernel-based framework for distributional inference under adaptive data collection. Our method combines doubly robust RKHS scores with a witness function learned on one fold, and performs inference on a second fold using a projected, sequentially normalized scalar statistic"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Theorem 4.5 (Asymptotic normality of the stabilized RKHS estimator) ... Hilbert-space martingale CLT"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Semiparametric Efficient Test for Interpretable Distributional Treatment Effects
stat.ML · 2026-05 · unverdicted · novelty 7.0

DR-ME is the first semiparametrically efficient finite-location kernel test for interpretable distributional treatment effects, using orthogonal doubly robust features derived from observational data.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper

[1] S. Athey, D. Eckles, and G. W. Imbens. Design and analysis of experiments in the digital age. Annual Review of Economics, 14:779–806, 2022. doi: 10.1146/annurev-economics-051520-023803

[2] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011.

[3] A. Bibaut and N. Kallus. Demystifying inference after adaptive experiments. Annual Review of Statistics and its Application, 12(1):407–423, 2025.

[4] A. Bibaut, M. Dimakopoulou, N. Kallus, A. Chambaz, and M. van Der Laan. Post-contextual-bandit inference. Advances in neural information processing systems, 34:28548–28559, 2021.

[5] D. Bosq. Linear processes in function spaces: theory and applications, volume 149. Springer Science & Business Media, 2000.

[6] S. Caria, B. Gordon, M. Kasy, et al. Adaptive experiments in economics. Annual Review of Economics, 15:615–647, 2023. doi: 10.1146/annurev-economics-091622-031912

[7] V. Chernozhukov, I. Fernández-Val, and B. Melly. Inference on counterfactual distributions. Econometrica, 81(6):2205–2268, 2013.

[8] S.-C. Chow and M. Chang. Adaptive Design Methods in Clinical Trials. Chapman & Hall/CRC, 2nd edition, 2011.

[9] K. Colangelo and Y.-Y. Lee. Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments. Technical report, 2020. URL https://arxiv.org/pdf/2004.03036

[10] W. Dabney, G. Ostrovski, D. Silver, and R. Munos. Implicit quantile networks for distributional reinforcement learning. In International conference on machine learning, pages 1096–1105. PMLR, 2018.

[11] R. M. Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2002.

[12] J. Fawkes, R. Hu, R. J. Evans, and D. Sejdinovic. Doubly robust kernel statistics for testing distributional treatment effects. Transactions on Machine Learning Research, 2024.

[13] A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In Proceedings of the 29th Conference on Learning Theory (COLT), pages 998–1027, 2016.

[14] T. Gärtner. A survey of kernels for structured data. ACM SIGKDD explorations newsletter, 5(1):49–58, 2003.

[15] A. Gretton. Introduction to rkhs, and some simple kernel algorithms. Adv. Top. Mach. Learn. Lecture Conducted from University College London, 16(5-3):2, 2013.

[16] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012.

[17] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012.

[18] V. Hadad, D. A. Hirshberg, R. Zhan, S. Wager, and S. Athey. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the national academy of sciences, 118(15):e2014602118, 2021.

[19] P. Hall and C. C. Heyde. Martingale limit theory and its application. Academic press, 1980.

[20] J. L. Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011. doi: 10.1198/jcgs.2010.08162. URL https://doi.org/10.1198/jcgs.2010.08162

[21] K. Hirano and J. R. Porter. Asymptotic representations for sequential decisions, adaptive experiments, and batched bandits. arXiv preprint arXiv:2302.03117, 2023.

[22] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform chernoff bounds via nonnegative supermartingales. Annals of Statistics, 49(2):1055–1080, 2021.

[23] T. Hsing and R. Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, 2015.

[24] A. Huang, L. Leqi, Z. Lipton, and K. Azizzadenesheli. Off-policy risk assessment in contextual bandits. In Advances in Neural Information Processing Systems, volume 34, pages 23714–23726, 2021.

[25] M. Kanagawa and K. Fukumizu. Recovering Distributions from Gaussian RKHS Embeddings. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33, 2014.

[26] I. Kim and A. Ramdas. Dimension-agnostic inference using cross u-statistics. Bernoulli, 30(1):683–711, 2024.

[27] T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2020. doi: 10.1017/9781108571401

[28] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 661–670, 2010.

[29] Z. Li, D. Meunier, M. Mollenhauer, and A. Gretton. Optimal rates for regularized conditional mean embedding learning. Advances in Neural Information Processing Systems, 35:4433–4445, 2022.

[30] A. Luedtke and I. Chung. One-step estimation of differentiable Hilbert-valued parameters. The Annals of Statistics, 52(4):1534–1563, 2024.

[31] D. Martinez Taboada, A. Ramdas, and E. Kennedy. An efficient doubly-robust test for the kernel treatment effect. In Advances in Neural Information Processing Systems, volume 36, pages 59924–59952, 2023.

[32] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dsprites: Disentanglement testing sprites dataset. https://github.com/deepmind/dsprites-dataset/, 2017.

[33] K. Muandet, M. Kanagawa, S. Saengkyongam, and S. Marukatat. Counterfactual mean embeddings. Journal of Machine Learning Research, 22(162):1–71, 2021.

[34] J. Park and K. Muandet. A measure-theoretic approach to kernel conditional mean embeddings. Advances in Neural Information Processing Systems, 2020.

[35] J. Park, U. Shalit, B. Schölkopf, and K. Muandet. Conditional distributional treatment effect with kernel conditional mean embeddings and u-statistic regression. In International conference on machine learning, pages 8401–8412, 2021.

[36] V. Perchet, P. Rigollet, S. Chassang, and E. Snowberg. Batched bandit problems. The Annals of Statistics, 44:660–681, 04 2016.

[37] I. Pinelis. Optimum bounds for the distributions of martingales in banach spaces. The Annals of Probability, pages 1679–1706, 1994.

[38] S. Qiang and M. Bayati. Dynamic pricing with demand learning and strategic consumers: An application to online retail. Operations Research, 64(4):931–944, 2016. doi: 10.1287/opre.2016.1514

[39] R. T. Rockafellar, S. Uryasev, et al. Optimization of conditional value-at-risk. Journal of risk, 2:21–42, 2000.

[40] C. Rothe. Nonparametric estimation of distributional policy effects. Journal of Econometrics, 155(1):56–70, 2010.

[41] S. Shekhar, I. Kim, and A. Ramdas. A permutation-free kernel independence test. Journal of Machine Learning Research, 24(369):1–68, 2023.

[42] B. Simon. Trace Ideals and Their Applications, volume 120 of Mathematical Surveys and Monographs. American Mathematical Society, 2nd edition, 2005.

[43] R. Singh, L. Xu, and A. Gretton. Kernel methods for causal functions: dose, heterogeneous and incremental response curves. Biometrika, 111(2):497–516, 2024.

[44] A. Smola, A. Gretton, L. Song, and B. Schölkopf. A hilbert space embedding for distributions. In International conference on algorithmic learning theory, pages 13–31. Springer, 2007.

[45] L. Song, J. Huang, A. Smola, and K. Fukumizu. Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 961–968, 2009.

[46] B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.

[47] A. v. d. Vaart and J. A. Wellner. Weak convergence and empirical processes with applications to statistics. Journal of the Royal Statistical Society-Series A Statistics in Society, 160(3):596–608, 1997.

[48] A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998. doi: 10.1017/CBO9780511802256

[49] I. Waudby-Smith and A. Ramdas. Time-uniform central limit theorems and confidence sequences. In International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10663–10672, 2021.

[50] L. Xu and A. Gretton. Causal benchmark based on disentangled image dataset. 2023.

[51] H. Zenati, E. Diemert, M. Martin, J. Mairal, and P. Gaillard. Sequential counterfactual risk minimization. In International Conference on Machine Learning, pages 40681–40706. PMLR, 2023.

[52] H. Zenati, J. Abécassis, J. Josse, and B. Thirion. Double debiased machine learning for mediation analysis with continuous treatments. arXiv preprint arXiv:2503.06156, 2025.

[53] H. Zenati, B. Bozkurt, and A. Gretton. Doubly-robust estimation of counterfactual policy mean embeddings, 2025. URL https://arxiv.org/abs/2506.02793

[54] K. Zhang, L. Janson, and S. Murphy. Inference for batched bandits. Advances in neural information processing systems, 33:9818–9829, 2020.

[55] K. Zhang, L. Janson, and S. Murphy. Statistical inference with m-estimators on adaptively collected data. Advances in neural information processing systems, 34:7460–7471, 2021.