A Martingale Kernel Independence Test
Pith reviewed 2026-05-22 03:19 UTC · model grok-4.3
The pith
Martingale versions of HSIC and dHSIC converge to standard normal under independence, replacing permutation calibration with a single quantile lookup.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adapting the martingale MMD construction, the authors define mHSIC as the self-normalised sum of the lower-triangular part of the Hadamard product of two empirically centred Gram matrices; under independence and bounded fourth moments this converges in distribution to N(0,1). mdHSIC estimates the centring on one half-sample and applies the martingale on the other, shrinking the conditional-mean residual exponentially in dimension d so that the same normal limit holds for every fixed number of jointly tested variables.
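The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the helper names, and the specific choice of increments (row sums of the strictly lower-triangular part) are one plausible reading of "self-normalised lower-triangular sum" and are not confirmed by the source.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for the rows of x, shape (n, p). Kernel choice is an assumption."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def centre(gram):
    """Empirical double-centring H K H with H = I - (1/n) 1 1^T."""
    row = gram.mean(axis=0, keepdims=True)
    col = gram.mean(axis=1, keepdims=True)
    return gram - row - col + gram.mean()

def mhsic_sketch(x, y, bandwidth=1.0):
    """Hypothetical reading of mHSIC: sum of increments divided by the root of their squared sum."""
    kc = centre(gaussian_gram(x, bandwidth))
    lc = centre(gaussian_gram(y, bandwidth))
    # Strictly lower-triangular part of the Hadamard (elementwise) product.
    lower = np.tril(kc * lc, k=-1)
    # Increment i collects the interactions of observation i with all earlier ones.
    increments = lower.sum(axis=1)
    return increments.sum() / np.sqrt(np.sum(increments**2))
```

Under this reading the whole computation is quadratic in the sample size, and a test at level 0.05 would compare the returned value with the standard normal quantile rather than with a permutation distribution.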
What carries the argument
Self-normalised lower-triangular martingale sum of the Hadamard product of two empirically centred Gram matrices
If this is right
- mHSIC is consistent against every fixed alternative while matching the quadratic cost of the usual biased HSIC.
- mdHSIC remains asymptotically standard normal at any fixed number of jointly tested variables after one half-sample split.
- Both statistics achieve the same empirical type-I error rate and test power as permutation-calibrated HSIC and dHSIC.
- Per-test runtime drops by a factor of 25 to 60 because no permutations are required.
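For contrast, the permutation calibration that these statistics are meant to replace can be sketched as follows. The biased HSIC V-statistic is computed as the sum of the elementwise product of the two centred Gram matrices divided by n squared; the Gaussian kernel, the function names, and the default of 200 permutations are illustrative assumptions. Each permutation repeats a quadratic-cost evaluation, which is where the extra per-test cost comes from.

```python
import numpy as np

def centred_gram(x, bandwidth=1.0):
    """Double-centred Gaussian Gram matrix (kernel choice is an assumption)."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    gram = np.exp(-d2 / (2.0 * bandwidth**2))
    return (gram - gram.mean(axis=0, keepdims=True)
            - gram.mean(axis=1, keepdims=True) + gram.mean())

def permutation_hsic_pvalue(x, y, num_permutations=200, seed=0):
    """Permutation p-value for the biased HSIC V-statistic."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    kc, lc = centred_gram(x), centred_gram(y)
    observed = np.sum(kc * lc) / n**2
    exceed = 0
    for _ in range(num_permutations):
        perm = rng.permutation(n)
        # Permuting rows and columns of lc corresponds to shuffling the y sample.
        permuted = np.sum(kc * lc[np.ix_(perm, perm)]) / n**2
        exceed += int(permuted >= observed)
    # The +1 terms count the observed statistic as one of the permutations.
    return (1 + exceed) / (1 + num_permutations)
```

With `num_permutations` permutations, the loop performs that many additional quadratic-cost evaluations on top of the single one needed for the observed statistic.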
Where Pith is reading between the lines
- The construction could be applied directly to other degenerate V-statistics that currently rely on permutation calibration.
- In high-dimensional causal discovery pipelines the linear-in-d cost of mdHSIC would allow routine joint testing of many variables.
- The exponential shrinkage of the centring residual in d suggests the method may remain reliable even when the number of tested variables grows slowly with sample size.
Load-bearing premise
The kernels possess bounded fourth moments and the observations are i.i.d.
What would settle it
A Monte Carlo experiment that draws large i.i.d. samples from an independence model, computes mHSIC repeatedly with a bounded-fourth-moment kernel, and finds that the empirical distribution deviates substantially from standard normal.
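A small version of such an experiment might look like the sketch below. It is illustrative only: the statistic is the same hypothetical reading of mHSIC used for illustration (row sums of the strictly lower-triangular Hadamard product, self-normalised), the Gaussian kernel and Gaussian data are assumptions, and the one-sided rejection rule is an assumption about how the test would be run. No outcome is claimed here; the point is the shape of the check.

```python
import numpy as np
from statistics import NormalDist

def _centred_gram(x, bandwidth=1.0):
    """Double-centred Gaussian Gram matrix (kernel choice is an assumption)."""
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    gram = np.exp(-d2 / (2.0 * bandwidth**2))
    return (gram - gram.mean(axis=0, keepdims=True)
            - gram.mean(axis=1, keepdims=True) + gram.mean())

def mhsic_sketch(x, y):
    """Hypothetical mHSIC: self-normalised sum of lower-triangular row sums."""
    lower = np.tril(_centred_gram(x) * _centred_gram(y), k=-1)
    increments = lower.sum(axis=1)
    return increments.sum() / np.sqrt(np.sum(increments**2))

def null_calibration_check(n=100, reps=200, level=0.05, seed=0):
    """Draw independent (x, y) pairs and summarise the statistic's null behaviour."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=(n, 1))
        y = rng.normal(size=(n, 1))  # drawn independently of x
        stats[r] = mhsic_sketch(x, y)
    threshold = NormalDist().inv_cdf(1.0 - level)  # one-sided normal quantile
    return {
        "mean": float(stats.mean()),
        "std": float(stats.std(ddof=1)),
        "rejection_rate": float(np.mean(stats > threshold)),
        "stats": stats,
    }
```

If the normal limit holds, the mean and standard deviation should approach 0 and 1 and the rejection rate should approach `level` as `n` and `reps` grow; a persistent gap at large `n` would be the kind of evidence described above.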
Original abstract
The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose data-dependent weighted-$\chi^2$ null limits force a permutation calibration that multiplies the per-test cost by the number of permutations, in practice two orders of magnitude. Adapting the recent martingale MMD construction for two-sample testing to the (joint) independence problem, we introduce two studentised statistics whose null distributions are standard normal regardless of the data law, so that a single normal-quantile lookup replaces the permutation step entirely. The first, $m\mathrm{HSIC}$, is a self-normalised lower-triangular sum of the Hadamard product of two empirically centred Gram matrices. Under independence and bounded-fourth-moment kernels it converges to a standard normal. It is consistent against every fixed alternative, and runs at quadratic cost in the sample size without any sample split, matching the biased HSIC $V$-statistic. Our second statistic, $md\mathrm{HSIC}$, achieves finite-sample consistency with a single half-sample split: the centring is estimated on one half and the lower-triangular self-normalised martingale is run on the other, shrinking the conditional-mean residual to a quantity that is exponentially small in $d$, so the statistic is asymptotically standard normal at every fixed number of jointly tested variables, with a per-test cost that grows only linearly in $d$. On synthetic data with per-variable input dimension from $1$ to $500$ and between $2$ and $10$ jointly tested variables, both statistics match the empirical type-I error rate and test power of permutation-calibrated baselines while running $25$ to $60\times$ faster.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces mHSIC, a self-normalised lower-triangular martingale statistic constructed from the Hadamard product of two empirically centred Gram matrices, and mdHSIC, its half-sample-split variant. Under independence and kernels with bounded fourth moments, mHSIC is claimed to converge in distribution to N(0,1), to be consistent against every fixed alternative, and to run at quadratic cost without permutation or splitting. mdHSIC is claimed to achieve the same asymptotic normality for any fixed number d of jointly tested variables because the half-sample centering residual is exponentially small in d, at a per-test cost linear in d. Synthetic experiments with input dimensions 1–500 and 2–10 jointly tested variables report type-I error and power matching permutation baselines at 25–60× speed-up.
Significance. If the stated convergence and consistency results hold under the given assumptions, the work supplies a practical, permutation-free calibration for kernel independence tests that scales better than existing V-statistic methods when many variables are tested jointly. The martingale construction and explicit avoidance of sample splitting for mHSIC are technically attractive features.
major comments (1)
- [Abstract] Abstract (mdHSIC paragraph): the assertion that half-sample centering produces a conditional-mean residual that is exponentially small in d relies on concentration that is not guaranteed by the stated assumptions of i.i.d. data and kernels possessing only bounded fourth moments. Standard fourth-moment inequalities yield O_p(n^{-1/2}) rates for empirical kernel means; exponential decay in d typically requires sub-Gaussian or bounded tails (Hoeffding/Bernstein), which are not implied. This bound is load-bearing for the claim that mdHSIC remains asymptotically N(0,1) at every fixed d.
minor comments (1)
- [Abstract] The abstract refers to 'synthetic data with per-variable input dimension from 1 to 500' but does not specify the kernel families or the precise sample sizes used in the timing and error-rate comparisons; adding these details would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable comments on our manuscript. We address the major comment point by point below and are happy to revise the manuscript accordingly where appropriate.
- Referee: [Abstract] Abstract (mdHSIC paragraph): the assertion that half-sample centering produces a conditional-mean residual that is exponentially small in d relies on concentration that is not guaranteed by the stated assumptions of i.i.d. data and kernels possessing only bounded fourth moments. Standard fourth-moment inequalities yield O_p(n^{-1/2}) rates for empirical kernel means; exponential decay in d typically requires sub-Gaussian or bounded tails (Hoeffding/Bernstein), which are not implied. This bound is load-bearing for the claim that mdHSIC remains asymptotically N(0,1) at every fixed d.
Authors: We agree with the referee that the claim of an 'exponentially small' residual in d is not justified under the stated assumptions of bounded fourth moments alone, as this would typically require stronger tail conditions for exponential concentration inequalities. We will revise the abstract to remove this phrasing and instead note that the residual vanishes as O_p(n^{-1/2}) for fixed d, which is sufficient to preserve the asymptotic N(0,1) limit under the paper's assumptions since d is held fixed while n → ∞. The main theoretical results for mdHSIC remain valid, as the proof of asymptotic normality can be adjusted to account for this rate without requiring exponential decay. We will also check and clarify the corresponding statements in the main text if necessary.
Revision: yes
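The two kinds of rate at issue in this exchange can be written out for a generic empirical mean. The notation below is illustrative and not taken from the paper: $Z_1, \dots, Z_n$ stand for i.i.d. kernel evaluations with mean $\mu$ and variance $\sigma^2 < \infty$, and the bound $B$ in the second display is an extra assumption that the stated moment condition does not provide.

```latex
% Finite variance only (implied by bounded fourth moments): Chebyshev's inequality
\[
  \Pr\bigl( \lvert \bar Z_n - \mu \rvert \ge t \bigr) \;\le\; \frac{\sigma^2}{n t^2},
  \qquad\text{so}\qquad \bar Z_n - \mu = O_p\bigl(n^{-1/2}\bigr).
\]
% Additional assumption |Z_i| <= B almost surely: Hoeffding's inequality
\[
  \Pr\bigl( \lvert \bar Z_n - \mu \rvert \ge t \bigr) \;\le\; 2 \exp\!\Bigl( -\frac{n t^2}{2 B^2} \Bigr).
\]
```

Only the second bound has exponential tails, which is why the referee asks for a boundedness or sub-Gaussian condition before any exponential-smallness statement is made.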
Circularity Check
No circularity: martingale CLT derivation is independent of fitted quantities
full rationale
The paper derives the N(0,1) limit for mHSIC and mdHSIC from martingale central limit theorems applied to self-normalised sums of centred kernel products, under explicit i.i.d. and bounded-fourth-moment assumptions. No parameter is fitted to data and then relabelled as a prediction; the studentisation uses the same empirical Gram matrices but is constructed as an external normalisation whose asymptotic effect is proved separately via martingale theory rather than by definition. The exponential-smallness claim for the mdHSIC half-sample residual is presented as a consequence of the stated moment conditions and independence, not smuggled in via self-citation or ansatz. The adaptation of the prior martingale MMD construction is cited for context but the independence-specific steps (Hadamard product of Gram matrices, lower-triangular summation) are developed directly in the manuscript without reducing to prior results by construction. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Kernels have bounded fourth moments
- domain assumption: Observations are i.i.d.