A Martingale Kernel Independence Test

Felix Laumann; Mauricio Barahona; Zhaolu Liu

arxiv: 2605.22549 · v1 · pith:NR257K5Qnew · submitted 2026-05-21 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-22 03:19 UTC · model grok-4.3

keywords: independence testing, kernel methods, martingale, HSIC, dHSIC, asymptotic normality, permutation-free testing

The pith

Martingale versions of HSIC and dHSIC converge to standard normal under independence, replacing permutation calibration with a single quantile lookup.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs two studentised kernel statistics, mHSIC and mdHSIC, whose null distributions are asymptotically standard normal for kernels with bounded fourth moments. mHSIC uses a self-normalised lower-triangular sum on the full sample and runs at quadratic cost in the sample size. mdHSIC adds a half-sample split to centre the statistic, achieving the same normality for any fixed number d of jointly tested variables at a per-test cost that grows linearly in d. A sympathetic reader would care because the usual HSIC and dHSIC require expensive permutations to calibrate their data-dependent limits, while these versions keep comparable type-I error and power at 25- to 60-fold speed-ups.
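The "single quantile lookup" can be made concrete with the Python standard library. This is a minimal sketch, assuming a one-sided test that rejects for large values of the studentised statistic; the function names `normal_threshold` and `reject` are illustrative and do not come from the paper.

```python
from statistics import NormalDist

def normal_threshold(alpha=0.05):
    """Upper (1 - alpha) quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha)

def reject(statistic, alpha=0.05):
    # Assumption: one-sided rejection region, statistic > z_{1 - alpha}.
    return statistic > normal_threshold(alpha)

print(round(normal_threshold(0.05), 3))  # 1.645
```

The threshold depends only on the level, so it is computed once and reused for every test, whereas a permutation threshold has to be re-estimated for each dataset.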

Core claim

Adapting the martingale MMD construction, the authors define mHSIC as the self-normalised sum of the lower-triangular part of the Hadamard product of two empirically centred Gram matrices; under independence and bounded fourth moments this converges in distribution to N(0,1). mdHSIC estimates the centring on one half-sample and applies the martingale on the other, shrinking the conditional-mean residual to a quantity that is exponentially small in the number d of jointly tested variables, so that the same normal limit holds for every fixed d.
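The verbal description can be turned into a short NumPy sketch. What follows is one literal reading of that sentence, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the use of the strictly lower triangle, and the exact form of the studentisation are all assumptions, and the paper's definition may differ in any of them.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples stored row-wise in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def centre(G):
    """Empirical double centring H G H, computed in O(n^2) via row/column means."""
    return G - G.mean(axis=0, keepdims=True) - G.mean(axis=1, keepdims=True) + G.mean()

def mhsic_sketch(x, y, bandwidth=1.0):
    """Hypothetical reading of mHSIC, not the authors' reference implementation."""
    A = centre(gaussian_gram(x, bandwidth)) * centre(gaussian_gram(y, bandwidth))
    increments = np.tril(A, k=-1).sum(axis=1)   # D_i = sum over j < i of A[i, j]
    return increments.sum() / np.sqrt(np.sum(increments**2))

rng = np.random.default_rng(0)
x = rng.normal(size=300)
print(mhsic_sketch(x, rng.normal(size=300)))            # independent pair
print(mhsic_sketch(x, x + 0.5 * rng.normal(size=300)))  # dependent pair
```

Every step is a single pass over n × n matrices, which is consistent with the quadratic cost the paper reports; whether this reading reproduces the stated N(0,1) limit has to be checked against the paper itself.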

What carries the argument

Self-normalised lower-triangular martingale sum of the Hadamard product of two empirically centred Gram matrices

If this is right

mHSIC is consistent against every fixed alternative while matching the quadratic cost of the usual biased HSIC.
mdHSIC remains asymptotically standard normal at any fixed number of jointly tested variables after one half-sample split.
Both statistics achieve the same empirical type-I error rate and test power as permutation-calibrated HSIC and dHSIC.
Per-test runtime drops by a factor of 25 to 60 because no permutations are required.
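For context on the joint-independence baseline, the standard biased dHSIC V-statistic (the estimator from the kernel joint-independence literature cited as reference [7], not the paper's mdHSIC) already has a cost linear in the number of variables. The sketch below assumes Gaussian kernels with a fixed bandwidth; `dhsic_v` is an illustrative name.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples stored row-wise in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def dhsic_v(grams):
    """Biased (V-statistic) dHSIC from a list of d Gram matrices, each n x n."""
    n = grams[0].shape[0]
    prod_full = np.ones((n, n))   # elementwise product of all Gram matrices
    prod_rowmean = np.ones(n)     # product over variables of per-row means
    prod_grandmean = 1.0          # product over variables of grand means
    for G in grams:               # one O(n^2) pass per variable: O(d n^2) in total
        prod_full *= G
        prod_rowmean *= G.mean(axis=1)
        prod_grandmean *= G.mean()
    return prod_full.mean() + prod_grandmean - 2.0 * prod_rowmean.mean()

rng = np.random.default_rng(0)
variables = [rng.normal(size=(100, 5)) for _ in range(3)]   # d = 3, p = 5
print(dhsic_v([gaussian_gram(v) for v in variables]))
```

With two variables this quantity coincides with the biased HSIC, trace(KHLH)/n², which gives a convenient sanity check.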

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The construction could be applied directly to other degenerate V-statistics that currently rely on permutation calibration.
In high-dimensional causal discovery pipelines the linear-in-d cost of mdHSIC would allow routine joint testing of many variables.
The exponential shrinkage of the centring residual in d suggests the method may remain reliable even when the number of tested variables grows slowly with sample size.

Load-bearing premise

The kernels possess bounded fourth moments and the observations are i.i.d.

What would settle it

A Monte Carlo experiment that draws large i.i.d. samples from an independence model, computes mHSIC repeatedly with a bounded-fourth-moment kernel, and finds that the empirical distribution deviates substantially from standard normal.
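Such an experiment could be organised along the following lines. This is a sketch under stated assumptions: `statistic` is a hypothetical reading of mHSIC (Gaussian kernel, unit bandwidth, scalar inputs, strictly lower-triangular row sums), so its output says nothing definitive about the paper's statistic; only the harness structure is the point.

```python
import numpy as np
from statistics import NormalDist

def gram(x, bandwidth=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2   # scalar inputs only
    return np.exp(-d2 / (2.0 * bandwidth**2))

def centre(G):
    return G - G.mean(axis=0, keepdims=True) - G.mean(axis=1, keepdims=True) + G.mean()

def statistic(x, y):
    # Hypothetical reading of mHSIC: studentised strictly-lower-triangular row sums.
    A = np.tril(centre(gram(x)) * centre(gram(y)), k=-1)
    inc = A.sum(axis=1)
    return inc.sum() / np.sqrt(np.sum(inc**2))

def null_draws(n=200, trials=300, seed=0):
    """Repeatedly draw independent (x, y) samples and record the statistic."""
    rng = np.random.default_rng(seed)
    return np.array([statistic(rng.normal(size=n), rng.normal(size=n))
                     for _ in range(trials)])

def summarise(draws, alpha=0.05):
    """Moments and one-sided rejection rate, to compare with 0, 1 and alpha."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return {"mean": float(draws.mean()),
            "var": float(draws.var(ddof=1)),
            "rejection_rate": float(np.mean(draws > z))}

if __name__ == "__main__":
    print(summarise(null_draws()))
```

A mean far from 0, a variance far from 1, or a rejection rate far from the nominal level at large n would be the kind of deviation the paragraph above describes; a formal goodness-of-fit test could replace the moment summary.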

Figures

Figures reproduced from arXiv: 2605.22549 by Felix Laumann, Mauricio Barahona, Zhaolu Liu.

**Figure 1.** Empirical rejection rate at α = 0.05 (horizontal dotted line) as a function of dependence strength a ∈ {0.0, 0.2, 0.4, 0.6, 0.8, 1.0} with M = 1,000 trials per cell of mHSIC on the random-mixture DGP. One panel per ambient dimension d_ambient ∈ {1, 10, 50, 100, 500} and one curve per sample size n ∈ {10, 50, 100, 200, 500, 1,000, 2,000, 4,000, 8,000, 16,000} (dark → light gradient). The dominant cost is the…

**Figure 2.** Split-martingale mdHSIC on the linear-Gaussian joint-independence DGP with p = 5, M = 1,000 repetitions per cell, α = 0.05. One panel per number of variables d ∈ {2, 3, 5, 8, 10}; one curve per sample size n ∈ {100, 500, 2,000, 5,000} (dark → light gradient). The horizontal dotted line marks the nominal level. The value at a = 0 is the empirical type-I rate (range 0.036–0.066 across all cells); values at …

**Figure 3.** Independence test comparison: mHSIC (blue) vs. xHSIC [Shekhar et al., 2023] (orange) vs. HSIC-perm [Gretton et al., 2008] (green). M = 500 trials, random-mixture DGP, α = 0.05, B = 200 permutations; one panel per (d_ambient, n) cell with d_ambient ∈ {1, 10, 500}. In the well-calibrated regime, mHSIC and HSIC-perm have essentially identical power at every a > 0; xHSIC trails at small n and matches HSIC-perm b…

**Figure 4.** Joint-independence test comparison: split-martingale …

Original abstract

The Hilbert-Schmidt Independence Criterion (HSIC) and its joint-independence extension $d\mathrm{HSIC}$ are degenerate $V$-statistics whose data-dependent weighted-$\chi^2$ null limits force a permutation calibration that multiplies the per-test cost by the number of permutations, in practice two orders of magnitude. Adapting the recent martingale MMD construction for two-sample testing to the (joint) independence problem, we introduce two studentised statistics whose null distributions are standard normal regardless of the data law, so that a single normal-quantile lookup replaces the permutation step entirely. The first, $m\mathrm{HSIC}$, is a self-normalised lower-triangular sum of the Hadamard product of two empirically centred Gram matrices. Under independence and bounded-fourth-moment kernels it converges to a standard normal. It is consistent against every fixed alternative, and runs at quadratic cost in the sample size without any sample split, matching the biased HSIC $V$-statistic. Our second statistic, $md\mathrm{HSIC}$, achieves finite-sample consistency with a single half-sample split: the centring is estimated on one half and the lower-triangular self-normalised martingale is run on the other, shrinking the conditional-mean residual to a quantity that is exponentially small in $d$, so the statistic is asymptotically standard normal at every fixed number of jointly tested variables, with a per-test cost that grows only linearly in $d$. On synthetic data with per-variable input dimension from $1$ to $500$ and between $2$ and $10$ jointly tested variables, both statistics match the empirical type-I error rate and test power of permutation-calibrated baselines while running $25$ to $60\times$ faster.
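The permutation calibration that the abstract sets out to replace can be sketched as follows, to show where the extra cost comes from. The biased HSIC formula trace(KHLH)/n² is standard; the Gaussian kernel, the fixed bandwidth, and the default of 200 permutations (the value of B quoted in the Figure 3 caption) are choices made for this illustration, and the function names are hypothetical.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples stored row-wise in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def hsic_biased(K, L):
    """Biased HSIC V-statistic, trace(K H L H) / n^2, computed in O(n^2)."""
    Kc = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
    return float(np.sum(Kc * L)) / K.shape[0] ** 2

def hsic_permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: re-evaluate the statistic on shuffled copies of y."""
    rng = np.random.default_rng(seed)
    K, L = gaussian_gram(x), gaussian_gram(y)
    observed = hsic_biased(K, L)
    exceed = 0
    for _ in range(n_perm):   # each permutation costs another O(n^2) pass
        idx = rng.permutation(L.shape[0])
        exceed += hsic_biased(K, L[np.ix_(idx, idx)]) >= observed
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
print(hsic_permutation_pvalue(x, x + 0.1 * rng.normal(size=100)))
```

The loop re-evaluates the statistic once per permutation, so the total work is roughly `n_perm + 1` evaluations instead of one; this is the multiplier a normal-quantile threshold avoids.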

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

Martingale-based HSIC tests promise big speed gains over permutations for independence testing, but the exponential centering claim for the joint version rests on assumptions that may not deliver it.

This paper gives a way to do kernel independence testing with a normal approximation instead of permutations, which cuts the runtime by a factor of 25 to 60 on their tests. The main contribution is adapting martingale ideas from two-sample testing to the independence setting.

They construct mHSIC as a self-normalised lower-triangular sum based on the Hadamard product of two empirically centred Gram matrices. Under the null of independence and with kernels that have bounded fourth moments, this statistic converges in distribution to a standard normal. It is also consistent against fixed alternatives and keeps the quadratic computational cost of the usual biased HSIC without any splitting. For the joint independence case with multiple variables, mdHSIC estimates the centering on one half of the sample and runs the martingale on the other half. The paper claims this makes the conditional mean residual exponentially small in the number of variables d, allowing the normal limit to hold at any fixed d with cost linear in d. On the synthetic experiments, both versions match the type I error and power of permutation-based HSIC and dHSIC while being substantially faster. This addresses a real practical bottleneck for these tests.

One soft spot is the justification for that exponential decay in the mdHSIC residual. The assumptions are i.i.d. data and bounded fourth moments, which typically support only polynomial concentration rates. Exponential decay in d would seem to require additional tail conditions or a special structure under the null. If the full paper has a tight argument for this, it would strengthen the result; otherwise it might restrict the range of d where the approximation is reliable.

This work is for researchers in statistics and machine learning who perform independence testing on large or high-dimensional data and want to avoid the computational overhead of permutations. A reader interested in kernel methods or scalable statistical testing would get direct value from the proposed statistics and the reported speed-ups. The paper engages honestly with the literature on HSIC and martingale MMD. It deserves a serious referee to verify the proofs and explore the practical limits of the claims. I recommend sending it for peer review.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces mHSIC, a self-normalised lower-triangular martingale statistic constructed from the Hadamard product of two empirically centred Gram matrices, and mdHSIC, its half-sample-split variant. Under independence and kernels with bounded fourth moments, mHSIC is claimed to converge in distribution to N(0,1), to be consistent against every fixed alternative, and to run at quadratic cost without permutation or splitting. mdHSIC is claimed to achieve the same asymptotic normality for any fixed number d of jointly tested variables because the half-sample centering residual is exponentially small in d, at a per-test cost linear in d. Synthetic experiments with input dimensions 1–500 and 2–10 jointly tested variables report type-I error and power matching permutation baselines at 25–60× speed-up.

Significance. If the stated convergence and consistency results hold under the given assumptions, the work supplies a practical, permutation-free calibration for kernel independence tests that scales better than existing V-statistic methods when many variables are tested jointly. The martingale construction and explicit avoidance of sample splitting for mHSIC are technically attractive features.

major comments (1)

[Abstract] Abstract (mdHSIC paragraph): the assertion that half-sample centering produces a conditional-mean residual that is exponentially small in d relies on concentration that is not guaranteed by the stated assumptions of i.i.d. data and kernels possessing only bounded fourth moments. Standard fourth-moment inequalities yield O_p(n^{-1/2}) rates for empirical kernel means; exponential decay in d typically requires sub-Gaussian or bounded tails (Hoeffding/Bernstein), which are not implied. This bound is load-bearing for the claim that mdHSIC remains asymptotically N(0,1) at every fixed d.
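The polynomial rate mentioned in this comment follows from a second-moment calculation. As a short worked derivation, assume a kernel k with RKHS 𝓗, mean embedding μ = E k(X, ·), and empirical embedding μ̂_n built from n i.i.d. draws (this notation is introduced here for illustration and is not taken from the paper):

```latex
\mathbb{E}\,\lVert \hat\mu_n - \mu \rVert_{\mathcal H}^2
  = \frac{1}{n}\Bigl(\mathbb{E}\,k(X,X) - \lVert \mu \rVert_{\mathcal H}^2\Bigr)
  \le \frac{\mathbb{E}\,k(X,X)}{n},
\qquad
\mathbb{P}\bigl(\lVert \hat\mu_n - \mu \rVert_{\mathcal H} \ge t\bigr)
  \le \frac{\mathbb{E}\,k(X,X)}{n\,t^2}.
```

The tail bound is Markov's inequality applied to the squared norm, so it decays polynomially in t and gives the O_p(n^{-1/2}) rate; exponential tails need a Hoeffding- or Bernstein-type argument, which is the gap the comment points to.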

minor comments (1)

[Abstract] The abstract refers to 'synthetic data with per-variable input dimension from 1 to 500' but does not specify the kernel families or the precise sample sizes used in the timing and error-rate comparisons; adding these details would improve reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and valuable comments on our manuscript. We address the major comment point by point below and are happy to revise the manuscript accordingly where appropriate.

Referee: [Abstract] Abstract (mdHSIC paragraph): the assertion that half-sample centering produces a conditional-mean residual that is exponentially small in d relies on concentration that is not guaranteed by the stated assumptions of i.i.d. data and kernels possessing only bounded fourth moments. Standard fourth-moment inequalities yield O_p(n^{-1/2}) rates for empirical kernel means; exponential decay in d typically requires sub-Gaussian or bounded tails (Hoeffding/Bernstein), which are not implied. This bound is load-bearing for the claim that mdHSIC remains asymptotically N(0,1) at every fixed d.

Authors: We agree with the referee that the claim of an 'exponentially small' residual in d is not justified under the stated assumptions of bounded fourth moments alone, as this would typically require stronger tail conditions for exponential concentration inequalities. We will revise the abstract to remove this phrasing and instead note that the residual vanishes as O_p(n^{-1/2}) for fixed d, which is sufficient to preserve the asymptotic N(0,1) limit under the paper's assumptions since d is held fixed while n → ∞. The main theoretical results for mdHSIC remain valid, as the proof of asymptotic normality can be adjusted to account for this rate without requiring exponential decay. We will also check and clarify the corresponding statements in the main text if necessary.

Revision: yes

Circularity Check

0 steps flagged

No circularity: martingale CLT derivation is independent of fitted quantities

The paper derives the N(0,1) limit for mHSIC and mdHSIC from martingale central limit theorems applied to self-normalised sums of centred kernel products, under explicit i.i.d. and bounded-fourth-moment assumptions. No parameter is fitted to data and then relabelled as a prediction; the studentisation uses the same empirical Gram matrices but is constructed as an external normalisation whose asymptotic effect is proved separately via martingale theory rather than by definition. The exponential-smallness claim for the mdHSIC half-sample residual is presented as a consequence of the stated moment conditions and independence, not smuggled in via self-citation or ansatz. The adaptation of the prior martingale MMD construction is cited for context but the independence-specific steps (Hadamard product of Gram matrices, lower-triangular summation) are developed directly in the manuscript without reducing to prior results by construction. The derivation chain therefore remains self-contained against external benchmarks.
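Why a lower-triangular sum produces martingale increments can be seen in a simplified setting. The derivation below assumes population-centred kernels k̃ and l̃ in place of the empirical centring the paper uses, writes Z_i = (X_i, Y_i) for the i.i.d. observations and 𝓕_{i-1} for the σ-field generated by Z_1, …, Z_{i-1}, and takes X_i independent of Y_i under the null; the symbols h, D_i and T_n are introduced here for illustration only.

```latex
h(Z_i, Z_j) = \tilde k(X_i, X_j)\,\tilde l(Y_i, Y_j), \qquad
D_i = \sum_{j<i} h(Z_i, Z_j),
\\[4pt]
\mathbb{E}\bigl[D_i \mid \mathcal F_{i-1}\bigr]
  = \sum_{j<i} \mathbb{E}\bigl[\tilde k(X_i, X_j) \mid X_j\bigr]\,
               \mathbb{E}\bigl[\tilde l(Y_i, Y_j) \mid Y_j\bigr] = 0,
\qquad
T_n = \frac{\sum_{i=2}^{n} D_i}{\sqrt{\sum_{i=2}^{n} D_i^2}}.
```

Each conditional expectation vanishes because a population-centred kernel integrates to zero in either argument. With empirical centring the increments are no longer exactly conditionally mean-zero, and the resulting remainder is presumably what the paper's proofs have to control.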

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claims rest on standard kernel and martingale assumptions plus one data-splitting device; no new free parameters or invented entities are introduced beyond the usual kernel bandwidth or scale choices.

axioms (2)

domain assumption: Kernels have bounded fourth moments. Invoked to guarantee the martingale central limit theorem applies to the self-normalised sum.
domain assumption: Observations are i.i.d. Standard assumption for the V-statistic and martingale construction to have the stated asymptotic behaviour.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1] Balsubramani, Akshay and Ramdas, Aaditya. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).
[2] Chatterjee, Anirban and Ramdas, Aaditya. arXiv preprint arXiv:2510.11853.
[3] Fan, Xiequan. Science China Mathematics.
[4] Gretton, Arthur and Fukumizu, Kenji and Teo, Choon Hui and Song, Le and Sch. A kernel statistical test of independence.
[5] Gretton, Arthur and Borgwardt, Karsten M. and Rasch, Malte J. and Sch. A kernel two-sample test.
[6] Liu, Zhaolu and Peach, Robert L. and Barahona, Mauricio. Proceedings of the International Conference on Machine Learning (ICML).
[7] Pfister, Niklas and B. Kernel-based tests for joint independence.
[8] Shekhar, Shubhanshu and Kim, Ilmun and Ramdas, Aaditya. Journal of the Royal Statistical Society, Series B.
[9] Shekhar, Shubhanshu and Ramdas, Aaditya. IEEE Transactions on Information Theory.
[10] Song, Le and Smola, Alex and Gretton, Arthur and Bedo, Justin and Borgwardt, Karsten. Journal of Machine Learning Research.
[11] Tolstikhin, Ilya and Sriperumbudur, Bharath K. and Muandet, Krikamol. Journal of Machine Learning Research.
