Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Ikjun Choi; Ilmun Kim

arxiv: 2407.08976 · v2 · pith:4WDWDMSRnew · submitted 2024-07-12 · 📊 stat.ML · cs.LG· math.ST· stat.TH

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Ikjun Choi , Ilmun Kim This is my paper

Pith reviewed 2026-05-23 23:13 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.STstat.TH

keywords two-sample testingmaximum mean discrepancyrandom Fourier featuresminimax separation ratescomputational-statistical trade-offSobolev ball

0 comments

The pith

By selecting the right number of random Fourier features, the approximated MMD two-sample test matches the exact test's minimax separation rates at sub-quadratic time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the trade-off between computation and statistical power when approximating the maximum mean discrepancy test with random Fourier features for two-sample testing. It first shows that the approximated test only becomes pointwise consistent when the number of features grows without bound. It then analyzes uniform power under the minimax framework and demonstrates that a suitable finite number of features can recover the same separation rates as the full quadratic-time MMD test. The result holds for densities in Sobolev balls and similar smoothness classes. A reader would care because the quadratic cost of exact MMD has limited its use on large data sets.

Core claim

The approximated MMD test using random Fourier features attains the same minimax separation rates as the exact MMD test at sub-quadratic time cost when the number of random features is chosen appropriately. This holds under distributional assumptions such as densities belonging to a Sobolev ball. The analysis begins from the observation that pointwise power consistency requires the number of features to tend to infinity, then shifts to uniform power analysis within the minimax testing framework to establish the computational-statistical trade-off.

What carries the argument

Random Fourier features approximation to the kernel in the MMD statistic, analyzed via the minimax testing framework for uniform power.

If this is right

The test is pointwise consistent in power only when the number of random features tends to infinity.
Uniform minimax power can be preserved at sub-quadratic cost by tuning the feature count.
The matching rates are shown explicitly when densities lie in a Sobolev ball.
Simulation studies confirm the theoretical trade-off.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar feature-count tuning may apply to other kernel-based tests that currently scale quadratically.
The same trade-off analysis could be repeated for different kernel approximations or in high-dimensional regimes.
If the smoothness class is misspecified, the optimal feature count may need to adapt to unknown regularity.

Load-bearing premise

The underlying distributions are assumed to have densities in a Sobolev ball or similar smoothness class so that the minimax separation rates can be matched.

What would settle it

A proof or simulation showing that, for any fixed number of random features, the separation rate of the approximated test is strictly worse than the MMD rate under Sobolev-ball assumptions would falsify the claim.

read the original abstract

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows that random Fourier features can match exact MMD minimax rates in sub-quadratic time by scaling the feature count appropriately under Sobolev assumptions.

read the letter

The main result is that you can recover the same minimax separation rates as the full quadratic-time MMD test by letting the number of random Fourier features grow with n in a controlled way, keeping total time o(n²). They separate the pointwise consistency requirement (m must go to infinity) from the uniform-power regime where m is chosen as a function of n and the smoothness index. This fills the gap they flag in prior work on approximated MMD. The analysis is carried out under standard Sobolev-ball assumptions on the densities, which is the natural setting for these rates. The simulations are presented as qualitative support rather than sharp rate checks. The central claim looks internally consistent with the minimax framework and the stated assumptions; the stress-test note confirms no obvious contradiction in the separation of pointwise versus uniform regimes. Soft spots are the lack of visible proof details or explicit constants in the provided material, plus the fact that the simulations stay qualitative. That makes it harder to judge how tight the bounds are or whether hidden logarithmic factors appear in practice. The work is aimed at researchers who care about scalable nonparametric two-sample tests for large or high-dimensional data. A reader already familiar with kernel embeddings and minimax testing will get the most out of the explicit trade-off. It deserves a serious referee because the question is practically relevant and the approach is direct, even if the proofs need close checking.

Referee Report

0 major / 2 minor

Summary. The manuscript studies the approximated MMD two-sample test based on random Fourier features. It first shows that the test is pointwise consistent in power only when the number of features m tends to infinity. It then analyzes uniform power under the minimax framework and proves that, for densities in a Sobolev ball (and related classes), a suitable scaling of m with n yields the same separation rate as the exact MMD test while keeping total runtime sub-quadratic in n.

Significance. If the derivations hold, the result is significant because it supplies the first explicit computational-statistical trade-off that preserves minimax optimality for kernel two-sample testing. The explicit separation of the m→∞ regime (pointwise consistency) from the m = m(n, smoothness) regime (uniform power) is a useful conceptual contribution, and the simulation studies provide qualitative support for the rates.

minor comments (2)

[Abstract] The dependence of the required m on the smoothness parameter s and dimension d is stated in the main theorem but could be highlighted more explicitly in the abstract and introduction for readers interested in practical tuning.
[§4] Notation for the random-feature kernel approximation (e.g., the definition of the empirical MMD with m features) is introduced in §2 but reused without re-statement in the proof sketches of §4; a short reminder equation would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment, recognition of the significance of the computational-statistical trade-off result, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external minimax framework

full rationale

The paper's central result—that a suitable choice of random Fourier features m(n) recovers the minimax separation rate of the exact MMD test under Sobolev-ball assumptions while remaining sub-quadratic—is obtained by direct analysis inside the standard minimax testing framework. No equation reduces a claimed prediction to a fitted parameter by construction, no load-bearing step rests on a self-citation whose content is itself unverified, and the separation rates are imported from the external literature on kernel testing rather than defined in terms of the paper's own outputs. The abstract explicitly distinguishes the m → ∞ consistency regime from the uniform-power regime, confirming the argument is not self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the minimax testing framework and the assumption that densities lie in Sobolev balls; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Densities belong to a Sobolev ball (or similar smoothness class)
Invoked to obtain the matching minimax separation rates under different distributional assumptions.

pith-pipeline@v0.9.0 · 5753 in / 1279 out tokens · 25615 ms · 2026-05-23T23:13:36.848917+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time... under different distributional assumptions such as densities in a Sobolev ball
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 6... R ≥ n^{4d/(4s+d)}... uniform separation ρ(Δ,β,CL2,δL2) ≲ n^{-2s/(4s+d)}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations
stat.ML 2025-02 unverdicted novelty 6.0

Nyström approximation of MMD enables scalable two-sample testing with permutation p-values and a finite-sample power bound matching the minimax optimal separation rate.