Large sample analysis of the median heuristic

Damien Garreau, Wittawat Jitkrittum, Motonobu Kanagawa

classification: math.ST, stat.ML, stat.TH

keywords: heuristic, median, kernel, setting, test, analysis, bandwidth, chosen

Abstract

In kernel methods, the median heuristic has been widely used to set the bandwidth of RBF kernels. While its empirical performance makes it a safe choice in many circumstances, there is little theoretical understanding of why this is the case. Our aim in this paper is to advance our understanding of the median heuristic by focusing on the setting of the kernel two-sample test. We collect new findings that may be of interest to both theoreticians and practitioners. On the theoretical side, we provide a convergence analysis showing the asymptotic normality of the bandwidth chosen by the median heuristic in the setting of the kernel two-sample test. We also conduct systematic empirical investigations in simple settings, comparing test performance under bandwidths chosen by the median heuristic with that under bandwidths chosen by maximizing test power.
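The median heuristic itself is short to state, and a small example makes the object of the analysis concrete. The sketch below assumes one common convention: the bandwidth sigma is the median of the pairwise Euclidean distances in the pooled sample X ∪ Y, plugged into the Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). Other variants take the median of squared distances or drop the factor 2, and the exact convention used in the paper is not stated here. The bandwidth is then used in the standard unbiased estimate of the squared maximum mean discrepancy (MMD^2), the statistic of the kernel two-sample test. The function names (`median_heuristic`, `gaussian_kernel`, `mmd2_unbiased`) are hypothetical, and the code uses only the Python standard library; it is an illustration, not the authors' implementation.

```python
import math
import random
import statistics
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def median_heuristic(points: List[Point]) -> float:
    """Hypothetical helper: median of Euclidean distances over all distinct pairs."""
    dists = [math.sqrt(sq_dist(a, b)) for a, b in combinations(points, 2)]
    return statistics.median(dists)


def gaussian_kernel(a: Point, b: Point, sigma: float) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2)) (assumed convention)."""
    return math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))


def mmd2_unbiased(xs: List[Point], ys: List[Point], sigma: float) -> float:
    """Unbiased estimate of squared MMD between samples xs and ys (needs >= 2 points each)."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal (i == j) terms.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross-sample average uses all m * n pairs.
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    # Two 2-D Gaussian samples whose means differ by 1 in the first coordinate.
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
    ys = [[rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
    sigma = median_heuristic(xs + ys)  # bandwidth from the pooled sample
    print(f"median-heuristic bandwidth: {sigma:.3f}")
    print(f"unbiased MMD^2 estimate:    {mmd2_unbiased(xs, ys, sigma):.4f}")
```

Computing every pairwise distance costs O(n^2) time for n pooled points, so practical implementations sometimes estimate the median from a random subset of points; this sketch computes it exactly. The alternative compared in the paper, choosing the bandwidth by maximizing test power, would replace the single call to `median_heuristic` with a search over candidate values of sigma.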

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BAMIFun: Bayesian Multiple Imputation for Functional Data
stat.ME 2026-05 unverdicted novelty 7.0

BAMIFun provides Bayesian multiple imputation for functional data via low-rank penalized spline models, achieving accurate imputation and improved coverage in simulations and real datasets compared to single-imputatio...
Detecting Changes in Causal Dependence with Kernels and Copulas
stat.ME 2026-05 unverdicted novelty 7.0

A kernel-copula embedding statistic equals zero exactly when causal dependence between X and Y is stable and is strictly positive otherwise, with a near-linear estimator and convergence rates provided.
Convex-Geometric Error Bounds for Positive-Weight Kernel Quadrature
math.NA 2026-05 unverdicted novelty 7.0

Positive simplex weights for kernel quadrature achieve O(d/N) convex-hull approximation error in feature space, transferring to RKHS worst-case bounds that beat Monte Carlo under exponential spectral decay.
The Generalised Kernel Covariance Measure
stat.ML 2026-04 conditional novelty 7.0

GKCM generalizes kernel CI testing to arbitrary regression models, provides uniform asymptotic level guarantees under stated conditions, and outperforms state-of-the-art methods in simulations when using tree-based re...
LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models
cs.CL 2026-05 unverdicted novelty 6.0

LLM-XTM integrates LLM-guided topic refinement with self-consistency uncertainty quantification to improve coherence and alignment in cross-lingual topic models while reducing dependence on bilingual resources and rep...
Concentration and Calibration in Predictive Bayesian Inference
stat.ME 2026-05 unverdicted novelty 6.0

Predictive Bayesian inference posteriors concentrate onto a forward-model-dependent quantity and produce miscalibrated credible sets unless the predictive model contains the true data-generating process.
A unified perspective on fine-tuning and sampling with diffusion and flow models
stat.ML 2026-04 unverdicted novelty 6.0

A unified framework for exponential tilting in diffusion and flow models that includes bias-variance decompositions showing finite gradient variance for some methods, norm bounds on adjoint ODEs, and adapted losses wi...
Non-asymptotic two-sample kernel testing with the spectrally truncated normalized MMD
math.ST 2026-04 unverdicted novelty 6.0

Derives exponential upper bounds under the null for the spectrally truncated normalized MMD and supplies a practical data-adaptive quantile estimator with hyperparameter tuning that does not require splitting.
Physics-informed neural particle flow for the Bayesian update step
cs.LG 2026-02 unverdicted novelty 6.0

A neural network approximates the velocity field of log-homotopy particle flow by enforcing a derived master PDE from the continuity equation, enabling unsupervised amortized Bayesian updates with reduced stiffness.