On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests
Nonparametric two-sample (homogeneity) testing is a decision-theoretic problem: deciding whether two random variables have different distributions without making parametric assumptions about those distributions. The literature is old and rich, with a wide variety of statistics having been intelligently designed and analyzed, for both the univariate and the multivariate setting. Our contribution is to tie many of these tests together, drawing connections between seemingly very different statistics. Our central object is the Wasserstein distance, through which we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves to multivariate tests involving energy statistics and kernel-based maximum mean discrepancy. Some connections proceed through the construction of a "smoothed" Wasserstein distance, and others through the pursuit of a "distribution-free" Wasserstein test. Some observations in this chain are implicit in the literature, while others seem not to have been noticed thus far. Given the classical and continued importance of nonparametric two-sample testing, we aim to provide useful connections for theorists and practitioners familiar with one subset of these methods but not others.
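To make the univariate end of this chain concrete, here is a minimal sketch (not the paper's code) that computes several of the named statistics from the same pair of empirical CDFs F and G. It assumes one-dimensional samples and plug-in (V-statistic) estimates, and illustrates standard identities: W1 = ∫|F − G| dx, the Kolmogorov-Smirnov statistic = sup|F − G|, and the energy distance 2E|X − Y| − E|X − X'| − E|Y − Y'| = 2∫(F − G)² dx. The last function evaluates MMD² with the distance-induced kernel k(u, v) = |u| + |v| − |u − v| (centered at the origin, an assumed choice for this sketch), which reproduces the energy distance exactly. All function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Right-continuous empirical CDF: fraction of sample points <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def cdf_statistics(x, y):
    """Return (W1, KS, energy distance) computed from the empirical CDFs of 1-D samples."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(set(xs) | set(ys))
    # Both ECDFs are constant on [grid[i], grid[i+1]), so F - G is a step function.
    diffs = [ecdf(xs, t) - ecdf(ys, t) for t in grid]
    widths = [b - a for a, b in zip(grid, grid[1:])]
    w1 = sum(abs(d) * w for d, w in zip(diffs, widths))         # integral of |F - G|
    ks = max(abs(d) for d in diffs)                              # sup of |F - G|
    energy = 2 * sum(d * d * w for d, w in zip(diffs, widths))  # 2 * integral of (F - G)^2
    return w1, ks, energy


def mean_pairwise(a, b, f):
    """Average of f(u, v) over all pairs u in a, v in b (V-statistic, keeps u == v pairs)."""
    return sum(f(u, v) for u in a for v in b) / (len(a) * len(b))


def abs_diff(u, v):
    return abs(u - v)


def distance_kernel(u, v):
    """Distance-induced kernel centered at 0 (an assumption of this sketch)."""
    return abs(u) + abs(v) - abs(u - v)


def energy_distance_pairwise(x, y):
    """Energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| with expectations as sample means."""
    return (2 * mean_pairwise(x, y, abs_diff)
            - mean_pairwise(x, x, abs_diff)
            - mean_pairwise(y, y, abs_diff))


def mmd2_distance_kernel(x, y):
    """Biased (V-statistic) MMD^2 using the distance-induced kernel."""
    return (mean_pairwise(x, x, distance_kernel)
            + mean_pairwise(y, y, distance_kernel)
            - 2 * mean_pairwise(x, y, distance_kernel))


x, y = [0.0, 1.0, 2.0], [0.5, 2.5]
w1, ks, energy = cdf_statistics(x, y)
print(w1, ks, energy)
print(energy_distance_pairwise(x, y), mmd2_distance_kernel(x, y))
```

For this toy pair, W1 = 2/3 and KS = 1/2, and the CDF-based energy distance, the pairwise energy distance and the kernel MMD² all equal 4/9 up to floating-point rounding. The quadratic pairwise loops are chosen for readability, not speed.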
Forward citations
Cited by 3 Pith papers
- How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework
  The authors introduce a register-aware evaluation framework that compares LLM outputs to human reference corpora via Biber's lexico-grammatical features and MMD across five English registers.
- Simulations Approaching Data: Cortical Slow Waves in Inferred Models of the Whole Hemisphere of Mouse
  Two-loop inference of a mean-field model with periodic neuromodulation reproduces spatio-temporal features of cortical slow waves from whole-hemisphere mouse calcium imaging data.
- Physics-driven Comparative Analysis of Various Statistical Distance Metrics and Normalizing Functions
  A data-driven comparison of Hellinger, Wasserstein, Jensen-Shannon, Kolmogorov-Smirnov and other distance metrics on Kr-83 decay spectra finds varying stability of a chosen parameter of interest depending on sample si...