arxiv: 1509.02237 · v2 · pith:OHZCK2MInew · submitted 2015-09-08 · 🧮 math.ST · stat.ML · stat.TH

On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests

classification 🧮 math.ST · stat.ML · stat.TH
keywords connections · wasserstein · nonparametric · others · sample · statistics · testing · tests

Nonparametric two sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been intelligently designed and analyzed, both for the unidimensional and the multivariate setting. Our contribution is to tie together many of these tests, drawing connections between seemingly very different statistics. In this work, our central object is the Wasserstein distance, as we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel based maximum mean discrepancy. Some connections proceed through the construction of a smoothed Wasserstein distance, and others through the pursuit of a "distribution-free" Wasserstein test. Some observations in this chain are implicit in the literature, while others seem not to have been noticed thus far. Given nonparametric two sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
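
As a small illustration of the univariate end of this chain, the sketch below computes two statistics from the same pair of empirical CDFs: the Kolmogorov-Smirnov statistic (the largest gap between them) and the 1-Wasserstein distance (the area between them, a standard identity in one dimension). It is not taken from the paper; the function name `ks_and_w1` and the toy samples are assumptions made for illustration only.

```python
from bisect import bisect_right

def ks_and_w1(x, y):
    """Return (KS statistic, 1-Wasserstein distance) for two 1-D samples.

    Hypothetical helper for illustration; not the paper's construction.
    """
    xs, ys = sorted(x), sorted(y)
    grid = sorted(xs + ys)  # pooled sample points
    # |F(t) - G(t)| at each pooled point, using right-continuous empirical CDFs.
    diffs = [
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in grid
    ]
    ks = max(diffs)  # sup-norm gap between the two empirical CDFs
    # Both CDFs are constant between consecutive pooled points, so the
    # integral of |F - G| is an exact sum of rectangle areas.
    w1 = sum(d * (b - a) for d, a, b in zip(diffs, grid, grid[1:]))
    return ks, w1

# Toy example: y is x shifted right by one unit.
x = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 3.0]
ks, w1 = ks_and_w1(x, y)
print(ks, w1)
```

Both numbers summarize the same difference curve F − G, one by its maximum and the other by its integral, which is the sense in which such tests can be viewed side by side. Turning either number into a test would still require a null distribution, for example via permutation, which this sketch does not attempt.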

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

    cs.CL 2026-05 unverdicted novelty 6.0

    The authors introduce a register-aware evaluation framework that compares LLM outputs to human reference corpora via Biber's lexico-grammatical features and MMD across five English registers.

  2. Simulations Approaching Data: Cortical Slow Waves in Inferred Models of the Whole Hemisphere of Mouse

    q-bio.NC 2021-04 unverdicted novelty 4.0

    Two-loop inference of a mean-field model with periodic neuromodulation reproduces spatio-temporal features of cortical slow waves from whole-hemisphere mouse calcium imaging data.

  3. Physics-driven Comparative Analysis of Various Statistical Distance Metrics and Normalizing Functions

    nucl-ex 2026-04 unverdicted novelty 3.0

    A data-driven comparison of Hellinger, Wasserstein, Jensen-Shannon, Kolmogorov-Smirnov and other distance metrics on Kr-83 decay spectra finds varying stability of a chosen parameter of interest depending on sample si...