Delaunay Weighted Two-sample Test for High-dimensional Data by Incorporating Geometric Information

Guosheng Yin; Jiaqi Gu; Ruoxu Tan

arxiv: 2404.03198 · v2 · submitted 2024-04-04 · 📊 stat.ME

Pith reviewed 2026-05-24 02:13 UTC · model grok-4.3

classification 📊 stat.ME

keywords two-sample test · Delaunay triangulation · high-dimensional data · manifold learning · nonparametric statistics · geometric proximity · asymptotic normality · consistency

The pith

A nonparametric test statistic from the Delaunay weight matrix is asymptotically normal under the null of equal distributions and consistent under alternatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

High-dimensional data are assumed to lie on a low-dimensional manifold, and the paper uses that structure to build a two-sample test. Instead of relying only on pairwise distances, it defines weights via Delaunay triangulation that capture both distance and directional information among points. A computational procedure estimates the manifold from the samples and approximates these weights. The resulting test statistic is shown to be asymptotically normal when the two groups come from the same distribution and consistent when they do not. Simulations indicate robustness to manifold estimation error and extra power when the distributions differ in direction rather than in magnitude alone.

Core claim

The paper establishes a novel nonparametric test statistic constructed from the Delaunay weight matrix on the learned manifold; this statistic has asymptotic normality under the null hypothesis that the two high-dimensional samples arise from the same distribution and is consistent under the alternative that the distributions differ.

What carries the argument

Delaunay weight matrix obtained from triangulation on the estimated low-dimensional manifold, encoding geometric proximity that includes both distance and direction.
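To make the "distance and direction" point concrete, here is a minimal sketch of simplex-based weights in two dimensions. It assumes, purely for illustration, that the weight a query point places on the vertices of its containing triangle is given by barycentric coordinates; whether this matches the paper's exact definition of the Delaunay weight is not confirmed by the text here. The function name `barycentric_weights` and the toy triangle are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def barycentric_weights(q: Point, tri: Sequence[Point]) -> Tuple[float, float, float]:
    """Barycentric coordinates of q with respect to the triangle tri = (a, b, c).

    Illustrative assumption: these coordinates play the role of geometric
    weights; the paper's own construction may differ.
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    wa = ((by - cy) * (q[0] - cx) + (cx - bx) * (q[1] - cy)) / det
    wb = ((cy - ay) * (q[0] - cx) + (ax - cx) * (q[1] - cy)) / det
    # The three weights sum to one by construction.
    return wa, wb, 1.0 - wa - wb


# Two query points at the same distance (0.3) from vertex a, in different directions.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toward_b = barycentric_weights((0.3, 0.0), triangle)
toward_c = barycentric_weights((0.0, 0.3), triangle)
```

In this toy case both query points give vertex a the same weight, but the remaining weight goes to b in one case and to c in the other. A similarity based only on the distance to a could not separate the two points.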

If this is right

The test gains power relative to distance-only methods when distribution differences have a directional component on the manifold.
Moderate inaccuracies in manifold recovery do not destroy the asymptotic guarantees or practical performance.
The procedure yields a usable test for real high-dimensional problems such as protein-expression comparisons.
Large-sample p-values can be obtained directly from the normal limit without resampling.
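The last point above can be sketched as follows, assuming the standardized statistic T_n/σ_n is approximately N(0,1) under the null. The function name `normal_limit_p_value` and the choice between a one-sided and a two-sided rejection region are assumptions for illustration; the text here does not say which tail the paper uses.

```python
from statistics import NormalDist


def normal_limit_p_value(t_n: float, sigma_n: float, two_sided: bool = True) -> float:
    """Large-sample p-value from a N(0, 1) limit for t_n / sigma_n.

    Hypothetical helper: the sidedness of the test is an assumption,
    not something stated in the source.
    """
    if sigma_n <= 0:
        raise ValueError("sigma_n must be positive")
    z = t_n / sigma_n
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        return min(1.0, 2.0 * (1.0 - std_normal.cdf(abs(z))))
    # One-sided version: large values of the statistic count against the null.
    return 1.0 - std_normal.cdf(z)
```

No permutation or bootstrap loop is needed. The cost is that the p-value is only as accurate as the normal approximation at the sample size in hand.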

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The directional sensitivity may make the approach useful for problems involving oriented data or shape differences.
The same weight construction could be inserted into other nonparametric procedures that currently rely on pairwise similarities.
Performance comparisons with alternative manifold approximations or triangulation methods would clarify the method's sensitivity to implementation choices.

Load-bearing premise

The observed high-dimensional points lie on a low-dimensional manifold whose structure can be recovered accurately enough from the sample to produce reliable Delaunay weights.

What would settle it

A dataset generated from a distribution without low-dimensional manifold support in which the Delaunay-weighted statistic fails to converge in distribution to normality under the null of equal samples.
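A check of this kind could be organized as a small Monte Carlo harness that draws both samples from the same distribution and records how often a level-α test rejects. The sketch below is only scaffolding: `standin_statistic` is a univariate Welch-type z statistic used as a placeholder, not the Delaunay weighted statistic, and all names and defaults are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance
from typing import Callable, List


def standin_statistic(x: List[float], y: List[float]) -> float:
    """Placeholder standardized statistic (Welch-type z); NOT the paper's statistic."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def null_rejection_proportion(
    stat: Callable[[List[float], List[float]], float],
    sampler: Callable[[random.Random], float],
    n: int,
    reps: int,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Share of replications in which |stat| exceeds the two-sided normal cutoff.

    Both samples come from the same sampler, so the null holds by construction.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        y = [sampler(rng) for _ in range(n)]
        if abs(stat(x, y)) > cutoff:
            rejections += 1
    return rejections / reps
```

If the normal limit holds, the returned proportion should sit near α. A proportion that stays far from α as n grows, once the real statistic and a sampler without manifold structure are plugged in, would be evidence of the failure described above.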

Figures

Figures reproduced from arXiv: 2404.03198 by Guosheng Yin, Jiaqi Gu, Ruoxu Tan.

**Figure 2.** Graphical illustration of the Delaunay simplices (triangles)

**Figure 3.** The Delaunay weight matrix ΓZ computed from the Z in

**Figure 4.** Graphical illustration of the global advantage of the Delaunay weight matrix in

**Figure 5.** Graphical illustration of the stereographic projected

**Figure 6.** Rejection proportions of different implementations of the Delaunay weighted test

**Figure 7.** Low-dimensional Euclidean representations of protein expression level of 1077

**Figure 8.** The rejection proportion of our Delaunay weighted test and other approaches over

Original abstract

Two-sample hypothesis testing is a fundamental problem with various applications, which faces new challenges in the high-dimensional context. To mitigate the issue of the curse of dimensionality, high-dimensional data are typically assumed to lie on a low-dimensional manifold. To incorporate geometric information in the data, we propose to apply the Delaunay triangulation and develop the Delaunay weight to measure the geometric proximity among data points. In contrast to existing similarity measures that only utilize pairwise distances, the Delaunay weight can take both the distance and direction information into account. A detailed computation procedure is developed to learn the unknown manifold and approximate the Delaunay weight. We further propose a novel nonparametric test statistic using the Delaunay weight matrix. Asymptotic normality under the null and consistency under the alternative of the test statistic are developed. Applied on simulated data, the new test shows robustness to the learning of the unknown manifold and exhibits substantial power gain if the distributions differ in direction. The proposed test also shows great power on a real dataset of mice protein expression levels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The Delaunay weighting scheme adds directional information to two-sample tests, but the asymptotic normality claim for the estimated weights lacks the required perturbation bounds.

The new element here is the use of Delaunay triangulation to define weights that capture both distance and direction on an estimated low-dimensional manifold. This is a step beyond standard pairwise-distance similarities, and the simulations indicate a power advantage when the two distributions differ mainly in direction rather than scale or location. The real-data example on mice protein levels is a reasonable check, and the method appears stable to the manifold-learning step in the reported experiments. That part is concrete and worth noting.

The central theoretical claim is that the test statistic built from the estimated weight matrix is asymptotically normal under the null and consistent under the alternative. This requires the estimated weights to be sufficiently close to the oracle weights so that the perturbation does not break the central limit theorem. The paper outlines a manifold-learning procedure but supplies no explicit rates on the approximation error or a lemma showing that the difference in the quadratic form stays negligible at the needed order. In high dimensions that rate condition is not automatic, and without it the asymptotic result does not follow from the stated assumptions.

The circularity burden is low and there are no obvious invented entities or post-hoc fitting issues. The work is for researchers already working on geometric or manifold-based nonparametric tests who might want to experiment with this weighting. It shows clear engagement with the problem but the missing quantitative control on the estimation error is a real gap rather than a minor detail. I would bring it to a reading group to see the full derivations and would not cite it until the perturbation analysis is supplied or the claims are weakened to empirical only. A serious editor should send it for review with a specific request to address the rate condition.

Referee Report

2 major / 2 minor

Summary. The paper proposes a nonparametric two-sample test for high-dimensional data assumed to lie on a low-dimensional manifold. It defines Delaunay weights via triangulation to incorporate both distance and directional geometric information (beyond pairwise distances), develops a manifold-learning procedure to estimate these weights from data, constructs a test statistic T_n from the estimated weight matrix, and claims to establish asymptotic normality of T_n/σ_n under the null and consistency under the alternative. Simulations indicate robustness to manifold estimation and power gains when alternatives differ in direction; an application to mice protein expression data is included.

Significance. If the asymptotic claims hold with estimated weights, the method would offer a geometrically aware nonparametric test that can detect directional differences in high-dimensional settings where standard distance-based approaches lose power, addressing a practical gap in manifold-aware inference.

major comments (2)

[Asymptotics / Theorem on normality] Asymptotic normality and consistency section (theorems establishing T_n/σ_n → N(0,1) under H0 and divergence under H1): the central argument requires that the estimated Delaunay weight matrix Ŵ satisfies ||Ŵ − W|| = o_p(1/√n) (or an analogous rate) so that the perturbation to the quadratic form or U-statistic remains negligible. No quantitative error bounds, convergence rates for the manifold-learning step, or perturbation lemma are supplied to verify this condition holds under the stated high-dimensional regime.
[Manifold learning procedure] Manifold learning and weight approximation procedure (the section detailing the computation of Ŵ): the procedure is described but supplies neither explicit rates on the geometric approximation error nor verification that the resulting error is small enough relative to 1/√n; in high dimensions this rate is typically slower, rendering the CLT claim load-bearing and unverified.
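
To see why the rate condition in these two comments matters, here is a generic sketch of the perturbation step. It assumes, purely for illustration, that the statistic is a quadratic form T_n = aᵀWa in a fixed vector a (for example, centered group labels). The vector a, the operator norm, and this quadratic-form shape are assumptions, not taken from the paper, whose statistic and normalization may differ.

```latex
\begin{align*}
\hat T_n - T_n &= a^\top (\hat W - W)\, a, \\
|\hat T_n - T_n| &\le \|\hat W - W\|_{\mathrm{op}} \, \|a\|_2^2, \\
\frac{\hat T_n}{\sigma_n} &= \frac{T_n}{\sigma_n} + \frac{\hat T_n - T_n}{\sigma_n}.
\end{align*}
```

Under these assumptions, if T_n/σ_n converges in distribution to N(0,1) and the bound in the second line is o_p(σ_n), Slutsky's theorem gives the same limit for the plug-in statistic. The required rate on ||Ŵ − W|| therefore depends on how ||a||² and σ_n scale with n, which is the quantitative control the referee asks for.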

minor comments (2)

[Abstract] The abstract states that 'asymptotic normality ... are developed' but does not indicate whether the proofs are fully rigorous or rely on additional unstated assumptions; a brief pointer to the key technical conditions would improve clarity.
[Methods / Weight definition] Notation for the Delaunay weight matrix W and its estimator Ŵ should be introduced with an explicit definition (e.g., via an equation) before the test statistic is defined, to avoid ambiguity when reading the asymptotic statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the asymptotic claims and manifold learning procedure. We respond point by point below.

Referee: [Asymptotics / Theorem on normality] Asymptotic normality and consistency section (theorems establishing T_n/σ_n → N(0,1) under H0 and divergence under H1): the central argument requires that the estimated Delaunay weight matrix Ŵ satisfies ||Ŵ − W|| = o_p(1/√n) (or an analogous rate) so that the perturbation to the quadratic form or U-statistic remains negligible. No quantitative error bounds, convergence rates for the manifold-learning step, or perturbation lemma are supplied to verify this condition holds under the stated high-dimensional regime.

Authors: We agree that the asymptotic normality result with estimated weights requires ||Ŵ − W|| = o_p(1/√n) to ensure the perturbation term is negligible. The theorems in the manuscript are stated under the assumption that the manifold-learning step achieves a rate sufficient for this condition. We did not supply explicit quantitative bounds or a dedicated perturbation lemma. In the revision we will add a lemma that bounds the effect of the weight estimation error on the test statistic and discuss the required rates under the low-dimensional manifold assumption.

revision: yes

Referee: [Manifold learning procedure] Manifold learning and weight approximation procedure (the section detailing the computation of Ŵ): the procedure is described but supplies neither explicit rates on the geometric approximation error nor verification that the resulting error is small enough relative to 1/√n; in high dimensions this rate is typically slower, rendering the CLT claim load-bearing and unverified.

Authors: The referee is correct that the manifold-learning section describes the algorithmic steps without explicit error rates or verification against the 1/√n threshold. We will revise the section to cite standard convergence results for manifold estimation (under fixed or slowly growing intrinsic dimension) and show that these rates meet the o_p(1/√n) requirement when the manifold dimension is controlled. Additional assumptions needed for the high-dimensional regime will be stated explicitly.

revision: yes

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained

The paper defines a Delaunay-weighted test statistic from a manifold-learning procedure to approximate weights, then states asymptotic normality under the null and consistency under the alternative. No step reduces a claimed prediction or first-principles result to its own inputs by construction, nor renames a fitted quantity as a prediction. No load-bearing self-citation chain or uniqueness theorem imported from the authors' prior work appears in the provided text. The manifold approximation is treated as an independent computational step whose error is assumed controlled, without the statistic itself being defined in terms of its own asymptotic properties. This is the typical non-circular case where the central claims rest on separate statistical arguments rather than tautological re-expression of fitted inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that data reside on a learnable low-dimensional manifold and on the ad-hoc construction of the Delaunay weight; no free parameters or invented entities beyond the weight itself are mentioned in the abstract.

axioms (1)

domain assumption: High-dimensional data lie on a low-dimensional manifold
Explicitly stated in the abstract as the premise used to mitigate the curse of dimensionality.

invented entities (1)

Delaunay weight (no independent evidence)
purpose: Measure geometric proximity among data points using both distance and direction via triangulation
Newly defined quantity introduced to replace pairwise-distance similarity measures.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "We propose to apply the Delaunay triangulation and develop the Delaunay weight to measure the geometric proximity among data points... asymptotic normality under the null and consistency under the alternative"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: "high-dimensional data are typically assumed to lie on a low-dimensional manifold... Delaunay triangulation on a Riemannian manifold"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

