High-Dimensional Clustering via Nearest-Neighbor Asymmetry

Hao Chen; Xiancheng Lin

arxiv: 2305.00578 · v3 · pith:KVAWN2HWnew · submitted 2023-04-30 · 📊 stat.ME

High-Dimensional Clustering via Nearest-Neighbor Asymmetry

Hao Chen , Xiancheng Lin This is my paper

Pith reviewed 2026-05-24 08:37 UTC · model grok-4.3

classification 📊 stat.ME

keywords high-dimensional clusteringnearest-neighbor graphasymmetric separationdispersion differencespermutation standardizationdirected graph clusteringscale separationgene expression clustering

0 comments

The pith

Clustering high-dimensional data succeeds by scoring partitions on asymmetric neighbor counts in a directed k-nearest-neighbor graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that groups in high dimensions can separate through differences in dispersion rather than location, producing one-way neighbor patterns that a directed graph records. NAC builds this graph once, then scores candidate partitions with two permutation-standardized quantities: a weighted count of within-cluster edges and a contrast that measures directed imbalance between clusters. Their combination adapts automatically to mean separation, scale separation, or both, without fitting a mixture model or projecting to low dimensions. Simulations confirm competitive performance on location differences and clear gains when asymmetry dominates; gene-expression examples illustrate the same behavior on real small-sample data.

Core claim

The NAC objective, formed by adding a standardized within-edge enrichment statistic and a standardized contrast statistic on the directed kNN graph, recovers the correct partition by targeting complementary nearest-neighbor patterns that arise under both location and scale separation; population analysis shows each statistic isolates its target signal after permutation centering.

What carries the argument

The directed k-nearest-neighbor graph together with its two permutation-standardized statistics (weighted within-edge count and directed contrast) that are summed to form the clustering objective.

If this is right

NAC remains competitive with standard methods when clusters differ only in location.
NAC gains clear advantage precisely when nearest-neighbor asymmetry induced by scale differences is the main signal.
The method requires neither a parametric mixture model nor an explicit low-dimensional embedding.
The same two standardized statistics apply directly to small-sample high-dimensional problems such as gene-expression matrices.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The permutation-based standardization may allow the same machinery to flag other forms of local heterogeneity beyond scale, such as differing tail behavior.
Because the graph is built once, NAC could serve as a fast post-processing step after any initial embedding that preserves local distances.
In regimes where multiple separation mechanisms coexist, the relative weight of the two statistics could be tuned by cross-validation on held-out edge counts.

Load-bearing premise

The directed kNN graph must carry the dominant separation signal in its asymmetric neighbor counts, and permutation standardization must remove finite-sample noise without needing Euclidean well-separation.

What would settle it

Generate data from two components that differ only in dispersion; if the NAC objective selects the true labels no more often than a random partition across repeated draws, the central claim fails.

Figures

Figures reproduced from arXiv: 2305.00578 by Hao Chen, Xiancheng Lin.

**Figure 2.** Figure 2: Clustering results based on the standard [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Plots of 𝑝11 (black) and 𝑝22 (blue) considering 𝑘 nearest neighbors under different settings. The dashed lines represent the baseline when the two clusters are from the same distribution 𝑝 (0) 11 = (𝑚 − 1)/(𝑁 − 1) (black dashed line) and 𝑝 (0) 22 = (𝑛 − 1)/(𝑁 − 1) (blue dashed line). The two dashed lines overlap when 𝑚 = 𝑛. When the two distributions differ in scale, 𝑝11 is larger than 𝑝 (0) 11 , but 𝑝22 i… view at source ↗

**Figure 4.** Figure 4: Values of 𝑍𝑤, 𝑍𝑑, and 𝑀𝜅 (top panel) and mis-clustering rate (bottom panel) over 𝑘. 𝑚 = 𝑛 = 50. varying 𝜆 from 0 to 1 in different settings. In two-sample testing, Zhu & Chen (2021) suggested using 𝑘 = [𝑁 0.5 ] with the 𝑘-MST, while Zhou & Chen (2022) suggested 𝜆 = 0.65 for the 𝑘-NN graph. Here, we explore the relation between 𝑘 and mis-clustering rate. In particular, we aim to find out whether the 𝑘 that … view at source ↗

**Figure 5.** Figure 5: Values of 𝑍𝑤, 𝑍𝑑, and 𝑀𝜅 (top panel) and mis-clustering rate (bottom panel) over 𝑘. 𝑚 = 30, 𝑛 = 70. 2.6. Choice of 𝜅 From Section 2.4, max𝑥 𝑍𝑤 (𝑥) can be used to cluster scenario (1), and max𝑥 𝑍𝑑 (𝑥) is suitable for scenarios (2) and (3). Since we don’t know which scenario we are dealing with in reality, 𝑀𝜅 (𝑥) = max(𝑍𝑤 (𝑥), 𝜅𝑍𝑑 (𝑥)) was proposed for these three scenarios. Ideally, when the scenario is (1)… view at source ↗

**Figure 6.** Figure 6: Plots of ratio (𝑟 = 𝑍𝑤/𝑍𝑑) versus signal strength. Left panel: location difference with 𝑏 = 1, and 𝑎 increasing from 0.05 to 0.3. The blue dashed line at 𝑟 = 1.8 represents the minimum value of the boxplot when 𝑎 = 0.2. The “mis rate” indicates the mean mis-clustering rate when using 𝑍𝑤 as the criterion, with green representing higher errors and orange indicating lower errors. Right panel: scale difference… view at source ↗

**Figure 7.** Figure 7: Mis-clustering rates for Gaussian mixtures [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Mis-clustering rates for multivariate 𝑡-distribution mixtures 𝑡20 (0, Σ) and 𝑡20 (𝑎1, 𝑏Σ) [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

read the original abstract

High-dimensional clustering often relies on geometric or local-similarity structure, but the dominant separation between groups may not always be location-based. Differences in dispersion can create asymmetric local-neighborhood patterns: points from a more dispersed component may be closer to points in a more concentrated component than to points from their own component. We turn this high-dimensional phenomenon into a clustering principle. The proposed method, NAC (Nearest-neighbor Asymmetry Clustering), constructs a directed $k$-nearest-neighbor graph and evaluates candidate partitions using two permutation-standardized statistics: a weighted within-edge statistic that captures overall within-cluster enrichment and a contrast statistic that captures asymmetric separation. The resulting objective combines these two standardized signals, allowing the method to adapt to different separation regimes without specifying a mixture model or a low-dimensional representation. We provide a population-level analysis showing how the two statistics target complementary nearest-neighbor patterns. Simulation studies across mean, scale, and combined location-scale differences show that NAC is competitive under location separation and especially effective when nearest-neighbor asymmetry is present; gene-expression applications further illustrate its usefulness in small-sample, high-dimensional clustering.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

NAC uses directed kNN asymmetry to cluster high-dim data where dispersion differences matter more than location.

read the letter

The punchline is that this paper introduces NAC for high-dimensional clustering by exploiting asymmetry in nearest-neighbor counts from dispersion differences, rather than the usual location or similarity assumptions. They construct a directed kNN graph and define two permutation-standardized statistics: a weighted within-edge statistic for within-cluster enrichment and a contrast statistic for asymmetric separation. These are combined in the objective to adapt to different regimes. The population analysis explains the complementary targeting, simulations across mean, scale, and combined differences show competitive or better performance, and gene-expression applications illustrate the use case in small-sample high-dim data. This is new relative to the geometric and local-similarity methods, and it does well by providing a clear principle and evidence from analysis and sims without requiring mixture models. On the soft spots, the fact that permutation standardization happens on the same data as the graph construction could lead to some circularity if k or weights are tuned on those realizations, though the standardization is intended to isolate the signal. The abstract claims support, but verifying the high-dim validity would require the full derivations. Nothing looks broken, just needs the details checked. This paper is for statisticians focused on clustering methods for genomics and high-dim data where standard assumptions fail. Readers interested in alternative separation principles would get value. It deserves a serious referee because the contribution is distinct and the supporting work appears solid enough to review.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces the NAC method for high-dimensional clustering. It constructs a directed k-nearest-neighbor graph from the data and evaluates candidate partitions via an objective that combines two permutation-standardized statistics: a weighted within-edge statistic capturing within-cluster enrichment and a contrast statistic capturing asymmetric separation. A population-level analysis is provided to show that these statistics target complementary nearest-neighbor patterns. Simulations across mean, scale, and combined location-scale separations, together with gene-expression applications, are used to demonstrate competitiveness under location separation and particular effectiveness when nearest-neighbor asymmetry is present.

Significance. If the central claims hold, the work is significant because it supplies a model-free clustering principle that exploits dispersion-induced asymmetry in high dimensions without requiring a mixture model or low-dimensional embedding. The explicit population-level analysis demonstrating complementary targeting by the two statistics is a clear strength, as are the simulation studies that systematically vary separation regimes and the real-data illustrations. These elements together position NAC as a practical addition for small-sample, high-dimensional settings such as genomics. The permutation-standardization concern raised in the stress-test note does not appear to land as a load-bearing inconsistency on the basis of the provided description, because the standardization is presented as a device to isolate the asymmetry signal rather than as a data-dependent tuning step.

minor comments (2)

The abstract packs the description of the two statistics and the objective into a single long sentence; splitting this material would improve readability without altering content.
Notation for the two standardized statistics (e.g., symbols for the weighted within-edge and contrast quantities) should be introduced once in the main text and used consistently thereafter.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive evaluation of our manuscript. The summary and significance assessment accurately reflect the paper's contributions. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines NAC via a directed kNN graph followed by two permutation-standardized statistics whose population-level behavior is analyzed directly; the standardization step normalizes against a permutation null on the observed graph rather than fitting a parameter that is then re-used as a prediction. No self-definitional reduction, fitted-input-called-prediction, or load-bearing self-citation chain appears in the derivation. The objective is constructed from the two standardized signals without renaming a known result or smuggling an ansatz through citation. The method therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no explicit free parameters, invented entities, or non-standard axioms beyond the usual assumption that the observed nearest-neighbor counts reflect the underlying separation mechanism.

pith-pipeline@v0.9.0 · 5714 in / 1194 out tokens · 20259 ms · 2026-05-24T08:37:05.941029+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := #2 '...

work page
[2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page
[3]

, Barkai, N

Alon, U. , Barkai, N. , Notterman, D. A. , Gish, K. , Ybarra, S. , Mack, D. & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences 96, 6745--6750

work page 1999
[4]

Bajwa, M. S. , Agarwal, A. P. & Manchanda, S. (2015). Ternary search algorithm: Improvement of binary search. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom). IEEE

work page 2015
[5]

, Richards, W

Bhattacharjee, A. , Richards, W. G. , Staunton, J. , Li, C. , Monti, S. , Vasa, P. , Ladd, C. , Beheshti, J. , Bueno, R. , Gillette, M. et al. (2001). Classification of human lung carcinomas by mrna expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences 98, 13790--13795

work page 2001
[6]

Chan, P. K. , Schlag, M. D. & Zien, J. Y. (1993). Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 30th international Design Automation Conference

work page 1993
[7]

& Friedman, J

Chen, H. & Friedman, J. H. (2017). A new graph-based two-sample test for multivariate and object data. Journal of the American statistical association 112, 397--409

work page 2017
[8]

Dettling, M. (2004). Bagboosting for tumor classification with gene expression data. Bioinformatics 20, 3583--3593

work page 2004
[9]

& Jin, J

Donoho, D. & Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proceedings of the National Academy of Sciences 105, 14790--14795

work page 2008
[10]

Golub, T. R. , Slonim, D. K. , Tamayo, P. , Huard, C. , Gaasenbeek, M. , Mesirov, J. P. , Coller, H. , Loh, M. L. , Downing, J. R. , Caligiuri, M. A. et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. science 286, 531--537

work page 1999
[11]

Gordon, G. J. , Jensen, R. V. , Hsiao, L.-L. , Gullans, S. R. , Blumenstock, J. E. , Ramaswamy, S. , Richards, W. G. , Sugarbaker, D. J. & Bueno, R. (2002). Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer research 62, 4963--4967

work page 2002
[12]

, Tibshirani, R

Hastie, T. , Tibshirani, R. , Friedman, J. H. & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer

work page 2009
[13]

Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics 16, 772--783

work page 1988
[14]

& Wang, W

Jin, J. & Wang, W. (2016). Influential features pca for high dimensional clustering. The Annals of Statistics 44, 2323--2359

work page 2016
[15]

Leighton, F. T. & Rao, S. (1988). An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In FOCS, vol. 88

work page 1988
[16]

& Chen, H

Liu, Y.-W. & Chen, H. (2022). A fast and efficient change-point detection framework based on approximate k -nearest neighbor graphs. IEEE Transactions on Signal Processing 70, 1976--1986

work page 2022
[17]

Maaten, L. v. d. & Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research 9, 2579--2605

work page 2008
[18]

, Soares, J

Mwangi, B. , Soares, J. C. & Hasan, K. M. (2014). Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data. Journal of neuroscience methods 236, 19--25

work page 2014
[19]

, Chatterjee, S

Rohe, K. , Chatterjee, S. & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel

work page 2011
[20]

Schilling, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. Journal of the American Statistical Association 81, 799--806

work page 1986
[21]

& Malik, J

Shi, J. & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence 22, 888--905

work page 2000
[22]

, Febbo, P

Singh, D. , Febbo, P. G. , Ross, K. , Jackson, D. G. , Manola, J. , Ladd, C. , Tamayo, P. , Renshaw, A. A. , D'Amico, A. V. , Richie, J. P. et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer cell 1, 203--209

work page 2002
[23]

& Verma, N

Singh, V. & Verma, N. K. (2022). Gene expression data analysis using feature weighted robust fuzzy-means clustering. IEEE Transactions on NanoBioscience 22, 99--105

work page 2022
[24]

Su, A. I. , Welsh, J. B. , Sapinoso, L. M. , Kern, S. G. , Dimitrov, P. , Lapp, H. , Schultz, P. G. , Powell, S. M. , Moskaluk, C. A. , Frierson Jr, H. F. et al. (2001). Molecular classification of human carcinomas by use of gene expression signatures. Cancer research 61, 7388--7393

work page 2001
[25]

Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing 17, 395--416

work page 2007
[26]

, Klijn, J

Wang, Y. , Klijn, J. G. , Zhang, Y. , Sieuwerts, A. M. , Look, M. P. , Yang, F. , Talantov, D. , Timmermans, M. , Meijer-van Gelder, M. E. , Yu, J. et al. (2005). Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. The Lancet 365, 671--679

work page 2005
[27]

& Leahy, R

Wu, Z. & Leahy, R. (1993). An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE transactions on pattern analysis and machine intelligence 15, 1101--1113

work page 1993
[28]

Yousefi, M. R. , Hua, J. , Sima, C. & Dougherty, E. R. (2010). Reporting bias when using real data sets to analyze classification performance. Bioinformatics 26, 68--76

work page 2010
[29]

& Chen, H

Zhang, Y. & Chen, H. (2021). Graph-based multiple change-point detection. arXiv preprint arXiv:2110.01170

work page arXiv 2021
[30]

& Chen, H

Zhou, D. & Chen, H. (2022). Ring-cpd: Asymptotic distribution-free change-point detection for multivariate and non-euclidean data. arXiv preprint arXiv:2206.03038

work page arXiv 2022
[31]

& Chen, H

Zhu, Y. & Chen, H. (2021). Limiting distributions of graph-based test statistics. arXiv preprint arXiv:2108.07446

work page arXiv 2021

[1] [1]

, " * write output.state after.block = add.period write newline

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := #2 '...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

, Barkai, N

Alon, U. , Barkai, N. , Notterman, D. A. , Gish, K. , Ybarra, S. , Mack, D. & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences 96, 6745--6750

work page 1999

[4] [4]

Bajwa, M. S. , Agarwal, A. P. & Manchanda, S. (2015). Ternary search algorithm: Improvement of binary search. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom). IEEE

work page 2015

[5] [5]

, Richards, W

Bhattacharjee, A. , Richards, W. G. , Staunton, J. , Li, C. , Monti, S. , Vasa, P. , Ladd, C. , Beheshti, J. , Bueno, R. , Gillette, M. et al. (2001). Classification of human lung carcinomas by mrna expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences 98, 13790--13795

work page 2001

[6] [6]

Chan, P. K. , Schlag, M. D. & Zien, J. Y. (1993). Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 30th international Design Automation Conference

work page 1993

[7] [7]

& Friedman, J

Chen, H. & Friedman, J. H. (2017). A new graph-based two-sample test for multivariate and object data. Journal of the American statistical association 112, 397--409

work page 2017

[8] [8]

Dettling, M. (2004). Bagboosting for tumor classification with gene expression data. Bioinformatics 20, 3583--3593

work page 2004

[9] [9]

& Jin, J

Donoho, D. & Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proceedings of the National Academy of Sciences 105, 14790--14795

work page 2008

[10] [10]

Golub, T. R. , Slonim, D. K. , Tamayo, P. , Huard, C. , Gaasenbeek, M. , Mesirov, J. P. , Coller, H. , Loh, M. L. , Downing, J. R. , Caligiuri, M. A. et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. science 286, 531--537

work page 1999

[11] [11]

Gordon, G. J. , Jensen, R. V. , Hsiao, L.-L. , Gullans, S. R. , Blumenstock, J. E. , Ramaswamy, S. , Richards, W. G. , Sugarbaker, D. J. & Bueno, R. (2002). Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer research 62, 4963--4967

work page 2002

[12] [12]

, Tibshirani, R

Hastie, T. , Tibshirani, R. , Friedman, J. H. & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer

work page 2009

[13] [13]

Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics 16, 772--783

work page 1988

[14] [14]

& Wang, W

Jin, J. & Wang, W. (2016). Influential features pca for high dimensional clustering. The Annals of Statistics 44, 2323--2359

work page 2016

[15] [15]

Leighton, F. T. & Rao, S. (1988). An approximate max-flow min-cut theorem for uniform multicommodity flow problems with applications to approximation algorithms. In FOCS, vol. 88

work page 1988

[16] [16]

& Chen, H

Liu, Y.-W. & Chen, H. (2022). A fast and efficient change-point detection framework based on approximate k -nearest neighbor graphs. IEEE Transactions on Signal Processing 70, 1976--1986

work page 2022

[17] [17]

Maaten, L. v. d. & Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research 9, 2579--2605

work page 2008

[18] [18]

, Soares, J

Mwangi, B. , Soares, J. C. & Hasan, K. M. (2014). Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data. Journal of neuroscience methods 236, 19--25

work page 2014

[19] [19]

, Chatterjee, S

Rohe, K. , Chatterjee, S. & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel

work page 2011

[20] [20]

Schilling, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. Journal of the American Statistical Association 81, 799--806

work page 1986

[21] [21]

& Malik, J

Shi, J. & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence 22, 888--905

work page 2000

[22] [22]

, Febbo, P

Singh, D. , Febbo, P. G. , Ross, K. , Jackson, D. G. , Manola, J. , Ladd, C. , Tamayo, P. , Renshaw, A. A. , D'Amico, A. V. , Richie, J. P. et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer cell 1, 203--209

work page 2002

[23] [23]

& Verma, N

Singh, V. & Verma, N. K. (2022). Gene expression data analysis using feature weighted robust fuzzy-means clustering. IEEE Transactions on NanoBioscience 22, 99--105

work page 2022

[24] [24]

Su, A. I. , Welsh, J. B. , Sapinoso, L. M. , Kern, S. G. , Dimitrov, P. , Lapp, H. , Schultz, P. G. , Powell, S. M. , Moskaluk, C. A. , Frierson Jr, H. F. et al. (2001). Molecular classification of human carcinomas by use of gene expression signatures. Cancer research 61, 7388--7393

work page 2001

[25] [25]

Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing 17, 395--416

work page 2007

[26] [26]

, Klijn, J

Wang, Y. , Klijn, J. G. , Zhang, Y. , Sieuwerts, A. M. , Look, M. P. , Yang, F. , Talantov, D. , Timmermans, M. , Meijer-van Gelder, M. E. , Yu, J. et al. (2005). Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. The Lancet 365, 671--679

work page 2005

[27] [27]

& Leahy, R

Wu, Z. & Leahy, R. (1993). An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE transactions on pattern analysis and machine intelligence 15, 1101--1113

work page 1993

[28] [28]

Yousefi, M. R. , Hua, J. , Sima, C. & Dougherty, E. R. (2010). Reporting bias when using real data sets to analyze classification performance. Bioinformatics 26, 68--76

work page 2010

[29] [29]

& Chen, H

Zhang, Y. & Chen, H. (2021). Graph-based multiple change-point detection. arXiv preprint arXiv:2110.01170

work page arXiv 2021

[30] [30]

& Chen, H

Zhou, D. & Chen, H. (2022). Ring-cpd: Asymptotic distribution-free change-point detection for multivariate and non-euclidean data. arXiv preprint arXiv:2206.03038

work page arXiv 2022

[31] [31]

& Chen, H

Zhu, Y. & Chen, H. (2021). Limiting distributions of graph-based test statistics. arXiv preprint arXiv:2108.07446

work page arXiv 2021