pith. sign in

arxiv: 2501.06268 · v3 · submitted 2025-01-09 · 💻 cs.LG · stat.ME· stat.ML

Clustering with Uniformity- and Neighbor-Based Random Geometric Graphs

Pith reviewed 2026-05-23 05:40 UTC · model grok-4.3

classification 💻 cs.LG stat.MEstat.ML
keywords graph-based clusteringcluster catch digraphsnearest neighbor distancespatial randomness testmoderate-dimensional datauniform noiseclustering algorithms
0
0 comments X

The pith

A nearest-neighbor Monte Carlo test sets covering radii for Cluster Catch Digraphs in moderate dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Uniformity- and Neighbor-based Cluster Catch Digraphs that replace Ripley's K function with a nearest-neighbor-distance Monte Carlo spatial randomness test to choose covering radii. This change targets datasets of moderate size and dimension that contain complex cluster shapes plus uniform background noise. Monte Carlo simulations and runs on benchmark data show the resulting method delivers stable performance that matches several standard clustering algorithms. The approach requires almost no user-set parameters. The work also maps the computational costs and the data regimes where the method works best.

Core claim

UN-CCDs determine covering radii via a nearest-neighbor-distance Monte Carlo spatial randomness test, which avoids the dimensionality degradation of Ripley's K function. This allows the graph-based method to deliver stable and competitive clustering performance on moderate-dimensional datasets with complex geometry and uniform noise, as shown in Monte Carlo simulations and benchmark experiments, while remaining largely parameter-free.

What carries the argument

The nearest-neighbor-distance based Monte Carlo spatial randomness test (MC-SRT) used to set covering radii for Uniformity- and Neighbor-based Cluster Catch Digraphs.

If this is right

  • UN-CCDs achieve stable performance comparable to established clustering methods on evaluated benchmark datasets.
  • The approach stays largely parameter-free across the tested settings.
  • It suits datasets of moderate size and dimension with complex cluster geometry and uniform background noise.
  • Practical regimes of effectiveness are identified along with computational trade-offs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the MC-SRT holds beyond the tested range, the method could extend to additional moderate-to-high dimension regimes.
  • The uniformity-based radius choice might pair with other graph constructions to handle non-uniform noise.
  • Largely parameter-free operation could support automated pipelines where manual tuning is impractical.

Load-bearing premise

The nearest-neighbor-distance Monte Carlo spatial randomness test determines appropriate covering radii without performance loss as dimensionality grows.

What would settle it

New simulations showing UN-CCDs lose accuracy or stability at the same rate as prior CCD variants when dimension increases would challenge the central claim.

Figures

Figures reproduced from arXiv: 2501.06268 by Elvan Ceyhan, Nedret Billor, Rui Shi.

Figure 1
Figure 1. Figure 1: A illustration of clustering with UN-CCDs. [PITH_FULL_IMAGE:figures/full_fig_p013_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Realizations of the simulation settings with 2, 3, and 5 uniform clusters in [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The line plots of the ARIs of KS-CCDs, under the uniform cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The line plots of the ARIs of RK-CCDs, under the uniform cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The line plots of the ARIs of UN-CCDs, under the uniform cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Realizations of the simulation settings with 2, 3, and 5 Gaussian clusters in [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: The line plots of the ARIs of KS-CCDs, under the Gaussian cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The line plots of the ARIs of RK-CCDs, under the Gaussian cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: The line plots of the ARIs of UN-CCDs, under the Gaussian cluster settings. [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Some realizations of the simulation studying the impact of different noise [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: The figures of asymmetric, R15, and D31 datasets [PITH_FULL_IMAGE:figures/full_fig_p025_11.png] view at source ↗
read the original abstract

We propose a graph-based clustering method based on Cluster Catch Digraphs (CCDs) that extends their applicability to moderate-dimensional data settings. Existing CCD variants, such as RK-CCDs, rely on spatial randomness tests based on Ripley's K function, which exhibit performance degradation as dimensionality increases. To address this limitation, we introduce a nearest-neighbor-distance (NND) based Monte Carlo spatial randomness test (MC-SRT) for determining covering radii, resulting in the proposed Uniformity- and Neighbor-based CCDs (UN-CCDs). The proposed method is designed for datasets of moderate size and dimension, particularly in settings with complex cluster geometry and uniformly distributed background noise. Through Monte Carlo simulations and experiments on benchmark datasets, we show that UN-CCDs provide stable and competitive performance relative to several established clustering methods within the evaluated regimes, while remaining largely parameter-free. We also discuss computational trade-offs and identify the practical regimes in which the method is most effective. -- Keywords: Graph-based clustering; Cluster catch digraphs; Moderate-dimensional data; the nearest neighbor distance; Spatial randomness test.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes Uniformity- and Neighbor-based Cluster Catch Digraphs (UN-CCDs) as an extension of Cluster Catch Digraphs (CCDs) for moderate-dimensional data. It replaces the Ripley's K function used in prior RK-CCDs with a nearest-neighbor-distance based Monte Carlo spatial randomness test (MC-SRT) to set covering radii. The central empirical claim, supported by Monte Carlo simulations and benchmark dataset experiments, is that UN-CCDs deliver stable and competitive performance versus established clustering methods in regimes with complex cluster geometry and uniform background noise, while remaining largely parameter-free; the manuscript also discusses computational trade-offs and practical applicability regimes.

Significance. If the Monte Carlo and benchmark results are robust, the work supplies a concrete, largely parameter-free alternative for graph-based clustering in moderate dimensions where spatial randomness tests degrade, addressing a documented limitation of Ripley's K. The explicit scoping to evaluated regimes and discussion of trade-offs strengthens the practical contribution to the CCD and random geometric graph literature.

minor comments (2)
  1. [Abstract] Abstract: the claim of 'stable and competitive performance' is stated without any numerical results, error bars, dataset sizes, or baseline names, making immediate assessment of the central empirical claim difficult.
  2. [Method] The manuscript should clarify in §3 or §4 whether the MC-SRT covering-radius selection introduces any tunable parameters (e.g., number of Monte Carlo replicates or significance level) that are fixed across all experiments.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on reported experiments

full rationale

The paper introduces UN-CCDs via a new NND-based MC-SRT as an alternative to Ripley's K for determining covering radii, then evaluates performance through Monte Carlo simulations and benchmark datasets. No equations, fitted parameters, or self-citations are shown to reduce any central claim to its own inputs by construction. The method is scoped as largely parameter-free with explicit discussion of regimes and trade-offs; the performance statements are falsifiable on the external benchmarks and simulations rather than being tautological. This is the standard case of a self-contained empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed. The approach relies on standard Monte Carlo testing and nearest-neighbor distances, which are established techniques in spatial statistics.

pith-pipeline@v0.9.0 · 5745 in / 1226 out tokens · 38560 ms · 2026-05-23T05:40:38.202102+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 1 internal anchor

  1. [1]

    Automatic subspace clustering of high dimensional data for data mining ap- plications

    Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Ragha- van. Automatic subspace clustering of high dimensional data for data mining ap- plications. InProceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pages 94–105, 1998

  2. [2]

    Optics: Ordering points to identify the clustering structure.ACM Sigmod Record, 28(2):49– 60, 1999

    Mihael Ankerst, Markus M Breunig, Hans-Peter Kriegel, and J¨ org Sander. Optics: Ordering points to identify the clustering structure.ACM Sigmod Record, 28(2):49– 60, 1999

  3. [3]

    k-means++: The advantages of careful seeding

    David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. Technical report, Stanford, 2006

  4. [4]

    Clustering Via Nonparametric Density Estimation: the R Package pdfCluster

    Adelchi Azzalini and Giovanna Menardi. Clustering via nonparametric density estimation: The r package pdfcluster.ArXiv Preprint ArXiv:1301.6559, 2013

  5. [5]

    Simple monte carlo tests for spatial pattern

    Julian Besag and Peter J Diggle. Simple monte carlo tests for spatial pattern. Journal of the Royal Statistical Society Series C: Applied Statistics, 26(3):327–333, 1977

  6. [6]

    nearest neighbor

    Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is “nearest neighbor” meaningful? InDatabase Theory—ICDT’99: 7th International Conference Jerusalem, Israel, January 10–12, 1999 Proceedings 7, pages 217–235. Springer, 1999

  7. [7]

    Latent dirichlet allocation

    David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003

  8. [8]

    Fast unfolding of communities in large networks.Journal of Statistical Me- chanics: Theory and Experiment, 2008(10):P10008, 2008

    Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefeb- vre. Fast unfolding of communities in large networks.Journal of Statistical Me- chanics: Theory and Experiment, 2008(10):P10008, 2008

  9. [9]

    Structural deep clustering network

    Deyu Bo, Xiao Wang, Chuan Shi, Meiqi Zhu, Emiao Lu, and Peng Cui. Structural deep clustering network. InProceedings of the Web Conference 2020, pages 1400– 1410, 2020

  10. [10]

    A greedy heuristic for the set-covering problem.Mathematics of Operations Research, 4(3):233–235, 1979

    Vasek Chvatal. A greedy heuristic for the set-covering problem.Mathematics of Operations Research, 4(3):233–235, 1979. 30

  11. [11]

    Distance to nearest neighbor as a measure of spatial relationships in populations.Ecology, 35(4):445–453, 1954

    Philip J Clark and Francis C Evans. Distance to nearest neighbor as a measure of spatial relationships in populations.Ecology, 35(4):445–453, 1954

  12. [12]

    Generalization of a nearest neighbor measure of dispersion for use inkdimensions.Ecology, 60(2):316–317, 1979

    Philip J Clark and Francis C Evans. Generalization of a nearest neighbor measure of dispersion for use inkdimensions.Ecology, 60(2):316–317, 1979

  13. [13]

    Mean shift: A robust approach toward feature space analysis.IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002

    Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis.IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002

  14. [14]

    PhD thesis, Johns Hopkins University, 2003

    Jason G DeVinney.The class cover problem and its applications in pattern recog- nition. PhD thesis, Johns Hopkins University, 2003

  15. [15]

    Ripley’s k function

    Philip M Dixon, Abdel H El-Shaarawi, and Walter W Piegorsch. Ripley’s k function. Encyclopedia of Environmetrics, 3, 2012

  16. [16]

    Berkeley: University of California Press, 1932

    Harold Edson Driver and Alfred Louis Kroeber.Quantitative expression of cultural relationships, volume 31. Berkeley: University of California Press, 1932

  17. [17]

    Uci machine learning repository, 2017

    Dheeru Dua and Casey Graff. Uci machine learning repository, 2017

  18. [18]

    A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters

    Joseph C Dunn. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. 1973

  19. [19]

    Clus- ter analysis and display of genome-wide expression patterns.Proceedings of the National Academy of Sciences, 95(25):14863–14868, 1998

    Michael B Eisen, Paul T Spellman, Patrick O Brown, and David Botstein. Clus- ter analysis and display of genome-wide expression patterns.Proceedings of the National Academy of Sciences, 95(25):14863–14868, 1998

  20. [20]

    A density-based algorithm for discovering clusters in large spatial databases with noise

    Martin Ester, Hans-Peter Kriegel, J¨ org Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. InPro- ceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), volume 96, pages 226–231, Portland, Oregon, USA, 1996. AAAI Press

  21. [21]

    Absalom E Ezugwu, Abiodun M Ikotun, Olaide O Oyelade, Laith Abualigah, Jef- fery O Agushaka, Christopher I Eke, and Andronicus A Akinyelu. A comprehen- sive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects.Engineering Applications of Artificial Intelligence, 110:104743, 2022. 31

  22. [22]

    Knowledge acquisition via incremental conceptual clustering

    Douglas H Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139–172, 1987

  23. [23]

    K-means properties on six clustering benchmark datasets, 2018

    Pasi Fr¨ anti and Sami Sieranoja. K-means properties on six clustering benchmark datasets, 2018

  24. [24]

    Society for Industrial and Applied Mathematics, China, 2020

    Guojun Gan, Chaoqun Ma, and Jianhong Wu.Data clustering: theory, algorithms, and applications. Society for Industrial and Applied Mathematics, China, 2020

  25. [25]

    Spinger, New York City, New York, 2013

    James Gareth, Witten Daniela, Hastie Trevor, and Tibshirani Robert.An intro- duction to statistical learning: with applications in R. Spinger, New York City, New York, 2013

  26. [26]

    Community structure in social and biolog- ical networks.Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002

    Michelle Girvan and Mark EJ Newman. Community structure in social and biolog- ical networks.Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002

  27. [27]

    Cure: An efficient clustering algorithm for large databases.ACM Sigmod Record, 27(2):73–84, 1998

    Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. Cure: An efficient clustering algorithm for large databases.ACM Sigmod Record, 27(2):73–84, 1998

  28. [28]

    Rock: A robust clustering algorithm for categorical attributes.Information Systems, 25(5):345–366, 2000

    Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. Rock: A robust clustering algorithm for categorical attributes.Information Systems, 25(5):345–366, 2000

  29. [29]

    Bibliothek der Universit¨ at, Konstanz, Germany, 1998

    Alexander Hinneburg, Daniel A Keim, et al.An efficient approach to clustering in large multimedia databases with noise, volume 98. Bibliothek der Universit¨ at, Konstanz, Germany, 1998

  30. [30]

    Approximation algorithms for the set covering and vertex cover problems.SIAM Journal on Computing, 11(3):555–556, 1982

    Dorit S Hochbaum. Approximation algorithms for the set covering and vertex cover problems.SIAM Journal on Computing, 11(3):555–556, 1982

  31. [31]

    Network-adjusted covariates for community detec- tion.Biometrika, 111(4):1221–1240, 2024

    Yaofang Hu and Wanjie Wang. Network-adjusted covariates for community detec- tion.Biometrika, 111(4):1221–1240, 2024

  32. [32]

    Comparing partitions.Journal of Classifica- tion, 2:193–218, 1985

    Lawrence Hubert and Phipps Arabie. Comparing partitions.Journal of Classifica- tion, 2:193–218, 1985

  33. [33]

    Reducibility among combinatorial problems

    Richard M Karp. Reducibility among combinatorial problems. InComplexity of Computer Computations, pages 85–103. Springer, 1972

  34. [34]

    Chameleon: Hierarchical clus- tering using dynamic modeling.Computer, 32(8):68–75, 1999

    George Karypis, Eui-Hong Han, and Vipin Kumar. Chameleon: Hierarchical clus- tering using dynamic modeling.Computer, 32(8):68–75, 1999. 32

  35. [35]

    John Wiley & Sons, 2009

    Leonard Kaufman and Peter J Rousseeuw.Finding groups in data: an introduction to cluster analysis. John Wiley & Sons, 2009

  36. [36]

    The possibilistic c-means algorithm: insights and recommendations.IEEE Transactions on Fuzzy Systems, 4(3):385–393, 1996

    Raghuram Krishnapuram and James M Keller. The possibilistic c-means algorithm: insights and recommendations.IEEE Transactions on Fuzzy Systems, 4(3):385–393, 1996

  37. [37]

    Classification and analysis of multivariate observations

    James MacQueen. Classification and analysis of multivariate observations. In5th Berkeley Symp. Math. Statist. Probability, pages 281–297, 1967

  38. [38]

    Classification of imbalanced data with a geometric digraph family.The Journal of Machine Learning Research, 17(1):6504– 6543, 2016

    Art¨ ur Manukyan and Elvan Ceyhan. Classification of imbalanced data with a geometric digraph family.The Journal of Machine Learning Research, 17(1):6504– 6543, 2016

  39. [39]

    Parameter free clustering with cluster catch digraphs (technical report).ArXiv Preprint ArXiv:1912.11926, 2019

    Art¨ ur Manukyan and Elvan Ceyhan. Parameter free clustering with cluster catch digraphs (technical report).ArXiv Preprint ArXiv:1912.11926, 2019

  40. [40]

    John Wiley & Sons, Hoboken, New Jersey, 2005

    David J Marchette.Random graphs for statistical pattern recognition. John Wiley & Sons, Hoboken, New Jersey, 2005

  41. [41]

    An examination of procedures for deter- mining the number of clusters in a data set.Psychometrika, 50:159–179, 1985

    Glenn W Milligan and Martha C Cooper. An examination of procedures for deter- mining the number of clusters in a data set.Psychometrika, 50:159–179, 1985

  42. [42]

    On spectral clustering: Analysis and an algorithm.Advances in neural information processing systems, 14, 2001

    Andrew Ng, Michael Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm.Advances in neural information processing systems, 14, 2001

  43. [43]

    Clarans: A method for clustering objects for spatial data mining.IEEE Transactions on Knowledge and Data Engineering, 14(5):1003– 1016, 2002

    Raymond T Ng and Jiawei Han. Clarans: A method for clustering objects for spatial data mining.IEEE Transactions on Knowledge and Data Engineering, 14(5):1003– 1016, 2002

  44. [44]

    A possibilistic fuzzy c-means clustering algorithm.IEEE Transactions on Fuzzy Systems, 13(4):517–530, 2005

    Nikhil R Pal, Kuhu Pal, James M Keller, and James C Bezdek. A possibilistic fuzzy c-means clustering algorithm.IEEE Transactions on Fuzzy Systems, 13(4):517–530, 2005

  45. [45]

    Clas- sification using class cover catch digraphs.Journal of Classification, 20:3–23, 2003

    Carey E Priebe, David J Marchette, Diego Socolinsky, and Jason DeVinney. Clas- sification using class cover catch digraphs.Journal of Classification, 20:3–23, 2003

  46. [46]

    An introduction to hidden markov mod- els.IEEE Assp Magazine, 3(1):4–16, 1986

    Lawrence Rabiner and Biinghwang Juang. An introduction to hidden markov mod- els.IEEE Assp Magazine, 3(1):4–16, 1986. 33

  47. [47]

    Compre- hensive survey on hierarchical clustering algorithms and the recent developments

    Xingcheng Ran, Yue Xi, Yonggang Lu, Xiangwen Wang, and Zhenyu Lu. Compre- hensive survey on hierarchical clustering algorithms and the recent developments. Artificial Intelligence Review, 56(8):8219–8264, 2023

  48. [48]

    The infinite gaussian mixture model.Advances in Neural Infor- mation Processing Systems, 12, 1999

    Carl Rasmussen. The infinite gaussian mixture model.Advances in Neural Infor- mation Processing Systems, 12, 1999

  49. [49]

    Can the number of clusters be determined by external indices?IEEE Access, 8:89239–89257, 2020

    Mohammad Rezaei and Pasi Fr¨ anti. Can the number of clusters be determined by external indices?IEEE Access, 8:89239–89257, 2020

  50. [50]

    The second-order analysis of stationary point processes.Journal of Applied Probability, 13(2):255–266, 1976

    Brian D Ripley. The second-order analysis of stationary point processes.Journal of Applied Probability, 13(2):255–266, 1976

  51. [51]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.Journal of Computational and Applied Mathematics, 20:53–65, 1987

    Peter J Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.Journal of Computational and Applied Mathematics, 20:53–65, 1987

  52. [52]

    A review of clustering techniques and developments.Neurocomputing, 267:664–681, 2017

    Amit Saxena, Mukesh Prasad, Akshansh Gupta, Neha Bharill, Om Prakash Pa- tel, Aruna Tiwari, Meng Joo Er, Weiping Ding, and Chin-Teng Lin. A review of clustering techniques and developments.Neurocomputing, 267:664–681, 2017

  53. [53]

    Dbscan revisited, revisited: why and how you should (still) use dbscan.ACM Transactions on Database Systems (TODS), 42(3):1–21, 2017

    Erich Schubert, J¨ org Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. Dbscan revisited, revisited: why and how you should (still) use dbscan.ACM Transactions on Database Systems (TODS), 42(3):1–21, 2017

  54. [54]

    Wavecluster: a wavelet-based clustering approach for spatial data in very large databases.The VLDB Journal, 8(3):289–304, 2000

    Gholamhosein Sheikholeslami, Surojit Chatterjee, and Aidong Zhang. Wavecluster: a wavelet-based clustering approach for spatial data in very large databases.The VLDB Journal, 8(3):289–304, 2000

  55. [55]

    Bayesian community detection for networks with covariates.Bayesian Analysis, 20(3):735–762, 2025

    Luyi Shen, Arash Amini, Nathaniel Josephs, and Lizhen Lin. Bayesian community detection for networks with covariates.Bayesian Analysis, 20(3):735–762, 2025

  56. [56]

    Outlier detection with cluster catch digraphs.arXiv preprint arXiv:2409.11596, 2024

    Rui Shi, Nedret Billor, and Elvan Ceyhan. Outlier detection with cluster catch digraphs.arXiv preprint arXiv:2409.11596, 2024

  57. [57]

    Outlyingness scores with cluster catch digraphs.arXiv preprint arXiv:2501.05530, 2025

    Rui Shi, Nedret Billor, and Elvan Ceyhan. Outlyingness scores with cluster catch digraphs.arXiv preprint arXiv:2501.05530, 2025

  58. [58]

    Thorvald Sorensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on danish commons.Biologiske Skrifter, 5:1–34, 1948. 34

  59. [59]

    A robust spectral clustering algorithm for sub-gaussian mixture models with outliers.Oper- ations Research, 71(1):224–244, 2023

    Prateek R Srivastava, Purnamrita Sarkar, and Grani A Hanasusanto. A robust spectral clustering algorithm for sub-gaussian mixture models with outliers.Oper- ations Research, 71(1):224–244, 2023

  60. [60]

    From louvain to leiden: guaranteeing well-connected communities.Scientific Reports, 9(1):1–12, 2019

    Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From louvain to leiden: guaranteeing well-connected communities.Scientific Reports, 9(1):1–12, 2019

  61. [61]

    Veenman, Marcel J

    Cor J. Veenman, Marcel J. T. Reinders, and Eric Backer. A maximum variance clus- ter algorithm.IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1273–1280, 2002

  62. [62]

    Clustering of the self-organizing map.IEEE Transactions on Neural Networks, 11(3):586–600, 2000

    Juha Vesanto and Esa Alhoniemi. Clustering of the self-organizing map.IEEE Transactions on Neural Networks, 11(3):586–600, 2000

  63. [63]

    Progress in outlier detection techniques: A survey.IEEE Access, 7:107964–108000, 2019

    Hongzhi Wang, Mohamed Jaward Bah, and Mohamed Hammad. Progress in outlier detection techniques: A survey.IEEE Access, 7:107964–108000, 2019

  64. [64]

    Sting: A statistical information grid approach to spatial data mining

    Wei Wang, Jiong Yang, Richard Muntz, et al. Sting: A statistical information grid approach to spatial data mining. InVldb, volume 97, pages 186–195. Citeseer, 1997

  65. [65]

    Hierarchical grouping to optimize an objective function.Journal of the American Statistical Association, 58(301):236–244, 1963

    Joe H Ward Jr. Hierarchical grouping to optimize an objective function.Journal of the American Statistical Association, 58(301):236–244, 1963

  66. [66]

    Springer Science & Business Media, 2000

    Michel Wedel and Wagner A Kamakura.Market segmentation: Conceptual and methodological foundations. Springer Science & Business Media, 2000

  67. [67]

    Springer Publication, New York City, New York, 2013

    Daniela Witten and Gareth James.An introduction to statistical learning with applications in R. Springer Publication, New York City, New York, 2013

  68. [68]

    A distribution- based clustering algorithm for mining in large spatial databases

    Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, and J¨ org Sander. A distribution- based clustering algorithm for mining in large spatial databases. InProceedings 14th International Conference on Data Engineering, pages 324–331. IEEE, 1998

  69. [69]

    A rapid review of clustering algorithms.ArXiv Preprint ArXiv:2401.07389, 2024

    Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, and Shengyuan Cao. A rapid review of clustering algorithms.ArXiv Preprint ArXiv:2401.07389, 2024

  70. [70]

    Graph-theoretical methods for detecting and describing gestalt clusters.IEEE Transactions on Computers, 100(1):68–86, 1971

    Charles T Zahn. Graph-theoretical methods for detecting and describing gestalt clusters.IEEE Transactions on Computers, 100(1):68–86, 1971. 35

  71. [71]

    A novel kernelized fuzzy c-means algorithm with application in medical image segmentation.Artificial Intelligence in Medicine, 32(1):37–50, 2004

    Dao-Qiang Zhang and Song-Can Chen. A novel kernelized fuzzy c-means algorithm with application in medical image segmentation.Artificial Intelligence in Medicine, 32(1):37–50, 2004

  72. [72]

    Advancements of outlier detection: A survey.ICST Transactions on Scalable Information Systems, 13(1):1–26, 2013

    Ji Zhang. Advancements of outlier detection: A survey.ICST Transactions on Scalable Information Systems, 13(1):1–26, 2013

  73. [73]

    Clustering in dynamic spatial databases

    Ji Zhang, Wynne Hsu, and Mong Li Lee. Clustering in dynamic spatial databases. Journal of Intelligent Information Systems, 24(1):5–27, 2005

  74. [74]

    Birch: an efficient data clustering method for very large databases.ACM Sigmod Record, 25(2):103–114, 1996

    Tian Zhang, Raghu Ramakrishnan, and Miron Livny. Birch: an efficient data clustering method for very large databases.ACM Sigmod Record, 25(2):103–114, 1996. 36