pith. sign in

arxiv: 2503.00379 · v3 · submitted 2025-03-01 · 💻 cs.LG · stat.ML

Improving clustering quality evaluation in noisy Gaussian mixtures

Pith reviewed 2026-05-23 01:36 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords clusteringcluster validity indicesfeature importance rescalingnoisy dataGaussian mixturesunsupervised evaluationfeature dispersion
0
0 comments X

The pith

Feature Importance Rescaling improves how well cluster validity indices match ground truth in noisy Gaussian mixtures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Feature Importance Rescaling (FIR) to adjust each feature's weight according to its dispersion before applying standard cluster validity indices. This step reduces the influence of noisy or irrelevant dimensions, which otherwise distort measures of compactness and separation. Experiments on synthetic Gaussian mixture data across varied noise levels and overlap show FIR raises the correlation between index values and true labels. A real-world case study confirms the same pattern. The approach leaves the underlying indices unchanged while making their outputs more reliable when external labels are absent.

Core claim

We introduce a theoretically grounded Feature Importance Rescaling (FIR) method that enhances the quality of clustering validation by adjusting feature contributions based on their dispersion. It attenuates noise features, clarifies clustering compactness and separation, and thereby aligns clustering validation more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations and a case study on real-world data, we demonstrate that FIR consistently improves the correlation between the values of cluster validity indices and the ground truth, particularly in settings with noisy or irrelevant features.

What carries the argument

Feature Importance Rescaling (FIR), a preprocessing step that rescales each feature inversely to its dispersion so that low-dispersion (noisy) features contribute less to subsequent validity calculations.

If this is right

  • FIR raises the correlation of Average Silhouette Width, Calinski-Harabasz and Davies-Bouldin indices with ground truth across multiple noise regimes.
  • The improvement holds when clusters overlap substantially.
  • Variability of index performance across different data realizations decreases after rescaling.
  • The method remains effective on real data containing both relevant and irrelevant features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • FIR could be inserted as a standard preprocessing step before any distance-based validity index, not only the three tested here.
  • The same dispersion-based weighting might improve the clustering step itself when used inside k-means or similar algorithms.
  • In extremely high-dimensional settings the method may need to be combined with explicit feature selection to avoid amplifying very low-dispersion but still uninformative coordinates.

Load-bearing premise

Rescaling features according to their dispersion will reduce the distorting effect of noise features on measures of cluster compactness and separation.

What would settle it

A controlled experiment on synthetic Gaussian mixtures in which applying FIR produces lower Pearson or Spearman correlation between validity index scores and ground-truth cluster quality than the unscaled indices.

Figures

Figures reproduced from arXiv: 2503.00379 by Renato Cordeiro de Amorim, Vladimir Makarenkov.

Figure 1
Figure 1. Figure 1: Projection of the data sets onto their first two principal components after applying PCA, and [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗
read the original abstract

Clustering is a well-established technique in machine learning and data analysis, widely used across various domains. Cluster validity indices, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by different degrees of feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. We introduce a theoretically grounded Feature Importance Rescaling (FIR) method that enhances the quality of clustering validation by adjusting feature contributions based on their dispersion. It attenuates noise features, clarifies clustering compactness and separation, and thereby aligns clustering validation more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations and a case study on real-world data, we demonstrate that FIR consistently improves the correlation between the values of cluster validity indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement of clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is unavailable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Feature Importance Rescaling (FIR), a preprocessing step that rescales features in Gaussian mixture data according to their dispersion in order to attenuate noise/irrelevant features, thereby improving the correlation of standard cluster validity indices (Silhouette, Calinski-Harabasz, Davies-Bouldin) with ground-truth labels. The claim is supported by experiments on synthetic GMMs under varied noise, dimensionality, and overlap regimes plus one real-world case study.

Significance. If the rescaling mechanism is shown to attenuate noise without distorting separation, FIR would supply a lightweight, interpretable enhancement to internal cluster validation that is directly applicable to the common setting of noisy high-dimensional data; the experimental demonstration of consistent correlation gains across configurations is a concrete strength.

major comments (3)
  1. [§3] §3 (FIR definition): the mapping from per-feature dispersion to the rescaling multiplier must be stated explicitly (e.g., inverse dispersion, normalized dispersion, or other functional form). In a GMM the total dispersion of a feature equals within-cluster variance plus between-cluster variance; without the exact formula it is impossible to verify that the procedure preferentially down-weights noise rather than relevant separating dimensions.
  2. [Experimental results (Tables 2–4)] Experimental results (Tables 2–4 and associated figures): the reported Pearson/Spearman correlations improve under FIR, yet the tables do not include an ablation that reverses the rescaling direction (direct vs. inverse dispersion). This control is load-bearing for the central claim that improvement arises from noise attenuation rather than from amplifying between-cluster signal.
  3. [§4.3] §4.3 (overlap regime): when clusters exhibit substantial overlap the between-cluster component of dispersion shrinks; the paper must demonstrate that FIR still improves index–ground-truth correlation in this regime, or qualify the claim that the method remains effective “even when clusters exhibit significant overlap.”
minor comments (2)
  1. [§2] Notation for dispersion (e.g., σ_j^2) should be defined once in §2 and used consistently; several equations reuse the symbol without redefinition.
  2. [Figures] Figure captions should state the exact number of Monte-Carlo repetitions and the precise correlation coefficient (Pearson or Spearman) plotted.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the FIR method and its evaluation. We will revise the manuscript to address the concerns about explicit definition, ablation controls, and overlap regime analysis.

read point-by-point responses
  1. Referee: [§3] §3 (FIR definition): the mapping from per-feature dispersion to the rescaling multiplier must be stated explicitly (e.g., inverse dispersion, normalized dispersion, or other functional form). In a GMM the total dispersion of a feature equals within-cluster variance plus between-cluster variance; without the exact formula it is impossible to verify that the procedure preferentially down-weights noise rather than relevant separating dimensions.

    Authors: We agree that the exact functional mapping must be stated explicitly in §3 to permit verification of the noise-attenuation mechanism. The revised manuscript will include the precise formula relating per-feature dispersion to the rescaling multiplier. revision: yes

  2. Referee: [Experimental results (Tables 2–4)] Experimental results (Tables 2–4 and associated figures): the reported Pearson/Spearman correlations improve under FIR, yet the tables do not include an ablation that reverses the rescaling direction (direct vs. inverse dispersion). This control is load-bearing for the central claim that improvement arises from noise attenuation rather than from amplifying between-cluster signal.

    Authors: We acknowledge that an ablation reversing the rescaling direction would provide stronger evidence that gains arise specifically from noise attenuation. We will add this control experiment to the revised version of Tables 2–4 and the associated discussion. revision: yes

  3. Referee: [§4.3] §4.3 (overlap regime): when clusters exhibit substantial overlap the between-cluster component of dispersion shrinks; the paper must demonstrate that FIR still improves index–ground-truth correlation in this regime, or qualify the claim that the method remains effective “even when clusters exhibit significant overlap.”

    Authors: The current experiments already span multiple overlap regimes and the abstract reports effectiveness under significant overlap. To directly address the referee’s concern, we will expand §4.3 with additional tabulated results or a qualification of the claim for the high-overlap case. revision: partial

Circularity Check

0 steps flagged

No significant circularity in FIR derivation

full rationale

The paper defines FIR as an explicit rescaling procedure based on measured feature dispersion, then validates the resulting improvement in index-ground-truth correlation via experiments on synthetic Gaussian mixtures and one real-world case study. No equations or claims reduce the proposed adjustment to its own inputs by construction, no uniqueness theorems are imported from self-citations, and the central empirical result is not forced by the fitting process itself. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Ledger based on abstract only; full paper may have more.

axioms (1)
  • domain assumption Dispersion of a feature indicates its relevance or noise level
    The FIR method relies on this to adjust contributions.

pith-pipeline@v0.9.0 · 5752 in / 919 out tokens · 26697 ms · 2026-05-23T01:36:54.683271+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1]

    Community detec- tion in feature-rich networks using data recov- ery approach,

    B. Mirkin and S. Shalileh, “Community detec- tion in feature-rich networks using data recov- ery approach,” Journal of Classification, vol. 39, no. 3, pp. 432–462, 2022

  2. [2]

    A comprehen- sive survey of image segmentation: clustering methods, performance parameters, and bench- mark datasets,

    H. Mittal, A. C. Pandey, M. Saraswat, S. Ku- mar, R. Pal, and G. Modwel, “A comprehen- sive survey of image segmentation: clustering methods, performance parameters, and bench- mark datasets,” Multimedia Tools and Applica- tions, pp. 1–26, 2022

  3. [3]

    K-means cluster- ing algorithms: A comprehensive review, vari- ants analysis, and advances in the era of big data,

    A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Heming, “K-means cluster- ing algorithms: A comprehensive review, vari- ants analysis, and advances in the era of big data,” Information Sciences, vol. 622, pp. 178– 210, 2023

  4. [4]

    Between sound and spelling: combining phonetics and clustering algorithms to improve target word re- covery,

    M. Zampieri and R. C. De Amorim, “Between sound and spelling: combining phonetics and clustering algorithms to improve target word re- covery,” in Advances in Natural Language Pro- cessing: 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17- 19, 2014. Proceedings 9 , pp. 438–449, Springer, 2014

  5. [5]

    Comprehensive survey on hierarchical cluster- ing algorithms and the recent developments,

    X. Ran, Y. Xi, Y. Lu, X. Wang, and Z. Lu, “Comprehensive survey on hierarchical cluster- ing algorithms and the recent developments,” Artificial Intelligence Review , vol. 56, no. 8, pp. 8219–8264, 2023

  6. [6]

    Data cluster- ing: application and trends,

    G. J. Oyewole and G. A. Thopil, “Data cluster- ing: application and trends,” Artificial intelli- gence review, vol. 56, no. 7, pp. 6439–6475, 2023

  7. [7]

    An extensive compara- tive study of cluster validity indices,

    O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J. M. P´ erez, and I. Perona, “An extensive compara- tive study of cluster validity indices,” Pattern recognition, vol. 46, no. 1, pp. 243–256, 2013

  8. [8]

    Extended multivariate compar- ison of 68 cluster validity indices. a review,

    R. Todeschini, D. Ballabio, V. Termopoli, and V. Consonni, “Extended multivariate compar- ison of 68 cluster validity indices. a review,” Chemometrics and Intelligent Laboratory Sys- tems, vol. 251, p. 105117, 2024

  9. [9]

    Inertia-based indices to deter- mine the number of clusters in k-means: an ex- perimental evaluation,

    A. Rykov, R. C. De Amorim, V. Makarenkov, and B. Mirkin, “Inertia-based indices to deter- mine the number of clusters in k-means: an ex- perimental evaluation,” IEEE Access, vol. 12, pp. 11761–11773, 2024

  10. [10]

    Some methods for clas- sification and analysis of multivariate observa- tions,

    J. MacQueen et al. , “Some methods for clas- sification and analysis of multivariate observa- tions,” in Proceedings of the fifth Berkeley sym- posium on mathematical statistics and probabil- ity, vol. 1, pp. 281–297, Oakland, CA, USA, 1967

  11. [11]

    Cluster analysis: A modern statistical review,

    A. Jaeger and D. Banks, “Cluster analysis: A modern statistical review,” Wiley Interdis- ciplinary Reviews: Computational Statistics , vol. 15, no. 3, p. e1597, 2023

  12. [12]

    Data clustering: 50 years beyond k-means,

    A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern recognition letters , vol. 31, no. 8, pp. 651–666, 2010

  13. [13]

    An extensive empirical comparison of k-means initialization algorithms,

    S. Harris and R. C. De Amorim, “An extensive empirical comparison of k-means initialization algorithms,” IEEE Access, vol. 10, pp. 58752– 58768, 2022

  14. [14]

    How much can k- means be improved by using better initializa- tion and repeats?,

    P. Fr¨ anti and S. Sieranoja, “How much can k- means be improved by using better initializa- tion and repeats?,” Pattern Recognition, vol. 93, pp. 95–112, 2019

  15. [15]

    k-means++: the advantages of careful seeding,

    D. Arthur, “k-means++: the advantages of careful seeding,” in Proceedings of the eigh- teenth annual ACM-SIAM symposium on Dis- crete algorithms, New Orleans, Louisiana, 2007 , pp. 1027–1035, Society for Industrial and Ap- plied Mathematics, 2007

  16. [16]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,

    P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of computational and applied mathematics, vol. 20, pp. 53–65, 1987

  17. [17]

    A dendrite method for cluster analysis,

    T. Cali´ nski and J. Harabasz, “A dendrite method for cluster analysis,” Communications 14 in Statistics-theory and Methods , vol. 3, no. 1, pp. 1–27, 1974

  18. [18]

    A cluster separation measure,

    D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE transactions on pat- tern analysis and machine intelligence , no. 2, pp. 224–227, 1979

  19. [19]

    An impossibility theorem for clus- tering,

    J. Kleinberg, “An impossibility theorem for clus- tering,” Advances in neural information process- ing systems, vol. 15, 2002

  20. [20]

    Comparing parti- tions,

    L. Hubert and P. Arabie, “Comparing parti- tions,” Journal of classification, vol. 2, pp. 193– 218, 1985

  21. [21]

    Visualizing data using t-sne.,

    L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.,” Journal of machine learning research, vol. 9, no. 11, 2008. 15