Scalable unsupervised feature selection via weight stability

Renato Cordeiro de Amorim; Xudong Zhang

arxiv: 2506.06114 · v5 · pith:XHENOZMVnew · submitted 2025-06-06 · 💻 cs.LG

Pith reviewed 2026-05-22 01:07 UTC · model grok-4.3

classification 💻 cs.LG

keywords: unsupervised feature selection · Minkowski weighted k-means · weight stability · clustering · feature relevance · scalable algorithms · high-dimensional data

The pith

Minkowski weighted k-means assigns higher weights to relevant features than noise features across a range of exponents under explicit assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops methods for selecting useful features from high-dimensional data for clustering without labels. It builds on Minkowski weighted k-means, first introducing an initialisation that picks starting centroids probabilistically, guided by an estimate of how relevant each feature is. It then runs the algorithm with different exponents in the Minkowski distance and keeps the features whose weights stay high regardless of the exponent. A theoretical analysis shows why relevant features should stand out in this way when the data satisfy certain assumptions on noise and cluster structure. The authors also propose a faster variant that works on subsamples rather than the full data set.

Core claim

Under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights than noise features across a range of Minkowski exponents in the weighted k-means algorithm.
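To make the claim concrete, the sketch below evaluates the weight update quoted in the Lean-theorem section of this page, w_lv = 1 / sum_u (D_lv / D_lu)^{1/(p-1)}, for a single cluster l. The function name `mwk_weights` and the dispersion values are illustrative assumptions, not taken from the paper, and the sketch assumes strictly positive dispersions and p > 1.

```python
import numpy as np

def mwk_weights(dispersions, p):
    """Weights w_v = 1 / sum_u (D_v / D_u)^(1/(p-1)) for one cluster.

    `dispersions` holds one strictly positive within-cluster dispersion per feature.
    """
    if p <= 1:
        raise ValueError("this update is only defined for p > 1")
    d = np.asarray(dispersions, dtype=float)
    # ratios[v, u] = (D_v / D_u)^(1/(p-1)); summing over u gives the denominator.
    ratios = (d[:, None] / d[None, :]) ** (1.0 / (p - 1.0))
    return 1.0 / ratios.sum(axis=1)

# Hypothetical dispersions: two tight (relevant) features, three diffuse (noise) ones.
dispersions = [1.0, 1.2, 5.0, 6.0, 5.5]
weights_by_p = {p: mwk_weights(dispersions, p) for p in (1.5, 2.0, 3.0, 4.0)}
for p, w in weights_by_p.items():
    print(p, np.round(w, 3))
```

In this toy setting the weights sum to one for every exponent, and the two low-dispersion features keep the largest weights as p varies. This only illustrates the formula; it is not evidence for the paper's theorem.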

What carries the argument

Aggregation of feature weights from the Minkowski weighted k-means++ initialisation over multiple Minkowski exponents to detect stable relevant features.
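A minimal sketch of this aggregation step follows. It assumes the weights have already been reduced to one value per feature for each exponent. The aggregation rules shown (mean or median), the fixed number of selected features, and the names `aggregate_weights` and `select_stable_features` are assumptions for illustration; the page does not state which rule the paper uses.

```python
import numpy as np

def aggregate_weights(weights_by_exponent, rule="mean"):
    """Collapse an (n_exponents, n_features) weight matrix to one score per feature."""
    w = np.asarray(weights_by_exponent, dtype=float)
    if rule == "mean":
        return w.mean(axis=0)
    if rule == "median":
        return np.median(w, axis=0)
    raise ValueError(f"unknown aggregation rule: {rule!r}")

def select_stable_features(weights_by_exponent, n_selected, rule="mean"):
    """Return the sorted indices of the n_selected features with the highest score."""
    scores = aggregate_weights(weights_by_exponent, rule)
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:n_selected])

# Hypothetical weights: rows are exponents, columns are features.
W = np.array([
    [0.30, 0.28, 0.14, 0.14, 0.14],
    [0.27, 0.26, 0.16, 0.15, 0.16],
    [0.24, 0.23, 0.18, 0.17, 0.18],
])
selected = select_stable_features(W, n_selected=2)
print(selected)
```

Features whose weights stay high in every row rank first under either rule, which is the sense of "stable" used above.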

If this is right

FS-MWK++ identifies stable and informative features by weight aggregation.
SFS-MWK++ provides a scalable version using subsampling for larger datasets.
Clustering performance improves by focusing on relevant features identified this way.
The theoretical analysis supports the consistent higher weighting for relevant features.
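The subsampling idea behind the scalable variant can be sketched as below, assuming that per-feature scores are computed on random subsamples and then averaged; the page does not describe how SFS-MWK++ actually combines subsamples. `subsampled_scores` and `placeholder_score` are hypothetical names, and the placeholder scorer (variance share) merely stands in for the feature weights the paper's algorithm would produce.

```python
import numpy as np

def placeholder_score(X):
    """Stand-in scorer (variance share per feature); NOT the paper's feature weights."""
    v = X.var(axis=0)
    return v / v.sum()

def subsampled_scores(X, score_fn, n_rounds=10, sample_size=200, seed=0):
    """Average per-feature scores over random subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    size = min(sample_size, n)
    total = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.choice(n, size=size, replace=False)
        total += score_fn(X[idx])
    return total / n_rounds

# Toy data: feature 0 is bimodal (two groups), features 1-3 are pure noise.
rng = np.random.default_rng(0)
n = 1000
signal = rng.choice([-3.0, 3.0], size=n) + rng.normal(size=n)
noise = rng.normal(size=(n, 3))
X = np.column_stack([signal, noise])
scores = subsampled_scores(X, placeholder_score, n_rounds=20, sample_size=200, seed=1)
print(np.round(scores, 3))
```

Each round touches only `sample_size` rows, so the cost per round does not grow with the full data set; the number of rounds trades run time against the variance of the averaged scores.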

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This stability criterion could extend to other distance-based clustering techniques beyond Minkowski.
Subsampling in SFS-MWK++ suggests the method can handle very large datasets without full computation.
Feature selection here might reduce the impact of the curse of dimensionality in unsupervised learning tasks.

Load-bearing premise

The explicit assumptions made about the properties of noise features and the underlying cluster structure in the data.

What would settle it

A dataset with clearly labeled relevant and noise features where the weights for relevant features do not remain consistently higher across different Minkowski exponents.

Original abstract

Unsupervised feature selection is critical for improving clustering performance in high-dimensional data, where irrelevant features can obscure meaningful structure. In this work, we introduce the Minkowski weighted $k$-means++, a novel initialisation strategy for the Minkowski Weighted $k$-means. Our initialisation selects centroids probabilistically using feature relevance estimates derived from the data itself. Building on this, we propose two new feature selection algorithms, FS-MWK++, which aggregates feature weights across a range of Minkowski exponents to identify stable and informative features, and SFS-MWK++, a scalable variant based on subsampling. We support our approach with a theoretical analysis, demonstrating that, under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights than noise features across a range of Minkowski exponents. Our software can be found at https://github.com/xzhang4-ops1/FSMWK.
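The abstract does not spell out the initialisation, so the following is only a rough sketch of probabilistic seeding under a weighted Minkowski distance. It is plain k-means++-style sampling with fixed, user-supplied feature weights, not the authors' MWK++ procedure, which derives relevance estimates from the data. The function names are hypothetical, and the distance form sum_v w_v^p |x_v - c_v|^p is the one commonly used for Minkowski weighted k-means.

```python
import numpy as np

def weighted_minkowski(X, c, w, p):
    """p-th power of the weighted Minkowski distance from each row of X to c."""
    return ((w ** p) * np.abs(X - c) ** p).sum(axis=1)

def seed_centroids(X, k, w, p, seed=0):
    """k-means++-style seeding: sample points proportionally to their distance
    to the nearest centroid chosen so far (assumes X has at least k distinct rows)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    closest = weighted_minkowski(X, centroids[0], w, p)
    for _ in range(1, k):
        probs = closest / closest.sum()
        nxt = X[rng.choice(n, p=probs)]
        centroids.append(nxt)
        # Keep, for every point, the distance to its nearest chosen centroid.
        closest = np.minimum(closest, weighted_minkowski(X, nxt, w, p))
    return np.vstack(centroids)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
w = np.full(4, 0.25)  # uniform weights, an assumption for this demo
centroids = seed_centroids(X, k=3, w=w, p=2.0, seed=1)
print(centroids.shape)
```

A point already chosen has distance zero to itself and therefore zero probability of being drawn again, so the seeds are distinct whenever the data rows are.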

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FS-MWK++ and SFS-MWK++ offer a practical stability-based approach to unsupervised feature selection by aggregating weights across Minkowski exponents, with a new initialization, though the theory rests on assumptions about noise and clusters.

This paper puts forward FS-MWK++ and SFS-MWK++ as ways to do unsupervised feature selection by checking which features keep high weights when you vary the Minkowski exponent in weighted k-means. The new initialization, Minkowski weighted k-means++, is also part of it: it selects centroids probabilistically, based on feature relevance estimates pulled from the data. The main algorithms then aggregate the weights across a range of exponents to find the stable ones. The theory claims that, under assumptions about noise features and how clusters are structured, relevant features will reliably get higher weights than noise ones. They include a scalable version using subsampling and point to code on GitHub.

This is a solid extension of weighted k-means ideas into feature selection. The stability across exponents is a practical idea to make the selection more robust. Having the code available helps others try it out quickly.

The soft spot is the reliance on those specific assumptions for the theoretical result. If noise features don't behave as assumed, or if the cluster separation isn't confined to the relevant features, the weight difference might not show up consistently. That could weaken the case for why the aggregation works. The method also uses the clustering fit to generate the weights, so there's built-in dependence there rather than a fully independent criterion. More tests on how it holds up when assumptions are mildly violated would strengthen it.

Readers working on high-dimensional data clustering in areas like genomics or document analysis would get the most out of this. It offers a concrete method with some backing and code, so someone looking for tools in that space can use it. The work shows clear thinking on combining these elements, so it deserves to go through peer review.

Referee Report

2 major / 2 minor

Summary. The paper introduces Minkowski weighted k-means++ (MWK++), a probabilistic centroid initialization for Minkowski weighted k-means that derives feature relevance estimates directly from the data. Building on this, it proposes FS-MWK++ to aggregate feature weights across a range of Minkowski exponents for stable unsupervised feature selection, along with a scalable subsampling variant SFS-MWK++. A theoretical analysis is presented claiming that, under explicit assumptions on noise features and cluster structure, relevant features receive strictly higher weights than noise features for Minkowski exponents in a specified range; open-source code is provided.

Significance. If the theoretical result holds under the stated assumptions, the work offers a new stability-based approach to unsupervised feature selection that leverages variation in the Minkowski exponent, which could improve clustering on high-dimensional data with mixed relevant and noise features. The release of reproducible code at the cited GitHub repository is a clear strength for verification and extension.

major comments (2)

[Theoretical analysis] Theoretical analysis (section following method description): The central claim that relevant features obtain consistently higher weights than noise features rests on explicit assumptions about noise-feature variance and cluster separation in relevant dimensions only. The manuscript states that a theoretical demonstration exists but provides neither the full derivation steps nor an error analysis or sensitivity check, so it is not possible to confirm that the weight inequality follows directly from the weighted Minkowski objective and ++ initialization without additional unstated steps.
[Experiments] Experimental section (tables/figures reporting weight comparisons): No ablation or stress test is reported in which the core assumptions (e.g., noise features having higher variance or clusters being separable only in relevant dimensions) are deliberately violated; without such checks the empirical results cannot confirm that the observed weight superiority is robust rather than an artifact of the synthetic data generation process that implicitly satisfies the assumptions.

minor comments (2)

[Method] The precise interval of Minkowski exponents used for aggregation in FS-MWK++ should be stated explicitly in the algorithm description rather than left as 'a range'.
[Figures] Figure captions for the weight-stability plots could clarify the exact aggregation rule (mean, median, or threshold) applied across exponents.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the theoretical presentation and empirical validation.

Referee: [Theoretical analysis] Theoretical analysis (section following method description): The central claim that relevant features obtain consistently higher weights than noise features rests on explicit assumptions about noise-feature variance and cluster separation in relevant dimensions only. The manuscript states that a theoretical demonstration exists but provides neither the full derivation steps nor an error analysis or sensitivity check, so it is not possible to confirm that the weight inequality follows directly from the weighted Minkowski objective and ++ initialization without additional unstated steps.

Authors: We agree that the full derivation was not included in the submitted version. The manuscript states the result under the listed assumptions on noise variance and cluster separation, but omits the intermediate algebraic steps from the weighted Minkowski objective and the probabilistic initialization. In the revision we will insert the complete proof, showing how the weight inequality is obtained directly from the objective and the ++ selection rule, together with a brief sensitivity discussion that quantifies how the inequality degrades when the separation or variance assumptions are mildly perturbed.
revision: yes

Referee: [Experiments] Experimental section (tables/figures reporting weight comparisons): No ablation or stress test is reported in which the core assumptions (e.g., noise features having higher variance or clusters being separable only in relevant dimensions) are deliberately violated; without such checks the empirical results cannot confirm that the observed weight superiority is robust rather than an artifact of the synthetic data generation process that implicitly satisfies the assumptions.

Authors: We acknowledge that the current experiments use synthetic data generated under the stated assumptions. To address this, the revised manuscript will include an additional set of controlled experiments that deliberately violate the noise-variance and cluster-separability conditions (e.g., by equalizing variances across relevant and noise features or by introducing overlap in relevant dimensions). We will report the resulting feature-weight distributions and discuss the observed degradation, thereby clarifying the boundary of the theoretical regime.
revision: yes
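A controlled experiment of the kind described in this response needs data where the relevant and noise features are known in advance. The generator below is a sketch under assumed settings: Gaussian clusters separated only along the relevant features, with independent Gaussian noise features. `make_data` and all of its parameters are hypothetical and do not reproduce the paper's data generation process.

```python
import numpy as np

def make_data(n_per_cluster=100, n_clusters=3, n_relevant=2, n_noise=3,
              separation=6.0, relevant_scale=1.0, noise_scale=1.0, seed=0):
    """Gaussian clusters separated only along the first n_relevant features."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    # Cluster c is centred at c * separation on every relevant feature.
    centers = separation * labels[:, None] * np.ones((1, n_relevant))
    relevant = centers + rng.normal(scale=relevant_scale, size=(labels.size, n_relevant))
    # Noise features ignore the cluster labels entirely.
    noise = rng.normal(scale=noise_scale, size=(labels.size, n_noise))
    X = np.hstack([relevant, noise])
    is_relevant = np.arange(n_relevant + n_noise) < n_relevant
    return X, labels, is_relevant

# Baseline: well-separated clusters on the relevant features.
X_base, y_base, is_rel = make_data(separation=6.0)
# Stress test: clusters overlap heavily on the relevant features.
X_stress, y_stress, _ = make_data(separation=0.5)
print(X_base.shape, X_stress.shape)
```

Varying `separation`, `relevant_scale`, and `noise_scale` gives one way to move gradually from a setting where clusters are clearly separated to one where they overlap, so that any degradation in the feature weights can be tracked against a known ground truth.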

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper derives feature weights via the Minkowski weighted k-means++ objective and aggregates them for stability-based selection, then supports the approach with a theoretical demonstration that relevant features receive higher weights than noise features under explicit assumptions on noise features and cluster structure. This theoretical result is presented as conditional on those assumptions rather than reducing by construction to the fitted weights or to a self-citation chain; the assumptions are stated as external to the fitting process and provide independent grounding for why the aggregation step identifies informative features. No equations or steps are shown to equate the output selection criterion directly to the input clustering fit without additional content, and the method remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about noise features and cluster structure plus the standard mathematical properties of Minkowski distances and k-means optimization; no new free parameters or invented entities are introduced beyond the algorithmic choices.

axioms (1)

domain assumption: Explicit assumptions on noise features and cluster structure. Invoked to prove that relevant features receive consistently higher weights than noise features across Minkowski exponents.

pith-pipeline@v0.9.0 · 5674 in / 1239 out tokens · 53060 ms · 2026-05-22T01:07:10.991094+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

$w_{lv} = 1 / \sum_{u} (D_{lv} / D_{lu})^{1/(p-1)}$ ... under explicit assumptions on noise features and cluster structure, relevant features are assigned consistently higher weights

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Lemma 1 ... $w^{(p)}_{lt} < 1/m$ ... noise features uncorrelated with cluster structure

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

