Cluster-Adaptive Feature Extraction and its Theoretical Foundation with Minkowski Weighted k-Means
Pith reviewed 2026-05-22 11:10 UTC · model grok-4.3
The pith
Minkowski weighted k-means feature weights rescale data to reverse within-cluster dispersion ordering and suppress noisy features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Expressing the mwk-means objective as a power-mean aggregation of within-cluster dispersions, with the order set by the Minkowski exponent p, shows that the feature weights depend only on relative dispersion ratios through a power law. This yields explicit guarantees that high-dispersion features are suppressed, and the algorithm is shown to converge. On this foundation the paper builds CAFE: rescaling the data with these weights reverses the within-cluster dispersion ordering, suppressing noisy features and amplifying informative ones before unsupervised feature extraction.
What carries the argument
The power-mean aggregation representation of the mwk-means objective function, which determines the feature weights' power-law dependence on dispersion ratios and enables the dispersion-order reversal in CAFE.
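A sketch of how the power-mean form can arise, under assumptions that are not spelled out on this page: each cluster $l$ has weights $w_{lv} \ge 0$ summing to one over features $v = 1, \dots, V$, the dispersion is $D_{lv} = \sum_{i \in S_l} |x_{iv} - c_{lv}|^p > 0$, and $p > 1$. The symbols $V$, $S_l$, $c_{lv}$, $W_l$, and $M_r$ are notation chosen here and may differ from the paper's.

```latex
% Per-cluster objective for fixed centroids and assignments
W_l(w) = \sum_{v=1}^{V} w_{lv}^{p} D_{lv}, \qquad \sum_{v=1}^{V} w_{lv} = 1 .

% Lagrange stationarity: p \, w_{lv}^{p-1} D_{lv} = \lambda for every v, hence
w_{lv} = \frac{D_{lv}^{-1/(p-1)}}{\sum_{u=1}^{V} D_{lu}^{-1/(p-1)}},
\qquad
\frac{w_{lv}}{w_{lu}} = \left( \frac{D_{lu}}{D_{lv}} \right)^{1/(p-1)} .

% Substituting back, with r = -1/(p-1):
W_l^{*} = \left( \sum_{v=1}^{V} D_{lv}^{r} \right)^{1/r}
        = V^{-(p-1)} \, M_r(D_{l1}, \dots, D_{lV}),
\qquad
M_r(d) = \left( \frac{1}{V} \sum_{v=1}^{V} d_v^{r} \right)^{1/r} .
```

Under these assumptions the order $r = -1/(p-1)$ tends to $-\infty$ as $p \to 1^{+}$, so the power mean approaches the smallest dispersion (selective use of features). As $p \to \infty$ it tends to $0$, where the power mean approaches the geometric mean and the weights approach $1/V$ (uniform use).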
If this is right
- The choice of the Minkowski exponent p controls the transition between selective and uniform feature weighting.
- Feature weights in mwk-means are independent of absolute dispersion values and depend only on relative ratios.
- CAFE consistently improves the performance of traditional unsupervised feature extraction methods when within-cluster noise is present.
- The mwk-means algorithm converges to a local minimum under the derived bounds on the objective.
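The first two consequences above can be illustrated with a minimal Python sketch. It assumes the weights of one cluster are proportional to $D_v^{-1/(p-1)}$ and normalized to sum to one, with $p > 1$. The function name `mwk_weights` and the dispersion values are hypothetical, chosen only for illustration.

```python
import numpy as np

def mwk_weights(dispersions, p):
    """Weights for one cluster: w_v proportional to D_v ** (-1 / (p - 1))."""
    if p <= 1:
        raise ValueError("this sketch assumes p > 1")
    d = np.asarray(dispersions, dtype=float)
    # Work in log space so extreme exponents (p close to 1) do not overflow.
    log_w = -np.log(d) / (p - 1.0)
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

D = np.array([1.0, 2.0, 8.0])  # hypothetical within-cluster dispersions

# Only ratios matter: scaling every dispersion by 100 leaves the weights unchanged.
assert np.allclose(mwk_weights(D, p=2.0), mwk_weights(100.0 * D, p=2.0))

# Small p concentrates weight on the lowest-dispersion feature;
# large p spreads it almost uniformly.
for p in (1.1, 2.0, 5.0, 50.0):
    print(p, np.round(mwk_weights(D, p), 3))
```

Working in log space is a design choice of this sketch, not something taken from the paper. It keeps the computation stable when $p$ is close to 1 and the exponent becomes very large.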
Where Pith is reading between the lines
- This rescaling technique could be applied to other clustering algorithms that produce feature weights to enhance feature extraction in noisy high-dimensional data.
- Choosing p based on the expected level of feature noise might optimize the suppression effect in practice.
- CAFE might integrate with supervised methods by using the weights for dimensionality reduction prior to classification.
- Similar power-mean reformulations could be explored for other distance-based clustering objectives to derive weighting schemes.
Load-bearing premise
The derivation of the power-law relationship for feature weights assumes that the mwk-means objective can be precisely expressed as a power-mean aggregation of the within-cluster dispersions for the given Minkowski exponent p.
What would settle it
A counterexample where applying the CAFE rescaling to data with high-dispersion noisy features does not reverse the dispersion ordering or fails to improve extraction results would falsify the central claim.
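A check of this kind can be sketched numerically for a single cluster. The example below is not the paper's experiment. It assumes $p = 2$ (so the cluster center is the mean), weights proportional to $D_v^{-1/(p-1)}$, and synthetic Gaussian data with hypothetical scales.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 2.0  # hypothetical choice; with p = 2 the Minkowski center is the mean

# One synthetic cluster: feature 0 is tight, feature 2 is noisy.
X = rng.normal(loc=0.0, scale=[0.5, 1.0, 3.0], size=(200, 3))

def dispersions(X, p):
    c = X.mean(axis=0)  # valid center for p = 2 only
    return (np.abs(X - c) ** p).sum(axis=0)

D = dispersions(X, p)
w = D ** (-1.0 / (p - 1.0))
w /= w.sum()

# Rescale the data by the weights and measure dispersion again.
D_rescaled = dispersions(X * w, p)

order_before = np.argsort(D)
order_after = np.argsort(D_rescaled)
print("before:", order_before, "after:", order_after)

# True when the rescaled ordering is the exact reverse of the original one.
reversed_ok = np.array_equal(order_after, order_before[::-1])
```

A data set for which `reversed_ok` came out false under the paper's actual definitions would be the kind of counterexample described above.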
Original abstract
The Minkowski weighted $k$-means ($mwk$-means) algorithm extends classical $k$-means by incorporating feature weights and a Minkowski distance. We first show that the $mwk$-means objective can be expressed as a power-mean aggregation of within-cluster dispersions, with the order determined by the Minkowski exponent $p$. This formulation reveals how $p$ controls the transition between selective and uniform use of features. Using this representation, we derive bounds for the objective function and characterise the structure of the feature weights, showing that they depend only on relative dispersion and follow a power-law relationship with dispersion ratios. This leads to explicit guarantees on the suppression of high-dispersion features, and we establish convergence of the algorithm. Building on these theoretical results, we introduce Cluster-Adaptive Feature Extraction (CAFE), a method that uses the $mwk$-means feature weights to rescale the data prior to unsupervised feature extraction. We prove that this rescaling reverses the within-cluster dispersion ordering, suppressing noisy features and amplifying informative ones. Numerous experiments conducted under controlled within-cluster noise show that CAFE consistently improves the results of traditional feature extraction methods.
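The pipeline described in the abstract (cluster, weight, rescale, then extract) can be sketched roughly as follows. This page does not give CAFE's exact definition, so several things are assumptions: each sample is rescaled by the weights of its own cluster, $p = 2$, ground-truth labels stand in for an mwk-means partition, and PCA via SVD plays the role of the downstream extractor. The names `cluster_weights`, `cafe_rescale`, and `pca` are hypothetical.

```python
import numpy as np

def cluster_weights(X, labels, p=2.0):
    """Per-cluster feature weights, shape (n_clusters, n_features)."""
    ks = np.unique(labels)
    W = np.empty((ks.size, X.shape[1]))
    for row, k in enumerate(ks):
        Xk = X[labels == k]
        # Mean is the Minkowski center only for p = 2 (assumed here).
        D = (np.abs(Xk - Xk.mean(axis=0)) ** p).sum(axis=0)
        D = np.maximum(D, 1e-12)  # guard against zero dispersion
        w = D ** (-1.0 / (p - 1.0))
        W[row] = w / w.sum()
    return ks, W

def cafe_rescale(X, labels, p=2.0):
    """Multiply each sample by the feature weights of its cluster."""
    ks, W = cluster_weights(X, labels, p)
    idx = np.searchsorted(ks, labels)
    return X * W[idx]

def pca(X, n_components):
    """Plain PCA via SVD; returns scores and component directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, Vt[:n_components]

# Two clusters separated on features 0 and 1; feature 2 is pure noise.
rng = np.random.default_rng(1)
n = 150
labels = np.repeat([0, 1], n)
informative = np.where(labels[:, None] == 0, -2.0, 2.0) + rng.normal(0, 0.5, (2 * n, 2))
noise = rng.normal(0, 6.0, (2 * n, 1))
X = np.hstack([informative, noise])

_, comps_raw = pca(X, 1)
_, comps_cafe = pca(cafe_rescale(X, labels), 1)
print("raw loading on noise feature: ", abs(comps_raw[0, 2]))
print("CAFE loading on noise feature:", abs(comps_cafe[0, 2]))
```

In this toy setting the first principal component of the raw data is dominated by the noisy feature, while after rescaling it aligns with the informative ones. The paper's experiments may use a different protocol.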
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reformulates the Minkowski weighted k-means (mwk-means) objective as a power-mean aggregation of within-cluster dispersions (order set by Minkowski exponent p), derives that feature weights depend only on relative dispersions and obey a power-law relationship with dispersion ratios, establishes suppression guarantees for high-dispersion features plus algorithm convergence, and introduces Cluster-Adaptive Feature Extraction (CAFE) that rescales data by these weights to reverse within-cluster dispersion ordering. Experiments under controlled within-cluster noise claim consistent improvements over standard feature extraction methods.
Significance. If the power-mean reformulation and reversal proof hold without hidden dependencies, the work supplies a principled theoretical basis for feature weighting in clustering and a practical pre-processing step that could improve unsupervised feature extraction in noisy data. The explicit bounds, weight characterization, and convergence result are positive contributions; the controlled experiments provide initial evidence but leave generalizability open.
major comments (2)
- [Theoretical foundation / power-mean representation] Theoretical foundation section (power-mean reformulation of mwk-means objective): the claim that this aggregation exactly yields feature weights depending only on relative dispersion and following a power-law with dispersion ratios is load-bearing for both the suppression guarantees and the CAFE reversal proof. The joint optimization of weights and centroids may introduce dependencies not captured by a static power-mean of dispersions, and the manuscript does not provide an explicit verification that the reformulation preserves the original stationary-point conditions for arbitrary p.
- [CAFE definition and reversal proof] CAFE reversal proof: the argument that rescaling by mwk-means weights reverses within-cluster dispersion ordering and suppresses noisy features assumes post-rescaling stability of the weights and cluster structure. No analysis is given of whether the rescaled data requires re-optimization of weights or whether the reversal holds after the first iteration, which directly affects whether the claimed suppression is guaranteed in the subsequent feature-extraction stage.
minor comments (2)
- Notation for the power-mean order p and the resulting weight formula should be stated explicitly with an equation number immediately after the reformulation, to avoid ambiguity when referring to the power-law relationship later.
- The experimental section would benefit from reporting the exact Minkowski exponent p values used and whether they were fixed or tuned, as p controls the selective-to-uniform transition highlighted in the theory.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, offering clarifications based on the manuscript's derivations and indicating revisions to enhance clarity and completeness where appropriate.
- Referee: Theoretical foundation section (power-mean reformulation of mwk-means objective): the claim that this aggregation exactly yields feature weights depending only on relative dispersion and following a power-law with dispersion ratios is load-bearing for both the suppression guarantees and the CAFE reversal proof. The joint optimization of weights and centroids may introduce dependencies not captured by a static power-mean of dispersions, and the manuscript does not provide an explicit verification that the reformulation preserves the original stationary-point conditions for arbitrary p.
  Authors: We appreciate the referee highlighting the need for explicit verification on this load-bearing claim. The power-mean reformulation follows directly from rewriting the mwk-means objective as an aggregation of within-cluster dispersions raised to the Minkowski exponent p; the closed-form weight update (for fixed centroids) then depends only on relative dispersions via the stated power-law relationship. Because the reformulation is algebraically equivalent to the original objective, the alternating optimization procedure yields stationary points of both. To address the concern directly, we will revise the theoretical foundation section to include a short lemma that verifies preservation of the stationary conditions for arbitrary p > 0, together with a brief remark on the absence of hidden dependencies introduced by joint optimization. This constitutes a partial revision.
- Referee: CAFE reversal proof: the argument that rescaling by mwk-means weights reverses within-cluster dispersion ordering and suppresses noisy features assumes post-rescaling stability of the weights and cluster structure. No analysis is given of whether the rescaled data requires re-optimization of weights or whether the reversal holds after the first iteration, which directly affects whether the claimed suppression is guaranteed in the subsequent feature-extraction stage.
  Authors: The referee correctly notes that the reversal proof is stated for the initial rescaling step. The proof shows that multiplying each feature by the mwk-means weight (computed on the original data) inverts the ordering of within-cluster dispersions, thereby suppressing high-dispersion features before any downstream extraction occurs. CAFE is explicitly positioned as a one-pass preprocessing transformation; re-optimization of weights on the rescaled data is neither required nor assumed for the suppression guarantee. We agree that a short discussion of this design choice would improve the manuscript. We will add a clarifying paragraph in the CAFE section stating that the reversal applies to the rescaled representation used by subsequent methods and that empirical results remain consistent without re-clustering. This is a partial revision.
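The one-pass point in the second exchange can be made concrete with a short calculation. It assumes a fixed partition, $p > 1$, weights of the form $w_{lv} \propto D_{lv}^{-1/(p-1)}$, and rescaling of the points in cluster $l$ by that cluster's own weights. The symbols $D'_{lv}$ and $w'_{lv}$ (dispersion and re-optimized weight on the rescaled data) are notation introduced here, not the paper's.

```latex
% Scaling a feature by w_{lv} scales its Minkowski center by the same factor, so
D'_{lv} = w_{lv}^{p} D_{lv} \;\propto\; D_{lv}^{\,1 - p/(p-1)} = D_{lv}^{-1/(p-1)} .

% The ordering is reversed because the exponent -1/(p-1) is negative for p > 1.
% Re-optimizing the weights on the rescaled data would give
w'_{lv} \;\propto\; \left( D'_{lv} \right)^{-1/(p-1)} \;\propto\; D_{lv}^{\,1/(p-1)^{2}} .
```

Under these assumptions a second weighting pass would up-weight the features that were originally most dispersed. The reversal is therefore a property of a single application of the weights, which is consistent with the authors' description of CAFE as a one-pass transformation.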
Circularity Check
No circularity: the derivations of the power-mean reformulation, the weight structure, and the CAFE reversal form a self-contained analysis of the mwk-means objective.
The paper derives the power-mean representation directly from the mwk-means objective function, then uses that representation to characterize feature weights as depending only on relative dispersions via a power-law relation. This is a standard mathematical unpacking of an existing objective rather than a fit or self-referential definition. The subsequent proof that CAFE rescaling reverses dispersion ordering follows from those derived properties without reducing to a tautology or to a self-citation chain. No load-bearing step equates a claimed result to its own inputs by construction; the analysis remains independent of the experimental outcomes and does not rename known patterns or smuggle ansatzes via prior self-citations.
Axiom & Free-Parameter Ledger
free parameters (1)
- Minkowski exponent p
axioms (1)
- Domain assumption: the mwk-means objective equals a power-mean aggregation of within-cluster dispersions whose order is fixed by p.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We show that the mwk-means objective can be expressed as a power-mean aggregation of within-cluster dispersions, with the order determined by the Minkowski exponent p... $w_{lv}/w_{lu} = (D_{lu}/D_{lv})^{1/(p-1)}$"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "This leads to explicit guarantees on the suppression of high-dispersion features"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.