Robust Clustering Using Tau-Scales
Pith reviewed 2026-05-25 19:46 UTC · model grok-4.3
The pith
K Tau Centers uses tau-scales to cluster data robustly while adapting to efficiency needs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
K Tau Centers is a robust clustering procedure based on the tau-scale. It overcomes the fixed trade-off in trimmed K-means by achieving robustness and efficiency simultaneously in an adaptive manner. The centers found by the method are consistent estimators of the true centers, defined as the minimizers of the objective function at the population level.
What carries the argument
The tau-scale, a robust measure of dispersion, used to build the objective function that is minimized to locate the cluster centers.
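To make the mechanism concrete, here is a minimal sketch of a tau-scale in the sense of Yohai and Zamar (1988), applied to the distances from each point to its nearest candidate center. Everything named here is an assumption for illustration: the bisquare rho function, the constants `c1`, `c2`, and `b`, and the function names are placeholders, not the paper's choices, and no optimization over the centers is attempted.

```python
import math
from statistics import median


def rho_bisquare(t, c):
    """Tukey bisquare rho, scaled so that rho = 1 for |t| >= c."""
    u = min(abs(t) / c, 1.0)
    return 1.0 - (1.0 - u * u) ** 3


def m_scale(d, c1=1.5476, b=0.5, tol=1e-10, max_iter=500):
    """Solve mean(rho(d_i / s)) = b for s by fixed-point iteration."""
    s = median([abs(x) for x in d])
    if s <= 0.0:
        return 0.0  # at least half of the values are zero
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(x / s, c1) for x in d) / len(d)
        s_new = s * math.sqrt(mean_rho / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


def tau_scale(d, c1=1.5476, c2=6.08, b=0.5):
    """tau = s * sqrt(mean(rho2(d_i / s))), with s the M-scale of d."""
    s = m_scale(d, c1, b)
    if s == 0.0:
        return 0.0
    mean_rho2 = sum(rho_bisquare(x / s, c2) for x in d) / len(d)
    return s * math.sqrt(mean_rho2)


def nearest_center_distances(points, centers):
    """Euclidean distance from each point to its closest center."""
    return [min(math.dist(p, c) for c in centers) for p in points]


def k_tau_objective(points, centers):
    """Assumed objective: tau-scale of the nearest-center distances."""
    return tau_scale(nearest_center_distances(points, centers))
```

Because both rho functions are bounded, a few very large distances can raise the value of `tau_scale` only by a limited amount, which is the property a robust clustering objective relies on.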
Load-bearing premise
The population distribution admits well-defined minimizers of the tau-scale objective function that correspond to the true cluster centers.
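For orientation, the population objective behind this premise can presumably be written as follows, based on the tau-scale definition of Yohai and Zamar (1988). The notation ($\rho_1$, $\rho_2$, $b$, $d$, $s$, $F$) is assumed here for illustration and may not match the paper's; the sample version would replace the expectation under $F$ with an average over the data.

```latex
% Distance from x to the nearest of K candidate centers, with \mu = (\mu_1, \dots, \mu_K)
d(x; \mu) = \min_{1 \le k \le K} \lVert x - \mu_k \rVert

% M-scale s(\mu) of the distances under the distribution F
\mathbb{E}_F \, \rho_1\!\left( \frac{d(X; \mu)}{s(\mu)} \right) = b

% Tau-scale of the distances, and the population centers as its minimizers
\tau^2(\mu) = s(\mu)^2 \, \mathbb{E}_F \, \rho_2\!\left( \frac{d(X; \mu)}{s(\mu)} \right),
\qquad
(\mu_1^{*}, \dots, \mu_K^{*}) \in \arg\min_{\mu} \tau(\mu)
```

The premise is that this arg min is non-empty and, for the consistency statement to be unambiguous, unique up to relabeling of the centers.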
What would settle it
A sequence of samples drawn from a fixed mixture distribution with known population minimizers, where the K Tau Centers estimates do not approach those minimizers as the sample size tends to infinity.
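The check described above can be sketched as a small simulation harness. This is only an outline of the protocol under stated assumptions: `lloyd_kmeans` is a plain K-means stand-in, not K Tau Centers, and the mixture component means are used as a proxy for the population minimizers, which is reasonable only when the clusters are well separated. All function names are hypothetical.

```python
import itertools
import math
import random


def center_error(estimated, reference):
    """Largest center-to-center distance under the best label matching."""
    return min(
        max(math.dist(e, r) for e, r in zip(perm, reference))
        for perm in itertools.permutations(estimated)
    )


def sample_mixture(n, means, sd, rng):
    """Draw n points from an equal-weight spherical Gaussian mixture."""
    return [
        tuple(rng.gauss(m, sd) for m in rng.choice(means))
        for _ in range(n)
    ]


def lloyd_kmeans(points, k, n_iter=50):
    """STAND-IN estimator: Lloyd iterations with farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def consistency_trace(fit, reference, sizes, means, sd, seed=0):
    """Return (n, error) pairs for one fresh sample per sample size."""
    rng = random.Random(seed)
    trace = []
    for n in sizes:
        pts = sample_mixture(n, means, sd, rng)
        trace.append((n, center_error(fit(pts, len(reference)), reference)))
    return trace
```

To test the paper's claim, `fit` would be replaced by an implementation of K Tau Centers and `reference` by its population minimizers. An error that fails to shrink as the sample size grows would count as evidence against consistency.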
Original abstract
K-means is a popular non-parametric clustering procedure introduced by Steinhaus (1956) and further developed by MacQueen (1967). It is known, however, that K-means does not perform well in the presence of outliers. Cuesta-Albertos et al. (1997) introduced a robust alternative, trimmed K-means, which can be tuned to be robust or efficient, but cannot achieve these two properties simultaneously in an adaptive way. To overcome this limitation we propose a new robust clustering procedure called K Tau Centers, which is based on the concept of the tau-scale introduced by Yohai and Zamar (1988). We show that K Tau Centers performs well in extensive simulation studies and real data examples. We also show that the centers found by the proposed method are consistent estimators of the "true" centers defined as the minimizers of the objective function at the population level.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes K Tau Centers, a robust clustering procedure extending the tau-scale of Yohai and Zamar (1988) to overcome limitations of K-means and trimmed K-means in the presence of outliers. It claims superior performance in simulations and real data examples, and proves consistency of the estimated centers to the population-level minimizers of the tau-scale objective function.
Significance. If the consistency result holds under verifiable conditions and the simulation evidence is reproducible, the method would offer an adaptive robust clustering approach that simultaneously achieves robustness and efficiency, extending prior tau-scale work in a practically useful direction.
major comments (1)
- [Abstract and consistency theorem] Abstract and consistency section: the claim that sample centers are consistent estimators of the 'true' centers (minimizers of the population tau-scale objective) requires existence and uniqueness of those minimizers for the distributions of interest, yet no conditions ensuring this (e.g., strict convexity or identifiability of the tau-scale functional under contamination or multimodality) are stated or verified. This assumption is load-bearing for the central consistency result.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We address the major comment below.
- Referee: [Abstract and consistency theorem] Abstract and consistency section: the claim that sample centers are consistent estimators of the 'true' centers (minimizers of the population tau-scale objective) requires existence and uniqueness of those minimizers for the distributions of interest, yet no conditions ensuring this (e.g., strict convexity or identifiability of the tau-scale functional under contamination or multimodality) are stated or verified. This assumption is load-bearing for the central consistency result.
Authors: We agree that the consistency theorem relies on the existence and uniqueness of the population-level minimizers of the tau-scale objective. The manuscript states consistency to these minimizers but does not explicitly list the required conditions on the underlying distribution. In the revision we will add a dedicated subsection stating verifiable assumptions (e.g., identifiability of the K tau-centers, a mild separation condition between clusters to ensure uniqueness under multimodality, and reference to the convexity/continuity properties of the tau-scale established by Yohai and Zamar (1988)) that guarantee the population objective possesses a unique set of K minimizers. These assumptions will be placed before the consistency theorem and will be checked to be compatible with the contamination models used in the simulations.
revision: yes
Circularity Check
No significant circularity; consistency is a standard extension
The paper defines true centers as population-level minimizers of the tau-scale objective (from the authors' 1988 prior work) and claims sample centers are consistent estimators of those. This is a conventional statistical consistency argument that does not reduce by construction to fitted inputs or self-citations. The tau-scale foundation is cited but the new clustering objective and consistency result are presented as independent extensions. No quoted equations or steps exhibit self-definition, fitted predictions renamed as results, or load-bearing self-citation chains that make the central claim tautological. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- tuning constant for tau-scale
axioms (1)
- domain assumption: existence of population-level minimizers of the tau-scale objective function that represent the true centers
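The ledger names a tuning constant without saying how it is set. One common convention for M-scales, assumed here and not confirmed for this paper, is to choose the bisquare constant c so that the expected value of rho_c(Z) equals b for a standard normal Z, which makes the scale consistent at the normal model. The sketch below solves for c numerically; the function names are hypothetical, and the calibration is for univariate residuals, so constants suited to multivariate distances would differ.

```python
import math


def rho_bisquare(t, c):
    """Tukey bisquare rho, scaled so that rho = 1 for |t| >= c."""
    u = min(abs(t) / c, 1.0)
    return 1.0 - (1.0 - u * u) ** 3


def expected_rho(c, half_width=8.0, n=4000):
    """E[rho_c(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [-8, 8]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * rho_bisquare(z, c) * pdf
    return total * h


def calibrate_c(b=0.5, lo=0.1, hi=20.0, iters=60):
    """Bisection for the c with E[rho_c(Z)] = b; the expectation decreases in c."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_rho(mid) > b:
            lo = mid  # rho saturates too early, so c is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With b = 0.5 this calibration gives a constant of roughly 1.55, the value commonly quoted for a 50% breakdown bisquare scale. A second constant, governing efficiency, would be tuned separately.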
Reference graph
Works this paper leans on
- [1] Agostinelli, C., Leung, A., Yohai, V. J., and Zamar, R. H. (2015). Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination. Test, 24(3):441-461.
- [2] Al Hasan, M., Chaoji, V., Salem, S., and Zaki, M. J. (2009). Robust partitional clustering by outlier and density insensitive seeding. Pattern Recognition Letters, 30(11):994-1002.
- [3] Cheng, H.-D., Jiang, X. H., Sun, Y., and Wang, J. (2001). Color image segmentation: advances and prospects. Pattern Recognition, 34(12):2259-2281.
- [4] Cuesta-Albertos, J., Gordaliza, A., and Matrán, C. (1997). Trimmed k-means: An attempt to robustify quantizers. The Annals of Statistics, 25(2):553-576.
- [5] Fritz, H., García-Escudero, L. A., and Mayo-Iscar, A. (2012). tclust: An R package for a trimming approach to cluster analysis. Journal of Statistical Software, 47(12):1-26.
- [6] García-Escudero, L. A., Gordaliza, A., Matrán, C., and Mayo-Iscar, A. (2008). A general trimming approach to robust cluster analysis. The Annals of Statistics, pages 1324-1345.
- [7] Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100-108.
- [8] Leung, A., Danilov, M., Yohai, V., and Zamar, R. (2015). GSE: Robust estimation in the presence of cellwise and casewise contamination and missing data. R package.
- [9] Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137.
- [10] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, number 14, pages 281-297. Oakland, CA, USA.
- [11] Maronna, R. A. and Yohai, V. J. (2017). Robust and efficient estimation of multivariate scatter and location. Computational Statistics & Data Analysis, 109:64-75.
- [12] Munkres, J. (2000). Topology. Prentice Hall, Upper Saddle River, NJ.
- [13] NASA (2016). View of Curiosity's First Scoop Also Shows Bright Object.
- [14] Pollard, D. (1981). Strong consistency of k-means clustering. The Annals of Statistics, 9(1):135-140.
- [15] Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
- [16] Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846-850.
- [17] Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci., 1(804):801.
- [18] Yohai, V. J. and Zamar, R. H. (1988). High breakdown-point estimates of regression by means of the minimization of an efficient scale. Journal of the American Statistical Association, 83(402):406-413.