An effective variant of the Hartigan k-means algorithm
Pith reviewed 2026-05-09 22:06 UTC · model grok-4.3
The pith
A minor variation of Hartigan's k-means algorithm adds another 2% to 5% improvement over the standard version.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a very minor variation of Hartigan's 1975 algorithm for the k-means problem yields clusterings whose objective value is typically 2% to 5% smaller than those found by the unmodified Hartigan method, with the relative improvement increasing as dimension or the number of clusters k grows.
What carries the argument
The minor variation of Hartigan's reassignment step, which alters how points are moved between clusters during the local search.
If this is right
- The variant produces better local minima than either Lloyd's or standard Hartigan's method in the great majority of trials.
- The advantage over standard Hartigan's method widens as data dimension increases.
- The advantage over standard Hartigan's method widens as the number of clusters k increases.
- The change requires almost no extra computation beyond the original Hartigan procedure.
Where Pith is reading between the lines
- Small adjustments to the order or condition of point reassignments can help the algorithm escape poorer local minima more effectively.
- The same style of minor change might improve other greedy local-search procedures used in clustering or partitioning problems.
- In applied settings the variant could be swapped in for Hartigan's method with no change to surrounding code or data pipelines.
- The pattern of larger gains at high dimension suggests the method may be especially useful for modern high-dimensional data sets.
Load-bearing premise
The measured gains come from the algorithmic change and not from implementation details, random seeds, or the particular datasets chosen for testing.
What would settle it
Re-running the standard Hartigan method and the variant on identical large-scale datasets with the same code base, fixed seeds, and identical stopping rules, then checking whether the variant still produces consistently lower objective values especially at high dimension or large k.
Figures
read the original abstract
The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a minor variant of Hartigan's 1975 k-means algorithm and claims that it yields an additional 2%–5% improvement in the clustering objective over standard Hartigan (itself 5%–10% better than Lloyd), with the gains tending to increase as dimension or k grows.
Significance. If the empirical gains are shown to be robustly caused by the algorithmic change rather than implementation artifacts, the result would be a simple, low-cost improvement to a standard clustering method with practical value in high-dimensional or large-k regimes. The paper correctly builds on the known superiority of Hartigan over Lloyd but currently supplies no machine-checked proofs, reproducible code, or parameter-free derivations.
major comments (2)
- Abstract: the central claim of consistent 2%–5% objective improvement (and its growth with dimension/k) rests on empirical comparison, yet the abstract supplies none of the required controls—identical initializations, convergence criteria, multiple random seeds with statistical testing, or explicit confirmation that the baseline Hartigan implementation differs from the variant only in the proposed change—leaving open the possibility that differences arise from coding artifacts or data specifics rather than the algorithmic modification.
- Experimental section (inferred from abstract and reader's take): without reported details on the number of datasets, preprocessing, exact matching of floating-point behavior, or averaging over seeds, the attribution of gains to the variant itself cannot be verified and undermines the generalization claim.
minor comments (2)
- Abstract: the citation to Telgarsky-Vattani should include a full reference (year, title, or arXiv number) for completeness.
- Throughout: ensure pseudocode or explicit description of the minor variation is provided so readers can implement the exact change.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to incorporate additional experimental details and controls.
read point-by-point responses
-
Referee: Abstract: the central claim of consistent 2%–5% objective improvement (and its growth with dimension/k) rests on empirical comparison, yet the abstract supplies none of the required controls—identical initializations, convergence criteria, multiple random seeds with statistical testing, or explicit confirmation that the baseline Hartigan implementation differs from the variant only in the proposed change—leaving open the possibility that differences arise from coding artifacts or data specifics rather than the algorithmic modification.
Authors: We agree that the abstract should more explicitly reference the experimental controls. In the revised version we have updated the abstract to state that comparisons used identical initializations and convergence criteria across methods, were averaged over multiple random seeds with statistical testing, and that the sole difference between the baseline Hartigan implementation and the variant is the proposed algorithmic change. revision: yes
-
Referee: Experimental section (inferred from abstract and reader's take): without reported details on the number of datasets, preprocessing, exact matching of floating-point behavior, or averaging over seeds, the attribution of gains to the variant itself cannot be verified and undermines the generalization claim.
Authors: We accept that the original experimental section lacked sufficient detail for full verification. The revised manuscript now specifies the number of datasets, the preprocessing steps, how floating-point behavior was matched between the two implementations, and that all results are averaged over multiple seeds with standard deviations and statistical tests reported. These additions directly support attribution of the observed gains to the algorithmic variant. revision: yes
Circularity Check
No circularity; empirical claim rests on algorithmic variant and experiments
full rationale
The paper proposes a minor variation on Hartigan's k-means algorithm and reports 2-5% empirical gains (growing with dimension and k) over standard Hartigan. This is an algorithmic modification whose performance is asserted via direct comparison on data, not via any derivation, equation, or first-principles result that reduces to its own inputs by construction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear. The central claim is externally falsifiable by re-implementing the variant and measuring objective values, placing it outside the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1]
-
[2]
D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035
-
[3]
G. Ball and D. Hall, ISODATA, a novel method of data anlysis and pattern classification. Technical report NTIS AD 699616, 1965, Stanford Research Institute, Stanford, CA
work page 1965
-
[4]
R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936), p. 179–188
work page 1936
-
[5]
R. A. Fisher, The analysis of covariance method for the relation between a part and the whole, Biometrics 3 (1947), 65–68
work page 1947
-
[6]
E. Forgy, Cluster analysis of multivariate data: efficiency versus interpretability of classifica- tions, Biometrics. 21 (1965), 768–769
work page 1965
-
[7]
J. A. Hartigan. Clustering algorithms. Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley, 1975
work page 1975
-
[8]
H. Hartigan and M. Wong, Algorithm AS 136: A k-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C. 28 (1979), p. 100-108
work page 1979
-
[9]
Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters 31 (2010), p
A. Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters 31 (2010), p. 651–666
work page 2010
- [10]
-
[11]
R. R. Lederman,https://github.com/Lederman-Group/Catastrophic_Failure_KMeans, ac- cessed April 17, 2026
work page 2026
-
[12]
Lloyd, Least squares quantization in PCM
S. Lloyd, Least squares quantization in PCM. IEEE Trans. Inform. Theory 28 (1982), p. 129–137. Originally as an unpublished Bell laboratories Technical Note (1957). 8
work page 1982
-
[13]
MacQueen, Some methods for classification and analysis of multivariate observations
J. MacQueen, Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematics. Statistics and Probability, University of Califor- nia Press, 1967, pp. 281–297
work page 1967
-
[14]
T. Mitchell, Twenty Newsgroups. UCI Machine Learning Repository 1997. DOI: https://doi.org/10.24432/C5C323
-
[15]
J. Peng and Y. Wei, Approximating k-means-type clustering via semidefinite programming. SIAM journal on optimization, 18(1):186–205, 2007
work page 2007
-
[16]
F. S. Samaria and A. C. Harter, Parameterisation of a stochastic model for human face identification. In Proceedings of 1994 IEEE workshop on applications of computer vision, IEEE (1994), pp 138-142
work page 1994
-
[17]
Normalized cuts and image segmentation
J. Shi and J. Malik, Normalized cuts and image segmentation. IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 22(8):888–905, August 2000. ISSN 1939-3539. doi: 10.1109/34.868688
- [18]
-
[19]
Steinhaus, Sur la division des corps mat´ eriels en parties
H. Steinhaus, Sur la division des corps mat´ eriels en parties. Bull. Acad. Polon. Sci. 4 (1956): p. 801–804
work page 1956
- [20]
-
[21]
M. Telgarsky and A. Vattani, Hartigan’s method: k-means clustering without voronoi. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings (2010), pp. 820-827. Department of Mathematics, University of W ashington, Seattle Email address:fclement@uw.edu Department of Math...
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.