pith. sign in

arxiv: 2604.21798 · v1 · submitted 2026-04-23 · 💻 cs.LG

An effective variant of the Hartigan k-means algorithm

Pith reviewed 2026-05-09 22:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords k-means clusteringHartigan algorithmLloyd algorithmlocal searchclustering qualityalgorithmic variant
0
0 comments X

The pith

A minor variation of Hartigan's k-means algorithm adds another 2% to 5% improvement over the standard version.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The k-means problem is the basic task of dividing data points into k groups that minimize total squared distance to group centers. Lloyd's algorithm from 1957 is the usual method, but Hartigan's 1975 version already improves results by 5% to 10% in most cases. The paper shows that one small change to Hartigan's procedure produces solutions that are typically another 2% to 5% better, and this extra gain grows when the data has more dimensions or when k is larger. A reader would care because k-means is used everywhere from image compression to customer grouping, so even modest quality lifts matter in practice.

Core claim

The authors establish that a very minor variation of Hartigan's 1975 algorithm for the k-means problem yields clusterings whose objective value is typically 2% to 5% smaller than those found by the unmodified Hartigan method, with the relative improvement increasing as dimension or the number of clusters k grows.

What carries the argument

The minor variation of Hartigan's reassignment step, which alters how points are moved between clusters during the local search.

If this is right

  • The variant produces better local minima than either Lloyd's or standard Hartigan's method in the great majority of trials.
  • The advantage over standard Hartigan's method widens as data dimension increases.
  • The advantage over standard Hartigan's method widens as the number of clusters k increases.
  • The change requires almost no extra computation beyond the original Hartigan procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Small adjustments to the order or condition of point reassignments can help the algorithm escape poorer local minima more effectively.
  • The same style of minor change might improve other greedy local-search procedures used in clustering or partitioning problems.
  • In applied settings the variant could be swapped in for Hartigan's method with no change to surrounding code or data pipelines.
  • The pattern of larger gains at high dimension suggests the method may be especially useful for modern high-dimensional data sets.

Load-bearing premise

The measured gains come from the algorithmic change and not from implementation details, random seeds, or the particular datasets chosen for testing.

What would settle it

Re-running the standard Hartigan method and the variant on identical large-scale datasets with the same code base, fixed seeds, and identical stopping rules, then checking whether the variant still produces consistently lower objective values especially at high dimension or large k.

Figures

Figures reproduced from arXiv: 2604.21798 by Fran\c{c}ois Cl\'ement, Stefan Steinerberger.

Figure 1
Figure 1. Figure 1: Thousands points in R 2 are clustered into k = 3 sets. 1.2. Lloyd’s algorithm. The most commonly used way to find an approximate solution is the algorithm proposed by Lloyd (and, independently, Forgy [6]), with a variant by MacQueen [13]. For any given cluster partition S1, . . . , Sk with associ￾ated centroids µ1, . . . , µk, the algorithm goes through every point and assigns xj to the cluster Sℓ whose ce… view at source ↗
Figure 2
Figure 2. Figure 2: A point x assigned to cluster 1 but having the same distance to µ2 that it has from µ1. A more formal explanation is as follows (see also §2 for pseudocode): if Si is a cluster, then we abbreviate its contribution to the k−means functional via ϕ(Si) = X x∈Si [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
read the original abstract

The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a minor variant of Hartigan's 1975 k-means algorithm and claims that it yields an additional 2%–5% improvement in the clustering objective over standard Hartigan (itself 5%–10% better than Lloyd), with the gains tending to increase as dimension or k grows.

Significance. If the empirical gains are shown to be robustly caused by the algorithmic change rather than implementation artifacts, the result would be a simple, low-cost improvement to a standard clustering method with practical value in high-dimensional or large-k regimes. The paper correctly builds on the known superiority of Hartigan over Lloyd but currently supplies no machine-checked proofs, reproducible code, or parameter-free derivations.

major comments (2)
  1. Abstract: the central claim of consistent 2%–5% objective improvement (and its growth with dimension/k) rests on empirical comparison, yet the abstract supplies none of the required controls—identical initializations, convergence criteria, multiple random seeds with statistical testing, or explicit confirmation that the baseline Hartigan implementation differs from the variant only in the proposed change—leaving open the possibility that differences arise from coding artifacts or data specifics rather than the algorithmic modification.
  2. Experimental section (inferred from abstract and reader's take): without reported details on the number of datasets, preprocessing, exact matching of floating-point behavior, or averaging over seeds, the attribution of gains to the variant itself cannot be verified and undermines the generalization claim.
minor comments (2)
  1. Abstract: the citation to Telgarsky-Vattani should include a full reference (year, title, or arXiv number) for completeness.
  2. Throughout: ensure pseudocode or explicit description of the minor variation is provided so readers can implement the exact change.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and have revised the manuscript to incorporate additional experimental details and controls.

read point-by-point responses
  1. Referee: Abstract: the central claim of consistent 2%–5% objective improvement (and its growth with dimension/k) rests on empirical comparison, yet the abstract supplies none of the required controls—identical initializations, convergence criteria, multiple random seeds with statistical testing, or explicit confirmation that the baseline Hartigan implementation differs from the variant only in the proposed change—leaving open the possibility that differences arise from coding artifacts or data specifics rather than the algorithmic modification.

    Authors: We agree that the abstract should more explicitly reference the experimental controls. In the revised version we have updated the abstract to state that comparisons used identical initializations and convergence criteria across methods, were averaged over multiple random seeds with statistical testing, and that the sole difference between the baseline Hartigan implementation and the variant is the proposed algorithmic change. revision: yes

  2. Referee: Experimental section (inferred from abstract and reader's take): without reported details on the number of datasets, preprocessing, exact matching of floating-point behavior, or averaging over seeds, the attribution of gains to the variant itself cannot be verified and undermines the generalization claim.

    Authors: We accept that the original experimental section lacked sufficient detail for full verification. The revised manuscript now specifies the number of datasets, the preprocessing steps, how floating-point behavior was matched between the two implementations, and that all results are averaged over multiple seeds with standard deviations and statistical tests reported. These additions directly support attribution of the observed gains to the algorithmic variant. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claim rests on algorithmic variant and experiments

full rationale

The paper proposes a minor variation on Hartigan's k-means algorithm and reports 2-5% empirical gains (growing with dimension and k) over standard Hartigan. This is an algorithmic modification whose performance is asserted via direct comparison on data, not via any derivation, equation, or first-principles result that reduces to its own inputs by construction. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear. The central claim is externally falsifiable by re-implementing the variant and measuring objective values, placing it outside the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new mathematical axioms, free parameters, or invented entities; the contribution is an algorithmic modification whose value is claimed through empirical comparison.

pith-pipeline@v0.9.0 · 5377 in / 927 out tokens · 36089 ms · 2026-05-09T22:06:19.013438+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1]

    Aloise, A

    D. Aloise, A. Deshpande, P. Hansen and P. Popat, NP-hardness of Euclidean sum-of-squares clustering. Machine learning, 75 (2009), p. 245-248

  2. [2]

    Arthur and S

    D. Arthur and S. Vassilvitskii, k-means++: The advantages of careful seeding, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027–1035

  3. [3]

    Ball and D

    G. Ball and D. Hall, ISODATA, a novel method of data anlysis and pattern classification. Technical report NTIS AD 699616, 1965, Stanford Research Institute, Stanford, CA

  4. [4]

    R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics 7 (1936), p. 179–188

  5. [5]

    R. A. Fisher, The analysis of covariance method for the relation between a part and the whole, Biometrics 3 (1947), 65–68

  6. [6]

    Forgy, Cluster analysis of multivariate data: efficiency versus interpretability of classifica- tions, Biometrics

    E. Forgy, Cluster analysis of multivariate data: efficiency versus interpretability of classifica- tions, Biometrics. 21 (1965), 768–769

  7. [7]

    J. A. Hartigan. Clustering algorithms. Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley, 1975

  8. [8]

    Hartigan and M

    H. Hartigan and M. Wong, Algorithm AS 136: A k-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C. 28 (1979), p. 100-108

  9. [9]

    Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters 31 (2010), p

    A. Jain, Data clustering: 50 years beyond K-means, Pattern Recognition Letters 31 (2010), p. 651–666

  10. [10]

    R. R. Lederman, D. Silva-S´ anchez, Z. Chen, G. Mordant, A. Balanov and T. Bendory, The Catastrophic Failure of thek-means Algorithm in High Dimensions and How Hartigan’s Algorithm Avoids it. ArXiV https://arxiv.org/abs/2602.09936 (2026)

  11. [11]

    R. R. Lederman,https://github.com/Lederman-Group/Catastrophic_Failure_KMeans, ac- cessed April 17, 2026

  12. [12]

    Lloyd, Least squares quantization in PCM

    S. Lloyd, Least squares quantization in PCM. IEEE Trans. Inform. Theory 28 (1982), p. 129–137. Originally as an unpublished Bell laboratories Technical Note (1957). 8

  13. [13]

    MacQueen, Some methods for classification and analysis of multivariate observations

    J. MacQueen, Some methods for classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematics. Statistics and Probability, University of Califor- nia Press, 1967, pp. 281–297

  14. [14]

    Twenty Newsgroups,

    T. Mitchell, Twenty Newsgroups. UCI Machine Learning Repository 1997. DOI: https://doi.org/10.24432/C5C323

  15. [15]

    Peng and Y

    J. Peng and Y. Wei, Approximating k-means-type clustering via semidefinite programming. SIAM journal on optimization, 18(1):186–205, 2007

  16. [16]

    F. S. Samaria and A. C. Harter, Parameterisation of a stochastic model for human face identification. In Proceedings of 1994 IEEE workshop on applications of computer vision, IEEE (1994), pp 138-142

  17. [17]

    Normalized cuts and image segmentation

    J. Shi and J. Malik, Normalized cuts and image segmentation. IEEE Transactions on Pat- tern Analysis and Machine Intelligence, 22(8):888–905, August 2000. ISSN 1939-3539. doi: 10.1109/34.868688

  18. [18]

    Slonim, E

    N. Slonim, E. Aharoni and K. Crammer, Hartigan’s K-means vs. Lloyd’s K means–is it time for a change?. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013

  19. [19]

    Steinhaus, Sur la division des corps mat´ eriels en parties

    H. Steinhaus, Sur la division des corps mat´ eriels en parties. Bull. Acad. Polon. Sci. 4 (1956): p. 801–804

  20. [20]

    Street, W

    W. Street, W. Wolberg and O. Mangasarian, Nuclear feature extraction for breast tumor diagnosis. In Biomedical image processing and biomedical visualization, SPIE, 1905 (1993), pp. 861-870

  21. [21]

    Telgarsky and A

    M. Telgarsky and A. Vattani, Hartigan’s method: k-means clustering without voronoi. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings (2010), pp. 820-827. Department of Mathematics, University of W ashington, Seattle Email address:fclement@uw.edu Department of Math...