pith. sign in

arxiv: 1907.05094 · v1 · pith:C4F7DF5Anew · submitted 2019-07-11 · 💻 cs.DS · cs.CG· cs.LG

Analysis of Ward's Method

Pith reviewed 2026-05-24 23:14 UTC · model grok-4.3

classification 💻 cs.DS cs.CGcs.LG
keywords Ward's methodhierarchical clusteringk-meansapproximation algorithmswell-separated clusterscomplete linkagegreedy merging
0
0 comments X

The pith

If the optimal k-clustering is well separated, Ward's method produces a 2-approximation for the k-means objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Ward's method is a popular greedy approach to hierarchical clustering that repeatedly merges the two clusters whose combination increases the overall k-means cost by the smallest amount. The analysis shows that when the best possible k-clustering has clusters that are well separated from each other, the output of this method at the step with k clusters is guaranteed to have cost at most twice the optimal. Adding a balance condition on cluster sizes allows the method to recover the exact optimal clustering. The guarantees apply even in high dimensions where many other algorithms struggle. Without the separation assumption the approximation ratio can grow exponentially with dimension.

Core claim

Ward's method computes a 2-approximation with respect to the k-means objective function if the optimal k-clustering is well separated. If additionally the optimal clustering also satisfies a balance condition, then Ward's method fully recovers the optimum solution. These results hold in arbitrary dimension.

What carries the argument

Ward's method, the complete-linkage heuristic that always merges the pair of clusters whose merge increases the k-means cost the least.

Load-bearing premise

The optimal k-clustering must have well-separated clusters for the approximation guarantee to hold.

What would settle it

A data set in high dimension where the optimal clusters are well separated but the hierarchy from Ward's method has a k-means cost more than twice the optimum at the k level.

Figures

Figures reproduced from arXiv: 1907.05094 by Anna Gro{\ss}wendt, Heiko R\"oglin, Melanie Schmidt.

Figure 1
Figure 1. Figure 1: Point set Pd from the family of worst-case examples, drawn for d = 2 and d = 3. The heavy points are drawn larger. To further simplify the exposition, the below definitions use points of infinite weight and assume that the optimal cluster centers coincide with these infinite weight points. For any finite realization of the example, that is not the case. To ensure that Ward actually behaves like described i… view at source ↗
Figure 2
Figure 2. Figure 2: The pricipal phases of development of a 2-composed cluster. The phases We will use the reordering lemma (Lemma 26) to sort the merges into phases and then analyze the cost of the solution after each phase. In the following, we call a cluster that contains points from more than one optimum cluster composed, more precisely, we call it an ℓ-composed cluster if it contains points from ℓ different optimum clust… view at source ↗
Figure 3
Figure 3. Figure 3: Different types of good merge situations. A part of a [PITH_FULL_IMAGE:figures/full_fig_p023_3.png] view at source ↗
read the original abstract

We study Ward's method for the hierarchical $k$-means problem. This popular greedy heuristic is based on the \emph{complete linkage} paradigm: Starting with all data points as singleton clusters, it successively merges two clusters to form a clustering with one cluster less. The pair of clusters is chosen to (locally) minimize the $k$-means cost of the clustering in the next step. Complete linkage algorithms are very popular for hierarchical clustering problems, yet their theoretical properties have been studied relatively little. For the Euclidean $k$-center problem, Ackermann et al. show that the $k$-clustering in the hierarchy computed by complete linkage has a worst-case approximation ratio of $\Theta(\log k)$. If the data lies in $\mathbb{R}^d$ for constant dimension $d$, the guarantee improves to $\mathcal{O}(1)$, but the $\mathcal{O}$-notation hides a linear dependence on $d$. Complete linkage for $k$-median or $k$-means has not been analyzed so far. In this paper, we show that Ward's method computes a $2$-approximation with respect to the $k$-means objective function if the optimal $k$-clustering is well separated. If additionally the optimal clustering also satisfies a balance condition, then Ward's method fully recovers the optimum solution. These results hold in arbitrary dimension. We accompany our positive results with a lower bound of $\Omega((3/2)^d)$ for data sets in $\mathbb{R}^d$ that holds if no separation is guaranteed, and with lower bounds when the guaranteed separation is not sufficiently strong. Finally, we show that Ward produces an $\mathcal{O}(1)$-approximative clustering for one-dimensional data sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes Ward's method (a complete-linkage greedy algorithm) for hierarchical k-means clustering. It proves that if the optimal k-clustering is well-separated, then Ward's method yields a 2-approximation to the k-means objective in arbitrary dimension; with an added balance condition on the optimal clustering, it recovers the optimum exactly. These positive results are accompanied by a dimension-dependent lower bound of Ω((3/2)^d) when no separation is assumed, further lower bounds for insufficient separation, and an unconditional O(1)-approximation for one-dimensional data.

Significance. If the claims hold, the work supplies the first non-trivial approximation and recovery guarantees for Ward's method on k-means, a widely used practical heuristic whose theoretical properties had remained largely unanalyzed. The explicit conditioning on separation (and balance), the matching-style lower bounds, and the unconditional 1D result together delineate the regime in which the method succeeds. This advances the theory of linkage-based clustering algorithms and supplies concrete, falsifiable conditions under which a popular greedy procedure is provably effective.

minor comments (3)
  1. [Abstract] Abstract: the terms 'well separated' and 'balance condition' are used without even a one-sentence gloss; a brief parenthetical definition or forward reference to their formal statements (presumably in §2 or §3) would improve readability for readers who consult only the abstract.
  2. [Lower bounds section] The lower-bound constructions (especially the Ω((3/2)^d) family) are stated to hold 'if no separation is guaranteed'; it would be helpful to include a short remark clarifying whether these constructions also satisfy or violate the balance condition used in the positive recovery theorem.
  3. [Preliminaries] Notation for the k-means cost and the separation parameter should be introduced once, early, and used consistently; occasional re-definition of symbols across sections can be avoided by a single 'Notation' paragraph.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the manuscript, the clear summary of our results, and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives its 2-approximation and exact recovery results for Ward's method explicitly from the assumption that the optimal k-clustering is well-separated (and balanced for recovery). These are standard conditional approximation guarantees in clustering analysis, obtained via direct analysis of the merge steps under the separation property rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation. The lower bounds are likewise derived from explicit constructions without reduction to the positive claims. The derivation chain is self-contained against the stated assumptions and does not invoke uniqueness theorems or ansatzes from prior author work in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The positive results rest on the domain assumption of well-separated (and balanced) optimal clusters plus standard mathematical properties of Euclidean k-means cost; no free parameters, invented entities, or ad-hoc axioms are introduced beyond these.

axioms (2)
  • standard math k-means cost is defined via squared Euclidean distances to cluster centers
    Invoked throughout as the objective; standard definition in the field.
  • domain assumption separation and balance are properties of the optimal clustering that can be assumed for the analysis
    Explicitly stated as the conditions under which the 2-approx and recovery hold.

pith-pipeline@v0.9.0 · 5853 in / 1347 out tokens · 21859 ms · 2026-05-24T23:14:58.975551+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 3 internal anchors

  1. [1]

    Ackermann, Johannes Blömer, Daniel Kuntze, and Christian Sohler

    Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze, and Christian Sohler. Analysis of agglomerative clustering. Algorithmica, 69(1):184–215, 2014

  2. [2]

    Better guarantees for k-means and euclidean k-median by primal-dual algorithms

    Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and J ustin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms. In Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science (FOCS) , pages 61–72, 2017

  3. [3]

    NP-hardness of Euclidean sum-of-squares clustering

    Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preya s Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245–248, 2009

  4. [4]

    k-means has polynomial smoothed complexity

    David Arthur, Bodo Manthey, and Heiko Röglin. k-means has polynomial smoothed complexity. In Proceedings of the 50th Annual IEEE Symposium on Foundation s of Computer Science (FOCS), pages 405–414, 2009

  5. [5]

    How slow is the k-means method? In Proceedings of the 22nd International Symposium on Computational Geometry (S oCG), pages 144–153, 2006

    David Arthur and Sergei Vassilvitskii. How slow is the k-means method? In Proceedings of the 22nd International Symposium on Computational Geometry (S oCG), pages 144–153, 2006

  6. [6]

    k-means++: the a dvantages of careful seeding

    David Arthur and Sergei Vassilvitskii. k-means++: the a dvantages of careful seeding. In Proceedings of the 18th ACM-SIAM Symposium on Discrete Algo rithms (SODA) , pages 1027– 1035, 2007

  7. [7]

    Worst-case and s moothed analysis of the ICP algorithm, with an application to the k-means method

    David Arthur and Sergei Vassilvitskii. Worst-case and s moothed analysis of the ICP algorithm, with an application to the k-means method. SIAM Journal on Computing , 39(2):766–782, 2009

  8. [8]

    Center-based clustering under perturbation stability

    Pranjal Awasthi, A vrim Blum, and Or Sheffet. Center-based clustering under perturbation stability. Information Processing Letters, 112(1-2):49–54, 2012

  9. [9]

    The hardness of approximation of euclidean k-means

    Pranjal Awasthi, Moses Charikar, Ravishankar Krishnas wamy, and Ali Kemal Sinop. The hardness of approximation of euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG) , pages 754–767, 2015. 24

  10. [10]

    Improved spectral-norm bounds for clustering

    Pranjal Awasthi and Or Sheffet. Improved spectral-norm bounds for clustering. In Proceedings of the 15th APPROX and 16th RANDOM , pages 37–49, 2012

  11. [11]

    A discriminative framework for clustering via similarity functions

    Maria-Florina Balcan, A vrim Blum, and Santosh Vempala. A discriminative framework for clustering via similarity functions. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC) , pages 671–680, 2008

  12. [12]

    Clustering unde r perturbation resilience

    Maria-Florina Balcan and Yingyu Liang. Clustering unde r perturbation resilience. SIAM Journal on Computing , 45(1):102–155, 2016

  13. [13]

    R obust hierarchical clustering

    Maria-Florina Balcan, Yingyu Liang, and Pramod Gupta. R obust hierarchical clustering. Journal of Machine Learning Research , 15(1):3831–3871, 2014. Appendix C, page 4048

  14. [14]

    Computational Feasibility of Clustering under Clusterability Assumptions

    Shai Ben-David. Computational feasibility of clusteri ng under clusterability assumptions. CoRR, abs/1501.00437, 2015. URL: http://arxiv.org/abs/1501.00437, arXiv:1501.00437

  15. [15]

    Clustering in the pre sence of background noise

    Shai Ben-David and Nika Haghtalab. Clustering in the pre sence of background noise. In Proceedings of the 31th International Conference on Machin e Learn- ing, ICML 2014, Beijing, China, 21-26 June 2014 , pages 280–288, 2014. URL: http://jmlr.org/proceedings/papers/v32/ben-david14.html

  16. [16]

    Approximate hierarchical clustering via sparsest cut and spreading metrics

    Moses Charikar and Vaggos Chatziafratis. Approximate hierarchical clustering via sparsest cut and spreading metrics. In Proceedings of the 28h Annual ACM-SIAM Symposium on Discret e Algorithms (SODA) , pages 841–854, 2017

  17. [17]

    Hier- archical clustering: Objective functions and algorithms

    Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann -Trenn, and Claire Mathieu. Hier- archical clustering: Objective functions and algorithms. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 378–397, 2018

  18. [18]

    Amit Daniely, Nati Linial, and Michael E. Saks. Cluster ing is difficult only when it does not matter. CoRR, abs/1205.4891, 2012. URL: http://arxiv.org/abs/1205.4891, arXiv:1205.4891

  19. [19]

    How fast is k-means? In Proceedings of the 16th Annual Conference on Learning Theory, page 735, 2003

    Sanjoy Dasgupta. How fast is k-means? In Proceedings of the 16th Annual Conference on Learning Theory, page 735, 2003

  20. [20]

    A cost function for similarity-based hierarchical clustering

    Sanjoy Dasgupta. A cost function for similarity-based hierarchical clustering. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing ( STOC), pages 118–127, 2016

  21. [21]

    Sanjoy Dasgupta and Philip M. Long. Performance guaran tees for hierarchical clus- tering. Journal of Computer and System Sciences , 70(4):555–569, 2005. URL: http://dx.doi.org/10.1016/j.jcss.2004.10.006, doi:10.1016/j.jcss.2004.10.006

  22. [22]

    Gonzalés

    Teofilo F. Gonzalés. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985

  23. [23]

    Improved analysis of c omplete-linkage clustering

    Anna Großwendt and Heiko Röglin. Improved analysis of c omplete-linkage clustering. Algo- rithmica, 78(4):1131–1150, 2017

  24. [24]

    Clustering with spect ral norm and the k-means al- gorithm

    Amit Kumar and Ravindran Kannan. Clustering with spect ral norm and the k-means al- gorithm. In Proceedings of the 51th Annual IEEE Symposium on Foundation s of Computer Science (FOCS), pages 299–308, 2010. 25

  25. [25]

    Fin ding meaningful cluster structure amidst background noise

    Shrinu Kushagra, Samira Samadi, and Shai Ben-David. Fin ding meaningful cluster structure amidst background noise. In 27th International Conference on Algorithmic Learning The ory (ALT), pages 339–354, 2016

  26. [26]

    Mirrokni, and Ilya P

    Silvio Lattanzi, Stefano Leonardi, Vahab S. Mirrokni, and Ilya P. Razenshteyn. Robust hierar- chical k-center clustering. In Proceedings of the 2015 Conference on Innovations in Theore tical Computer Science (ITCS) , pages 211–218, 2015

  27. [27]

    Improv ed and simplified inapproximability for k-means

    Euiwoong Lee, Melanie Schmidt, and John Wright. Improv ed and simplified inapproximability for k-means. Information Processing Letters, 120:40–43, 2017

  28. [28]

    Williamson

    Guolong Lin, Chandrashekhar Nagarajan, Rajmohan Raja raman, and David P. Williamson. A general approach for incremental approximation and hierar chical clustering. SIAM Journal on Computing, 39(8):3633–3669, 2010

  29. [29]

    Stuart P. Lloyd. Least squares quantization in PCM. Bell Laboratories Technical Memorandum,

  30. [30]

    later published as [30]

  31. [31]

    Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982

  32. [32]

    Var adarajan

    Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Var adarajan. The Planar k-means Problem is NP-Hard. In Proceedings of the 3rd Workshop on Algorithms and Computati on (W ALCOM), pages 274–285, 2009

  33. [33]

    Metric Perturbation Resilience

    Konstantin Makarychev and Yury Makarychev. Metric per turbation resilience. CoRR, abs/1607.06442, 2016. URL: http://arxiv.org/abs/1607.06442, arXiv:1607.06442

  34. [34]

    Improved smoothed analys is of the k-means method

    Bodo Manthey and Heiko Röglin. Improved smoothed analys is of the k-means method. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algo rithms (SODA) , pages 461– 470, 2009

  35. [35]

    Mettu and C

    Ramgopal R. Mettu and C. Greg Plaxton. The online median problem. SIAM Journal on Computing, 32(3):816–832, 2003

  36. [36]

    Schulman, a nd Chaitanya Swamy

    Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, a nd Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. Journal of the ACM , 59(6):28:1–28:22, 2012

  37. [37]

    Greg Plaxton

    C. Greg Plaxton. Approximation algorithms for hierarc hical location problems. Journal of Computer and System Sciences , 72(3):425–443, 2006

  38. [38]

    Estimating the number of clusters in a dataset via the gap statistic

    Robert Tibshirani, Guenther Walther, and Trevor Hasti e. Estimating the number of clusters in a dataset via the gap statistic. Journal of the Royal Statistical Society: Series B (Statist ical Methodology), 63:411–423, 2001

  39. [39]

    k-means requires exponentially many iterations even in the p lane

    Andrea Vattani. k-means requires exponentially many iterations even in the p lane. Discrete & Computational Geometry, 45(4):596–616, 2011

  40. [40]

    Joe H. Ward Jr. Hierarchical grouping to optimize an obj ective function. Journal of the American Statistical Association , 58:236–244, 1963. 26