An Improved Greedy Approximation for (Metric) k-Means
Pith reviewed 2026-06-29 08:59 UTC · model grok-4.3
The pith
A polynomial-time algorithm achieves a 3 + 2√2 + ε approximation for metric k-Means.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that the LMP 2-approximation greedy algorithm for facility location admits an adaptation and analysis when connection costs are squared distances. This new LMP approximation is combined with the framework from the related k-Median paper to produce a polynomial-time 3 + 2√2 + ε < 5.83 approximation algorithm for k-Means in arbitrary metrics.
What carries the argument
Adapted LMP 2-approximation greedy algorithm for facility location with squared metric connection costs, which preserves the Lagrangian multiplier property.
Load-bearing premise
The LMP 2-approximation greedy algorithm for facility location admits an adaptation and analysis when connection costs are squared distances rather than linear metric distances.
What would settle it
A concrete set of points in a metric space on which the adapted algorithm returns a solution whose cost exceeds (3 + 2√2) times the optimal k-Means cost.
Figures
read the original abstract
Clustering is a basic task in data analysis and machine learning, and the optimization of clustering objectives are well-studied optimization problems; amongst these, the $k$-Means objective is arguably the most well known. Given a collection of points in a metric space, the goal is to partition them into $k$ clusters, each with an associated center, so as to minimize the sum of squared distances of points to their cluster centers. In this paper, we present a polynomial-time $3+2\sqrt{2}+\epsilon<5.83$-approximation algorithm for $k$-Means in general metrics. This substantially improves on the current-best $(9+\epsilon)$-approximation in [Ahmadian, Norouzi-Fard, Svensson, Ward - FOCS'17, SICOMP'20], and even slightly improves on the $5.92$-approximation in [Cohen-Addad, Esfandiari, Mirrokni, Narayanan - STOC'22] for the Euclidean special case. A natural approach for $k$-Means is to leverage Lagrangian Multiplier Preserving (LMP) approximations for the facility location problem. The previous best results for $k$-Means build upon an adaptation of an LMP $3$-approximation for facility location with metric connection costs in [Jain, Vazirani - J.ACM'01] based on a primal-dual method, rather than on the improved LMP greedy $2$-approximation for the same problem in [Jain, Mahdian, Markakis, Saberi, Vazirani - J.ACM'03]. The barrier to using the improved LMP algorithm was that no adaptation of this algorithm and its analysis to the case of squared metric connection costs was known (since squared distances violate triangle inequality). Our main contribution is overcoming this barrier by providing such an adaptation. This new LMP approximation algorithm is then combined with the framework recently introduced in [Cohen-Addad, Grandoni, Lee, Schwiegelshohn, Svensson - STOC'25] for the related (metric) $k$-Median problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a polynomial-time (3 + 2√2 + ε)-approximation for metric k-Means by adapting the LMP 2-approximation greedy algorithm of Jain et al. (JACM 2003) to squared connection costs (which violate the triangle inequality) while preserving the LMP property, then plugging the result into the STOC'25 k-Median framework; this yields the stated ratio and improves on the prior 9+ε bound for general metrics.
Significance. If the adaptation holds, the result advances the approximation ratio for a core clustering problem from 9+ε down to <5.83 in general metrics (and slightly below the prior 5.92 Euclidean bound). The explicit handling of squared distances in the greedy LMP analysis is a concrete technical step that removes a known barrier; the manuscript supplies the adaptation as its central contribution rather than relying on self-referential or fitted parameters.
minor comments (2)
- [§1] §1 (Introduction): the statement that the new LMP algorithm 'is then combined with' the STOC'25 framework would benefit from an explicit sentence confirming that no additional loss factor is incurred beyond the ε term already stated.
- The notation for the squared-distance connection costs should be introduced once with a clear definition (e.g., c(i,j) = d(i,j)^2) before its first use in the algorithm description.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and their recommendation to accept. We are pleased that the significance of adapting the greedy LMP 2-approximation to squared distances (while preserving the LMP property) and combining it with the STOC'25 framework is recognized as advancing the state of the art for metric k-Means.
Circularity Check
No significant circularity identified
full rationale
The paper's derivation chain centers on a new adaptation of the LMP 2-approximation greedy algorithm (from Jain et al. 2003) to squared distances, explicitly presented as the main contribution overcoming a previously unknown barrier. This adapted LMP is then plugged into the independently cited STOC'25 framework for k-Median. No equation, parameter fit, or claim reduces by construction to its own inputs; the 3+2√2+ε bound is obtained from the supplied analysis rather than tautology or renaming. The overlapping-author citation to the framework is not load-bearing for the core adaptation step and does not create a self-referential loop.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Existence of the LMP 2-approximation greedy algorithm for metric facility location from Jain et al. (J.ACM 2003)
- domain assumption The framework from Cohen-Addad et al. (STOC 2025) applies to the adapted LMP solution for k-means
Reference graph
Works this paper leans on
-
[1]
SIAM Journal on Computing , Year =
[AGK+04] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. “Local Search Heuristics for k-Median and Facility Location Problems”. In: SIAM J. Comput. 33.3 (2004), pp. 544–562. doi: 10.1137/S0097539702416402. url: https://doi.org/10.1137/S0097539702416402. [ANSW20] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensso...
-
[2]
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019, 42:1– 42:14. doi: 10 . 4230 / LIPICS . ICALP . 2019
2019
-
[3]
A (2+ ε)-Approximation Algorithm for Metric k-Median
url: https : / / doi . org / 10 . 4230 / LIPIcs.ICALP.2019.42. [CGL+25] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, and Ola Svensson. “A (2+ ε)-Approximation Algorithm for Metric k-Median”. In: Proceedings of the 57th ACM Symposium on Theory of Computing, STOC 2025 . To appear
2019
-
[4]
Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median
[CGLS23] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, and Chris Schwiegelshohn. “Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median”. In: Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algo- rithms, SODA 2023, Florence, Italy, January 22-25, 2023 . SIAM, 2023, pp. 940–986. doi: 10.1137/1.97816119...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.