pith. sign in

arxiv: 2604.06046 · v1 · submitted 2026-04-07 · 💻 cs.DS

k-Clustering via Iterative Randomized Rounding

Pith reviewed 2026-05-10 18:16 UTC · model grok-4.3

classification 💻 cs.DS
keywords k-clusteringapproximation algorithmsLP roundingLagrangian multiplier preservingk-meansk-medianmetric clustering
0
0 comments X

The pith

A single iterative randomized rounding procedure on the standard LP relaxation produces a ((3^p + 1)/2)-LMP approximation for k-clustering with p-th power distance costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that one rounding scheme applied to any fractional solution of the natural LP for k-clustering yields a random integral solution that opens exactly k facilities in expectation and whose expected cost is at most (3^p + 1)/2 times the optimal cost. This LMP guarantee holds for any p ≥ 1 and works for both general metric and Euclidean instances. The authors then convert the LMP algorithm into a true approximation algorithm by losing only an extra (1 + ε) factor, obtaining ((3^p + 1)/2 + ε) approximation overall. The same framework recovers the known 2 + ε bound for k-median and improves the previous best metric k-means bound from 5.83 to 5 + ε; a specialized analysis further yields 4 + ε for Euclidean k-means.

Core claim

There exists a single iterative randomized rounding algorithm that, given any fractional solution to the standard LP relaxation of k-clustering, outputs a random integral solution opening k facilities in expectation whose expected cost is at most (3^p + 1)/2 times the LP value (hence at most (3^p + 1)/2 OPT) for p-th power distance costs; this LMP-preserving rounding can be turned into a true ((3^p + 1)/2 + ε)-approximation by standard techniques.

What carries the argument

Iterative randomized rounding of the fractional facility-opening variables that preserves the Lagrangian-multiplier property while respecting the cardinality constraint.

If this is right

  • Recovers the best-known (2 + ε) approximation for metric k-median.
  • Improves the approximation factor for general-metric k-means from 5.83 to 5 + ε.
  • Achieves a (4 + ε) approximation for Euclidean k-means via a specialized analysis.
  • Unifies the treatment of all p-power costs under one rounding procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation between the LMP rounding step and the final (1 + ε) conversion may allow similar roundings to be plugged into other Lagrangian-based frameworks for clustering variants.
  • Because the rounding works from any fractional solution, it could be combined with faster LP solvers or sampling-based methods that produce approximate fractional solutions.
  • The improvement for Euclidean k-means suggests that geometry-specific analyses can tighten the general (3^p + 1)/2 bound without changing the rounding procedure itself.

Load-bearing premise

The input distances obey the triangle inequality (or Euclidean properties) so that the cost of the rounded solution remains bounded relative to the fractional LP cost.

What would settle it

An explicit instance together with a feasible fractional LP solution whose rounded integral solutions have expected p-power cost strictly larger than (3^p + 1)/2 times the LP value.

read the original abstract

In this work we propose a single rounding algorithm for the fractional solutions of the standard LP relaxation for $k$-clustering. As a starting point, we obtain an iterative rounding $(\frac{3^p + 1}{2})$-Lagrangian Multiplier-Perserving (LMP) approximation for the $k$-clustering problem with the cost function being the $p$-th power of the distance. Such an algorithm outputs a random solution that opens $k$ facilities \emph{in expectation}, whose cost in expectation is at most $\frac{3^p + 1}{2}$ times the optimum cost. Thus, we recover the $2$-LMP approximation for $k$-median by Jain et al.~[JACM'03], which played a central role in deriving the current best $2$ approximation for $k$-median. Unlike the result of Jain et al., our algorithm is based on LP rounding, and it can be easily adapted to the $L_p^p$-cost setting. For the Euclidean $k$-means problem, the LMP factor we obtain is $\frac{11}{3}$, which is better than the $5$ approximation given by this framework for general metrics. Then, we show how to convert the LMP-approximation algorithms to a true-approximation, with only a $(1+\varepsilon)$ factor loss in the approximation ratio. We obtain a ($\frac{3^p + 1}{2}+\varepsilon$)-approximation algorithm for $k$-clustering with cost function being the $p$-th power of the distance, for $p \geq 1$. This reproduces the best known ($2+\varepsilon$)-approximation for $k$-median by Cohen-Addad et al. [STOC'25], and improves the approximation factor for metric $k$-means from 5.83 by Charikar at al. [FOCS'25] to $5+\varepsilon$ in our framework. Moreover, the same algorithm, but with a specialized analysis, attains ($4+\varepsilon$)-approximation for Euclidean $k$-means matching the recent result by Charikar et al. [STOC'26].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The manuscript proposes a single iterative randomized rounding algorithm applied to the standard LP relaxation for k-clustering with p-th power distance costs. It yields a ((3^p + 1)/2)-LMP approximation that opens exactly k facilities in expectation, recovering the 2-LMP bound for k-median (p=1) and giving 11/3 for Euclidean k-means. The LMP guarantee is converted via Lagrangian multiplier search to a true ((3^p + 1)/2 + ε)-approximation, improving the metric k-means factor to 5+ε and achieving 4+ε for Euclidean k-means with a specialized analysis.

Significance. If the central claims hold, this provides a unified LP-rounding framework that reproduces the best-known 2+ε approximation for k-median and improves prior bounds for metric k-means while matching recent Euclidean results. The direct derivation of the LMP property from randomized iterative rounding (rather than primal-dual methods) and the clean separation of general-metric and Euclidean analyses are technical strengths that could streamline future work on clustering approximations.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review, the accurate summary of our contributions, and the recommendation to accept. The report lists no major comments.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper derives its ((3^p + 1)/2 + ε)-approximation directly from a new iterative randomized rounding procedure on the standard LP relaxation for k-clustering with p-power costs. The LMP guarantee is obtained by analyzing the rounding's expectation properties and cost bounds, which are constructed and proved within the paper without reducing to fitted inputs or self-definitions. The conversion from LMP to true approximation applies standard Lagrangian multiplier search techniques. Prior citations (e.g., Jain et al. for the p=1 case) are used only for reproduction and context; they are not load-bearing for the novel general-p or Euclidean improvements. The derivation chain is self-contained against external benchmarks and exhibits no circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper starts from the standard LP relaxation of k-clustering and relies on metric properties of the distance function; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The standard LP relaxation for k-clustering is a valid starting point whose fractional solutions can be rounded while preserving the LMP property.
    Explicitly used as the input to the iterative rounding procedure.

pith-pipeline@v0.9.0 · 5723 in / 1247 out tokens · 43773 ms · 2026-05-10T18:16:53.764745+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Better guarantees for k-means and euclidean k-median by primal-dual algorithms.SIAM Journal on Computing, 49(4):FOCS17–97, 2019

    Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and euclidean k-median by primal-dual algorithms.SIAM Journal on Computing, 49(4):FOCS17–97, 2019

  2. [2]

    Local search heuristic for k-median and facility location problems

    Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristic for k-median and facility location problems. InProceedings of the thirty-third annual ACM symposium on Theory of computing, pages 21–29, 2001

  3. [3]

    On minimum sum of radii and diameters clustering

    Babak Behsaz and Mohammad R Salavatipour. On minimum sum of radii and diameters clustering. Algorithmica, 73(1):143–165, 2015

  4. [4]

    An optimal bifactor approximation algorithm for the metric unca- pacitated facility location problem.SIAM Journal on Computing, 39(6):2212–2231, 2010

    Jarosław Byrka and Karen Aardal. An optimal bifactor approximation algorithm for the metric unca- pacitated facility location problem.SIAM Journal on Computing, 39(6):2212–2231, 2010

  5. [5]

    An improved approximation for k-median and positive correlation in budgeted optimization.ACM Transactions on Algorithms (TALG), 13(2):1–31, 2017

    Jarosław Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization.ACM Transactions on Algorithms (TALG), 13(2):1–31, 2017. 30

  6. [6]

    Fault-tolerant facility location: a ran- domized dependent lp-rounding algorithm

    Jarosław Byrka, Aravind Srinivasan, and Chaitanya Swamy. Fault-tolerant facility location: a ran- domized dependent lp-rounding algorithm. InInternational Conference on Integer Programming and Combinatorial Optimization, pages 244–257. Springer, 2010

  7. [7]

    An improved greedy approximation for (metric) k-means

    Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Le, and Ernest Van Wijland. An improved greedy approximation for (metric) k-means. In2025 IEEE 66th Annual Symposium on Foundations of Computer Science (FOCS), pages 233–240. IEEE, 2025

  8. [8]

    A (4+ε)-approximation for euclidean k-means via non-monotone dual-fitting

    Moses Charikar, Vincent Cohen-Addad, Ruiquan Gao, Fabrizio Grandoni, Euiwoong Lee, and Ernest van Wijland. A (4+ε)-approximation for euclidean k-means via non-monotone dual-fitting. InProceed- ings of the 58th Annual ACM Symposium on Theory of Computing. IEEE, 2026

  9. [9]

    A constant-factor approximation algorithm for the k-median problem

    Moses Charikar, Sudipto Guha, Éva Tardos, and David B Shmoys. A constant-factor approximation algorithm for the k-median problem. InProceedings of the thirty-first annual ACM symposium on Theory of computing, pages 1–10, 1999

  10. [10]

    Improved approximation algorithms for the uncapacitated facility location problem.SIAM Journal on Computing, 33(1):1–25, 2003

    Fabián A Chudak and David B Shmoys. Improved approximation algorithms for the uncapacitated facility location problem.SIAM Journal on Computing, 33(1):1–25, 2003

  11. [11]

    Improved approx- imations for euclidean k-means and k-median, via nested quasi-independent sets

    Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan. Improved approx- imations for euclidean k-means and k-median, via nested quasi-independent sets. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1621–1628, 2022

  12. [12]

    A (2+ε)-approximation algorithm for metric k-median

    Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, and Ola Svensson. A (2+ε)-approximation algorithm for metric k-median. InProceedings of the 57th Annual ACM Sympo- sium on Theory of Computing, pages 615–624, 2025

  13. [13]

    Tight fpt approx- imations for k-median and k-means

    Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight fpt approx- imations for k-median and k-means. In46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132, pages 42–1. Schloss Dagstuhl–Leibniz-Zentrum fuer Infor- matik, 2019

  14. [14]

    A ptas for k-means clustering based on weak coresets

    Dan Feldman, Morteza Monemizadeh, and Christian Sohler. A ptas for k-means clustering based on weak coresets. InProceedings of the twenty-third annual symposium on Computational geometry, pages 11–18, 2007

  15. [15]

    Inapproxima- bility of maximum diameter clustering for few clusters

    Henry Fleischmann, Kyrylo Karlov, Karthik CS, Ashwin Padaki, and Stepan Zharkov. Inapproxima- bility of maximum diameter clustering for few clusters. InProceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4707–4731. SIAM, 2025

  16. [16]

    Improved bi-point rounding algorithmsandagoldenbarrierfork-median

    Kishen N Gowda, Thomas Pensyl, Aravind Srinivasan, and Khoa Trinh. Improved bi-point rounding algorithmsandagoldenbarrierfork-median. InProceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 987–1011. SIAM, 2023

  17. [17]

    Greedy strikes back: Improved facility location algorithms.Journal of algorithms, 31(1):228–248, 1999

    Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms.Journal of algorithms, 31(1):228–248, 1999

  18. [18]

    Simpler Analyses of Local Search Algorithms for Facility Location

    Anupam Gupta and Kanat Tangwongsan. Simpler analyses of local search algorithms for facility loca- tion.arXiv preprint arXiv:0809.2554, 2008

  19. [19]

    Data clustering: 50 years beyond k-means.Pattern recognition letters, 31(8):651–666, 2010

    Anil K Jain. Data clustering: 50 years beyond k-means.Pattern recognition letters, 31(8):651–666, 2010

  20. [20]

    Greedy facility location algorithms analyzed using dual fitting with factor-revealing lp.Journal of the ACM (JACM), 50(6):795–824, 2003

    Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing lp.Journal of the ACM (JACM), 50(6):795–824, 2003

  21. [21]

    Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation.Journal of the ACM (JACM), 48(2):274–296, 2001

    Kamal Jain and Vijay V Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation.Journal of the ACM (JACM), 48(2):274–296, 2001. 31

  22. [22]

    A 1.488 approximation algorithm for the uncapacitated facility location problem.Information and Computation, 222:45–58, 2013

    Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem.Information and Computation, 222:45–58, 2013

  23. [23]

    Approximating k-median via pseudo-approximation

    Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. Inproceedings of the forty-fifth annual ACM symposium on theory of computing, pages 901–910, 2013

  24. [24]

    Multivariate observations

    J MacQueen. Multivariate observations. InProceedings ofthe 5th Berkeley symposium on mathematical statisticsand probability, volume 1, pages 281–297. University of California press Oakland, CA, USA, 1967

  25. [25]

    Randomized distributed edge coloring via an extension of the chernoff–hoeffding bounds.SIAM Journal on Computing, 26(2):350–368, 1997

    Alessandro Panconesi and Aravind Srinivasan. Randomized distributed edge coloring via an extension of the chernoff–hoeffding bounds.SIAM Journal on Computing, 26(2):350–368, 1997

  26. [26]

    Approximation algorithms for facility location prob- lems

    David B Shmoys, Éva Tardos, and Karen Aardal. Approximation algorithms for facility location prob- lems. InProceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 265– 274, 1997

  27. [27]

    Distributions on level-sets with applications to approximation algorithms

    Aravind Srinivasan. Distributions on level-sets with applications to approximation algorithms. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 588–597. IEEE, 2001. A Obtaining solutions from additive pseudo-solutions Theorem A.1.For any constantp≥1, suppose thatAis ac-additiveα-approximation algorithm for k-clustering with cos...