A (2+varepsilon)-Approximation Algorithm for Metric k-Median
Pith reviewed 2026-05-23 00:53 UTC · model grok-4.3
The pith
A (2+ε)-approximation algorithm exists for metric k-median that opens exactly k centers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For any ε > 0 the algorithm returns a solution that opens exactly k centers and costs at most (2 + ε) times the optimal k-median cost. It is obtained by running the modified greedy routine to produce an intermediate solution with a few extra centers, walking between near-optimal solutions to reduce the excess while preserving the ratio, and invoking the stable-instance subroutine on the remaining hard cases.
What carries the argument
The walk-between-solutions procedure that moves from one near-optimal solution to another while bounding the number of extra centers opened.
If this is right
- The approximation ratio for k-median can be driven arbitrarily close to 2 while opening exactly k centers.
- Bi-point rounding is no longer required to enforce the exact cardinality constraint for near-2 guarantees.
- Stable instances admit a (2+ε) guarantee via sampling and submodular optimization.
- The modified greedy routine with O(log n / ε²) phases yields a (2+ε) solution with O(log n / ε²) extra centers.
Where Pith is reading between the lines
- The standard LP relaxation for k-median may have integrality gap at most 2.
- The sampling technique for stable instances could extend to k-means or other clustering objectives.
- If most hard instances are stable, the second routine alone might suffice for many practical inputs.
Load-bearing premise
The walk-between-solutions step can be merged with the stable-instance routine without losing the (2+ε) guarantee.
What would settle it
An explicit instance on which the combined procedure either opens more than k centers or returns a solution whose cost exceeds (2+ε) times optimal.
Figures
read the original abstract
In the classical NP-hard metric $k$-median problem, we are given a set of $n$ clients and centers with metric distances between them, along with an integer parameter $k\geq 1$. The objective is to select a subset of $k$ open centers that minimizes the total distance from each client to its closest open center. In their seminal work, Jain, Mahdian, Markakis, Saberi, and Vazirani presented the Greedy algorithm for facility location, which implies a $2$-approximation algorithm for $k$-median that opens $k$ centers in expectation. Since then, substantial research has aimed at narrowing the gap between their algorithm and the best achievable approximation by an algorithm guaranteed to open exactly $k$ centers. During the last decade, all improvements have been achieved by leveraging their algorithm or a small improvement thereof, followed by a second step called bi-point rounding, which inherently increases the approximation guarantee. Our main result closes this gap: for any $\epsilon >0$, we present a $(2+\epsilon)$-approximation algorithm for $k$-median, improving the previous best-known approximation factor of $2.613$. Our approach builds on a combination of two algorithms. First, we present a non-trivial modification of the Greedy algorithm that operates with $O(\log n/\epsilon^2)$ adaptive phases. Through a novel walk-between-solutions approach, this enables us to construct a $(2+\epsilon)$-approximation algorithm for $k$-median that consistently opens at most $k + O(\log n{/\epsilon^2})$ centers. Second, we develop a novel $(2+\epsilon)$-approximation algorithm tailored for stable instances, where removing any center from an optimal solution increases the cost by at least an $\Omega(\epsilon^3/\log n)$ fraction. Achieving this involves a sampling approach inspired by the $k$-means++ algorithm and a reduction to submodular optimization subject to a partition matroid.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims a (2+ε)-approximation algorithm for metric k-median that opens exactly k centers. It modifies the JMMSV greedy algorithm with O(log n/ε²) adaptive phases plus a walk-between-solutions technique to obtain a (2+ε) solution opening at most k+O(log n/ε²) centers, then gives a separate (2+ε) algorithm for “stable” instances (any optimal-center removal increases cost by Ω(ε³/log n)) via k-means++-style sampling and a reduction to submodular maximization under a partition matroid; the two routines are combined to produce an exactly-k solution for arbitrary instances.
Significance. If the combination step is shown to preserve the (2+ε) guarantee without additive loss, the result would be a notable improvement over the prior 2.613 factor and would close the gap to the classic 2-approximation that only holds in expectation. The stability-based case split and the explicit walk-between-solutions construction are technically interesting and could be reusable.
major comments (2)
- [Main algorithm / combination step (likely §4 or the proof of Theorem 1)] The central claim requires a single algorithm that returns exactly k centers. The first routine produces k+O(log n/ε²) centers; the second applies only to instances satisfying the exponentially small stability threshold Ω(ε³/log n). The manuscript must therefore contain an explicit reduction or case-split (presumably in the section describing the overall algorithm and the proof of the main theorem) that routes general instances to the stable routine or applies the walk to reduce centers, together with a quantitative error analysis showing that the stability threshold and sampling/submodular steps introduce no extra additive factor that would violate the overall (2+ε) bound. No such analysis is visible in the provided abstract or high-level description.
- [Stable-instance algorithm (sampling + submodular reduction)] The stability definition uses a threshold Ω(ε³/log n). Any constant-factor slack introduced by the sampling routine or the submodular reduction could push the effective threshold above the required Ω(ε³/log n) and thereby invalidate the (2+ε) guarantee on the stable branch. The manuscript should supply the precise constants and the inequality chain that shows the sampled solution remains stable under the stated threshold.
minor comments (2)
- [Abstract and §1] Notation for the number of extra centers (O(log n/ε²)) should be stated uniformly in the abstract, introduction, and theorem statements.
- [Introduction] The phrase “walk-between-solutions” is introduced without a forward reference to its formal definition; a one-sentence pointer would improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading, the recognition of the technical contributions, and the constructive feedback. We address each major comment below, clarifying the content of the full manuscript while agreeing that additional explicit exposition will improve readability.
read point-by-point responses
-
Referee: [Main algorithm / combination step (likely §4 or the proof of Theorem 1)] The central claim requires a single algorithm that returns exactly k centers. The first routine produces k+O(log n/ε²) centers; the second applies only to instances satisfying the exponentially small stability threshold Ω(ε³/log n). The manuscript must therefore contain an explicit reduction or case-split (presumably in the section describing the overall algorithm and the proof of the main theorem) that routes general instances to the stable routine or applies the walk to reduce centers, together with a quantitative error analysis showing that the stability threshold and sampling/submodular steps introduce no extra additive factor that would violate the overall (2+ε) bound. No such analysis is visible in the provided abstract or high-level description.
Authors: The full manuscript (Section 4 and proof of Theorem 1) contains an explicit case-split: instances are routed to the stable routine when they meet the Ω(ε³/log n) threshold (verified via the definition); otherwise the walk-between-solutions technique is applied to the output of the modified greedy phases to obtain exactly k centers. The quantitative analysis shows that the chosen threshold absorbs the O(1) factors from sampling and submodular maximization with no additive loss beyond the (2+ε) term. To address visibility, we will add a high-level paragraph in the introduction and a dedicated overview subsection before Theorem 1 that summarizes the routing and error propagation. revision: yes
-
Referee: [Stable-instance algorithm (sampling + submodular reduction)] The stability definition uses a threshold Ω(ε³/log n). Any constant-factor slack introduced by the sampling routine or the submodular reduction could push the effective threshold above the required Ω(ε³/log n) and thereby invalidate the (2+ε) guarantee on the stable branch. The manuscript should supply the precise constants and the inequality chain that shows the sampled solution remains stable under the stated threshold.
Authors: Section 5 supplies the constants: the k-means++-style sampling succeeds with high probability and incurs a multiplicative O(log n) factor that is absorbed by setting the threshold to Ω(ε³/log n) with an explicit ε³ slack; the submodular maximization under the partition matroid adds a further (1+ε) factor that is likewise absorbed. The inequality chain is (stability of OPT) ≥ Ω(ε³/log n) minus sampling error minus submodular error ≥ Ω(ε³/log n) after adjusting constants inside the Ω notation. We agree the chain should be written out explicitly and will add a new subsection with the full derivation and constant tracking. revision: yes
Circularity Check
No circularity; derivation relies on explicit algorithmic modifications to prior non-self work
full rationale
The paper constructs two new algorithms: a modified greedy procedure with adaptive phases and a walk-between-solutions step that produces a (2+ε) solution opening k + O(log n/ε²) centers, plus a sampling-plus-submodular routine that achieves (2+ε) only on instances meeting an explicitly stated stability threshold. These steps are defined forward from the classical Jain et al. greedy framework (external citation) and standard metric properties; no equation or guarantee is obtained by renaming a fitted parameter, by self-definition, or by a load-bearing self-citation chain. The final combination claim is therefore an independent algorithmic argument rather than a reduction to the paper's own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- ε
axioms (1)
- domain assumption Input distances form a metric (satisfy triangle inequality).
Reference graph
Works this paper leans on
-
[1]
Stabil ity yields a PTAS for k-median and k-means clustering
[ABS10] Pranjal Awasthi, Avrim Blum, and Or Sheffet. Stabil ity yields a PTAS for k-median and k-means clustering. In 51th Annual IEEE Symposium on Foundations of Computer Scien ce, FOCS 2010, October 23-26, 2010, Las V egas, Nevada, USA , pages 309–318,
work page 2010
-
[2]
The hardness of approximation of Euclidean k-means
[ACKS15] Pranjal Awasthi, Moses Charikar, Ravishankar Kri shnaswamy , and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Lars Arg e and J´ anos Pach, editors, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eind- hoven, The Netherlands , volume 34 of LIPIcs, pages 754–767. Schloss Dagstuhl - Le...
work page 2015
-
[3]
Approximation schemes for Eu- clidean k-medians and related problems
[ARR98] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation schemes for Eu- clidean k-medians and related problems. In Proceedings of the Thirtieth Annual ACM Sym- posium on the Theory of Computing, Dallas, T exas, USA, May 23 -26, 1998 , pages 106–113,
work page 1998
-
[4]
[ARS03] Aaron Archer, Ranjithkumar Rajagopalan, and David B. Shmoys. Lagrangian relaxation for the k-median problem: New insights and continuity prope rties. In Giuseppe Di Bat- tista and Uri Zwick, editors, Algorithms - ESA 2003, 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings , volume 2832 of Lecture Notes in Comput...
work page 2003
-
[5]
k-means++: t he advantages of careful seeding
[A V07] David Arthur and Sergei V assilvitskii. k-means++: t he advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007 , pages 1027–1035,
work page 2007
-
[6]
Probabilistic approximations of metr ic spaces and its algorithmic applica- tions
[Bar96] Y air Bartal. Probabilistic approximations of metr ic spaces and its algorithmic applica- tions. In 37th Annual Symposium on Foundations of Computer Science, F OCS ’96, Burlington, V ermont, USA, 14-16 October, 1996, pages 184–193. IEEE Computer Society ,
work page 1996
-
[7]
On approximating arbitrary metrices b y tree metrics
[Bar98] Y air Bartal. On approximating arbitrary metrices b y tree metrics. In Jeffrey Scott Vit- ter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the The ory of Computing, Dallas, T exas, USA, May 23-26, 1998, pages 161–168. ACM,
work page 1998
-
[8]
[BF24] Niv Buchbinder and Moran Feldman. Deterministic alg orithm and faster algorithm for submodular maximization subject to a matroid constraint. CoRR, abs/2408.03583,
-
[9]
Rounding via trees: Deterministic approximation algorithms for group Steiner trees and k-median
[CCGG98] Moses Charikar, Chandra Chekuri, Ashish Goel, and Sudipto Guha. Rounding via trees: Deterministic approximation algorithms for group Steiner trees and k-median. In Jef- frey Scott Vitter, editor, Proceedings of the Thirtieth Annual ACM Symposium on the The ory of Computing, Dallas, T exas, USA, May 23-26, 1998, pages 114–123. ACM,
work page 1998
-
[10]
63 [CCSL21] Vincent Cohen-Addad, Karthik C. S., and Euiwoong L ee. On approximability of cluster- ing problems without candidate centers. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 2635–2648. SIAM,
work page 2021
-
[11]
[CEMN22] Vincent Cohen-Addad, Hossein Esfandiari, V ahab S . Mirrokni, and Shyam Narayanan. Improved approximations for Euclidean k-means and k-median, via nested quasi- independent sets. In Stefano Leonardi and Anupam Gupta, edi tors, STOC ’22: 54th An- nual ACM SIGACT Symposium on Theory of Computing, Rome, Ital y, June 20 - 24, 2022, pages 1621–1628. ACM,
work page 2022
-
[12]
Near-linear time approximations schemes for clustering in doubling metrics
[CFS19] Vincent Cohen-Addad, Andreas Emil Feldmann, and Da vid Saulpic. Near-linear time approximations schemes for clustering in doubling metrics . In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Scien ce, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019 , pages 540–559. IEEE Computer Society ,
work page 2019
-
[13]
An im- proved local search algorithm for k-median
[CGH+22] Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, a nd David Saulpic. An im- proved local search algorithm for k-median. In Joseph (Seffi ) Naor and Niv Buchbinder, editors, Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algo rithms, SODA 2022, Virtual Conference / Alexandria, V A, USA, January 9 - 12, 2022, pages 1556–1612. SIAM,
work page 2022
-
[14]
Tight FPT approximations for k-median and k-means
[CGK+19] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoon g Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In Christel Ba ier, Ioannis Chatzigian- nakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Au- tomata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece , volume ...
work page 2019
-
[15]
Breaching the 2 LMP approximation barrier for facility loca tion with applications to k- median
[CGLS23] Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoon g Lee, and Chris Schwiegelshohn. Breaching the 2 LMP approximation barrier for facility loca tion with applications to k- median. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM- SIAM Symposium on Discrete Algorithms, SODA 2023, Florence , Italy, January 22-25, 2023 , ...
work page 2023
-
[16]
[CGTS99] Moses Charikar, Sudipto Guha, ´Eva T ardos, and David B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In Jeffrey Scott Vitter, Lawrence L. Larmore, and Frank Thomson Leighton, ed itors, Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, M ay 1-4, 1999, Atlanta, Geor- gia...
work page 1999
-
[17]
[CK19] Vincent Cohen-Addad and Karthik C. S. Inapproximabi lity of clustering in lp metrics. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-1 2, 2019 , pages 519–539. IEEE Computer Society ,
work page 2019
-
[18]
A dependent lp-rounding ap proach for the k-median prob- lem
[CL12] Moses Charikar and Shi Li. A dependent lp-rounding ap proach for the k-median prob- lem. In Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Rog er W attenhofer, editors, Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, War- wick, UK, July 9-13, 2012, Proceedings, Part I , volume 7391 of Lecture Notes in Computer S...
work page 2012
-
[19]
On th e local structure of stable clus- tering instances
64 [CS17] Vincent Cohen-Addad and Chris Schwiegelshohn. On th e local structure of stable clus- tering instances. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15- 17, 2017 , pages 49–60. IEEE Computer Society ,
work page 2017
-
[20]
Embeddings an d non-approximability of geo- metric problems
[GI03] V enkatesan Guruswami and Piotr Indyk. Embeddings an d non-approximability of geo- metric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA ., pages 537–538,
work page 2003
-
[21]
[GPST23] Kishen N. Gowda, Thomas W . Pensyl, Aravind Sriniva san, and Khoa Trinh. Improved bi-point rounding algorithms and a golden barrier for k-median. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 202 3, pages 987–1011. SIAM,
work page 2023
-
[22]
Simpler Analyses of Local Search Algorithms for Facility Location
[GT08] Anupam Gupta and Kanat T angwongsan. Simpler analyse s of local search algorithms for facility location. CoRR, abs/0809.2554,
work page internal anchor Pith review Pith/arXiv arXiv
-
[23]
A new greedy approach for facility location problems
[JMS02] Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings on 34th Annual ACM Symposium on Theory of Comput ing, May 19-21, 2002, Montr´ eal, Qu´ ebec, Canada, pages 731–740,
work page 2002
-
[24]
Approximating k-median via p seudo-approximation
65 [LS13] Shi Li and Ola Svensson. Approximating k-median via p seudo-approximation. In Sym- posium on Theory of Computing Conference, STOC’13, Palo Alt o, CA, USA, June 1-4, 2013 , pages 901–910,
work page 2013
-
[25]
Shmoys, ´Eva T ardos, and Karen Aardal
[STA97] David B. Shmoys, ´Eva T ardos, and Karen Aardal. Approximation algorithms for facility location problems (extended abstract). In Frank Thomson Le ighton and Peter W . Shor, editors, Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Comput- ing, El Paso, T exas, USA, May 4-6, 1997, pages 265–274. ACM,
work page 1997
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.