On Parallel k-Center Clustering
Pith reviewed 2026-05-24 09:44 UTC · model grok-4.3
The pith
An extension of a prior MPC algorithm for k-center clustering achieves an O(log* n) approximation in O(log log n) rounds with O(n^δ) local space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in O(log log n) rounds returns a clustering with k(1+o(1)) clusters that is an O(log* n)-approximation for k-center in constant dimensional Euclidean space.
What carries the argument
Extension of the Bateni et al. (2021) algorithmic techniques for the low-local-space MPC model that improves the approximation factor from O(log log log n) to O(log* n).
If this is right
- The algorithm works for any constant δ in (0,1) and constant dimension.
- It handles the regime where k is at least Ω(n^δ).
- It returns nearly the right number of centers.
- The approximation improves on the prior O(log log log n) bound while matching the round complexity.
Where Pith is reading between the lines
- Similar extensions might apply to other clustering objectives like k-means in the same model.
- The log* n factor suggests the approximation is close to the sequential lower bounds in some regimes.
- This could reduce the gap between parallel and sequential performance for large-k instances.
Load-bearing premise
The techniques from Bateni et al. can be extended to achieve the better approximation ratio without increasing the number of rounds or local space.
What would settle it
An input instance in constant-dimensional Euclidean space where the extended algorithm either exceeds O(log log n) rounds, uses more than O(n^δ) local space, produces more than k(1+o(1)) centers, or fails to achieve O(log* n) approximation.
read the original abstract
We consider the classic $k$-center problem {in the constant dimensional Euclidean space} under a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, with local space per machine of ${O}(n^{\delta})$, where $\delta \in (0,1)$ is an arbitrary constant. As a central clustering problem, the $k$-center problem has been studied extensively. Still, until very recently, all parallel MPC algorithms have been requiring $\Omega(k)$ or even $\Omega(k n^{\delta})$ local space per machine. While this setting covers the case of small values of $k$, for a large number of clusters these algorithms require large local memory, making them poorly scalable. The case of large $k$, $k \ge \Omega(n^{\delta})$, has been considered recently for the low-local-space MPC model by Bateni et al.\ (2021), who gave an ${O}(\log \log n)$-round MPC algorithm that produces $k(1+o(1))$ centers whose cost has multiplicative approximation of ${O}(\log\log\log n)$. In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in ${O}(\log\log n)$ rounds returns a clustering with $k(1+o(1))$ clusters that is an ${O}(\log^*n)$-approximation for $k$-center.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript extends the Bateni et al. (2021) low-local-space MPC algorithm for k-center clustering in constant-dimensional Euclidean space. It claims an O(log log n)-round algorithm that uses O(n^δ) local space per machine, outputs k(1+o(1)) centers, and achieves an O(log* n) approximation ratio, improving the prior O(log log log n) approximation while preserving round and space bounds.
Significance. If the claimed extension holds, the result improves the approximation factor for the k-center problem in the low-local-space MPC regime without increasing round or space complexity. This would be a modest but concrete advance for parallel clustering when k is large (k ≥ Ω(n^δ)). The paper does not ship machine-checked proofs or reproducible code.
major comments (1)
- [Abstract] Abstract: the central claim is an extension of Bateni et al. (2021) that improves the approximation from O(log log log n) to O(log* n) while retaining O(log log n) rounds and O(n^δ) local space. However, the full derivation, required lemmas, and handling of constant-dimensional Euclidean geometry are not visible in the supplied text, so the soundness of the extension cannot be verified.
Simulated Author's Rebuttal
We thank the referee for their review of our manuscript. Below we respond point-by-point to the sole major comment.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim is an extension of Bateni et al. (2021) that improves the approximation from O(log log log n) to O(log* n) while retaining O(log log n) rounds and O(n^δ) local space. However, the full derivation, required lemmas, and handling of constant-dimensional Euclidean geometry are not visible in the supplied text, so the soundness of the extension cannot be verified.
Authors: The complete manuscript (arXiv:2304.05883) contains the full technical development. Section 3 presents the MPC algorithm extending Bateni et al., Section 4 contains all lemmas establishing the O(log* n) approximation (via a careful analysis of the recursive partitioning and center selection), and the constant-dimensional Euclidean geometry is handled explicitly in the distance computations and ball-covering arguments throughout Sections 3–5. The supplied text referenced by the referee appears to have been limited to the abstract; the full paper supplies the derivations needed to verify soundness. revision: no
Circularity Check
No significant circularity; extension of external prior work
full rationale
The paper's central claim is an algorithmic extension of the Bateni et al. (2021) construction for low-local-space MPC k-center clustering. The abstract and description explicitly reference this as prior external work by different authors, with the new result improving the approximation factor while preserving round and space bounds. No equations, self-definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the text. The derivation chain is presented as building on an independent cited result rather than reducing to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The input points lie in constant-dimensional Euclidean space
- domain assumption The low-local-space MPC model with per-machine space O(n^δ) for δ ∈ (0,1)
Reference graph
Works this paper leans on
-
[1]
Parallel algorithms for geometric graph problems
Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC) , pages 574--583, 2014
work page 2014
-
[2]
MohammadHossein Bateni, Hossein Esfandiari, Manuela Fischer, and Vahab Mirrokni. Extreme k -center clustering. In Proceedings of the 35th AAAI Conference on Artificial Intelligence(AAAI) , pages 3941--3949, 2021
work page 2021
-
[3]
Communication steps for parallel query processing
Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. Journal of the ACM , 64(6):40:1--40:58, 2017
work page 2017
-
[4]
Parallel approximation algorithms for facility-location problems
Guy E Blelloch and Kanat Tangwongsan. Parallel approximation algorithms for facility-location problems. In Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures , pages 315--324, 2010
work page 2010
-
[5]
Round compression for parallel matching algorithms
Artur Czumaj, Jakub a cki, Aleksander M a dry, Slobodan Mitrovi\'c, Krzysztof Onak, and Piotr Sankowski. Round compression for parallel matching algorithms. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC) , pages 471--484, 2018
work page 2018
-
[6]
Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Solving k -center clustering (with outliers) in MapReduce and streaming, almost as accurately as sequentially. Proceedings of the VLDB Endowment , 12(7):766--778, 2019
work page 2019
-
[7]
Jiecao Chen, He Sun, David P. Woodruff, and Qin Zhang. Communication-optimal distributed clustering. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS) , pages 3720--3728, 2016
work page 2016
-
[8]
Martin E. Dyer and Alan M. Frieze. A simple heuristic for the p -centre problem. Operations Research Letters , 3(6):285--288, 1985
work page 1985
-
[9]
MapReduce : Simplified data processing on large clusters
Jeffrey Dean and Sanjay Ghemawat. MapReduce : Simplified data processing on large clusters. Communications of the ACM , 51(1):107--113, January 2008
work page 2008
-
[10]
Fast clustering using MapReduce
Alina Ene, Sungjin Im, and Benjamin Moseley. Fast clustering using MapReduce . In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) , pages 681--689, 2011
work page 2011
- [11]
-
[12]
Goodrich, Nodari Sitchinava, and Qin Zhang
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the MapReduce framework. In Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC) , pages 374--383, 2011
work page 2011
-
[13]
Approximate nearest neighbor: Towards removing the curse of dimensionality
Sariel Har - Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theoretical Computer Science , 8(1):321--350, 2012
work page 2012
- [14]
-
[15]
Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k -center problem. Mathematics of Operations Research , 10(2):180--184, 1985
work page 1985
-
[16]
Dryad: Distributed data-parallel programs from sequential building blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. SIGOPS Operating Systems Review , 41(3):59--72, March 2007
work page 2007
-
[17]
Karloff, Siddharth Suri, and Sergei Vassilvitskii
Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce . In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 938--948, 2010
work page 2010
-
[18]
Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, and Benjamin Moseley. Fast distributed k -center clustering with outliers on massive data. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS) , pages 1063--1071, 2015
work page 2015
-
[19]
Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale
Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale . O'Reilly Media, Sebastopol, CA, 4th edition, 2015
work page 2015
-
[20]
Efficient query expansion for advertisement search
Haofen Wang, Yan Liang, Linyun Fu, Gui - Rong Xue, and Yong Yu. Efficient query expansion for advertisement search. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) , pages 51--58, 2009
work page 2009
-
[21]
Survey of clustering algorithms
Rui Xu and Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks , 16(3):645--678, 2005
work page 2005
-
[22]
Franklin, Scott Shenker, and Ion Stoica
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud) , 2010
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.