pith. sign in

arxiv: 2304.05883 · v2 · submitted 2023-04-12 · 💻 cs.DS

On Parallel k-Center Clustering

Pith reviewed 2026-05-24 09:44 UTC · model grok-4.3

classification 💻 cs.DS
keywords k-center clusteringmassively parallel computationlow local spaceapproximation algorithmsEuclidean spaceparallel algorithmsclustering
0
0 comments X

The pith

An extension of a prior MPC algorithm for k-center clustering achieves an O(log* n) approximation in O(log log n) rounds with O(n^δ) local space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends an existing algorithm to solve the k-center problem in constant-dimensional Euclidean space under the low-local-space MPC model. It produces a clustering with k(1+o(1)) centers that approximates the optimal cost by a factor of O(log* n). The same round bound of O(log log n) and local space bound of O(n^δ) are preserved. A sympathetic reader would care because this improves scalability for cases where k is large enough to require many machines without excessive per-machine memory.

Core claim

The authors extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in O(log log n) rounds returns a clustering with k(1+o(1)) clusters that is an O(log* n)-approximation for k-center in constant dimensional Euclidean space.

What carries the argument

Extension of the Bateni et al. (2021) algorithmic techniques for the low-local-space MPC model that improves the approximation factor from O(log log log n) to O(log* n).

If this is right

  • The algorithm works for any constant δ in (0,1) and constant dimension.
  • It handles the regime where k is at least Ω(n^δ).
  • It returns nearly the right number of centers.
  • The approximation improves on the prior O(log log log n) bound while matching the round complexity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar extensions might apply to other clustering objectives like k-means in the same model.
  • The log* n factor suggests the approximation is close to the sequential lower bounds in some regimes.
  • This could reduce the gap between parallel and sequential performance for large-k instances.

Load-bearing premise

The techniques from Bateni et al. can be extended to achieve the better approximation ratio without increasing the number of rounds or local space.

What would settle it

An input instance in constant-dimensional Euclidean space where the extended algorithm either exceeds O(log log n) rounds, uses more than O(n^δ) local space, produces more than k(1+o(1)) centers, or fails to achieve O(log* n) approximation.

read the original abstract

We consider the classic $k$-center problem {in the constant dimensional Euclidean space} under a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, with local space per machine of ${O}(n^{\delta})$, where $\delta \in (0,1)$ is an arbitrary constant. As a central clustering problem, the $k$-center problem has been studied extensively. Still, until very recently, all parallel MPC algorithms have been requiring $\Omega(k)$ or even $\Omega(k n^{\delta})$ local space per machine. While this setting covers the case of small values of $k$, for a large number of clusters these algorithms require large local memory, making them poorly scalable. The case of large $k$, $k \ge \Omega(n^{\delta})$, has been considered recently for the low-local-space MPC model by Bateni et al.\ (2021), who gave an ${O}(\log \log n)$-round MPC algorithm that produces $k(1+o(1))$ centers whose cost has multiplicative approximation of ${O}(\log\log\log n)$. In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in ${O}(\log\log n)$ rounds returns a clustering with $k(1+o(1))$ clusters that is an ${O}(\log^*n)$-approximation for $k$-center.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript extends the Bateni et al. (2021) low-local-space MPC algorithm for k-center clustering in constant-dimensional Euclidean space. It claims an O(log log n)-round algorithm that uses O(n^δ) local space per machine, outputs k(1+o(1)) centers, and achieves an O(log* n) approximation ratio, improving the prior O(log log log n) approximation while preserving round and space bounds.

Significance. If the claimed extension holds, the result improves the approximation factor for the k-center problem in the low-local-space MPC regime without increasing round or space complexity. This would be a modest but concrete advance for parallel clustering when k is large (k ≥ Ω(n^δ)). The paper does not ship machine-checked proofs or reproducible code.

major comments (1)
  1. [Abstract] Abstract: the central claim is an extension of Bateni et al. (2021) that improves the approximation from O(log log log n) to O(log* n) while retaining O(log log n) rounds and O(n^δ) local space. However, the full derivation, required lemmas, and handling of constant-dimensional Euclidean geometry are not visible in the supplied text, so the soundness of the extension cannot be verified.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review of our manuscript. Below we respond point-by-point to the sole major comment.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim is an extension of Bateni et al. (2021) that improves the approximation from O(log log log n) to O(log* n) while retaining O(log log n) rounds and O(n^δ) local space. However, the full derivation, required lemmas, and handling of constant-dimensional Euclidean geometry are not visible in the supplied text, so the soundness of the extension cannot be verified.

    Authors: The complete manuscript (arXiv:2304.05883) contains the full technical development. Section 3 presents the MPC algorithm extending Bateni et al., Section 4 contains all lemmas establishing the O(log* n) approximation (via a careful analysis of the recursive partitioning and center selection), and the constant-dimensional Euclidean geometry is handled explicitly in the distance computations and ball-covering arguments throughout Sections 3–5. The supplied text referenced by the referee appears to have been limited to the abstract; the full paper supplies the derivations needed to verify soundness. revision: no

Circularity Check

0 steps flagged

No significant circularity; extension of external prior work

full rationale

The paper's central claim is an algorithmic extension of the Bateni et al. (2021) construction for low-local-space MPC k-center clustering. The abstract and description explicitly reference this as prior external work by different authors, with the new result improving the approximation factor while preserving round and space bounds. No equations, self-definitions, fitted parameters presented as predictions, or load-bearing self-citations appear in the text. The derivation chain is presented as building on an independent cited result rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract invokes standard properties of constant-dimensional Euclidean space and the MPC model definition but introduces no free parameters, no ad-hoc axioms, and no invented entities.

axioms (2)
  • domain assumption The input points lie in constant-dimensional Euclidean space
    Stated explicitly in the abstract as the setting for the k-center problem.
  • domain assumption The low-local-space MPC model with per-machine space O(n^δ) for δ ∈ (0,1)
    Central to the problem statement and comparison with prior work.

pith-pipeline@v0.9.0 · 5781 in / 1560 out tokens · 46603 ms · 2026-05-24T09:44:46.541372+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

  1. [1]

    Parallel algorithms for geometric graph problems

    Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC) , pages 574--583, 2014

  2. [2]

    Extreme k -center clustering

    MohammadHossein Bateni, Hossein Esfandiari, Manuela Fischer, and Vahab Mirrokni. Extreme k -center clustering. In Proceedings of the 35th AAAI Conference on Artificial Intelligence(AAAI) , pages 3941--3949, 2021

  3. [3]

    Communication steps for parallel query processing

    Paul Beame, Paraschos Koutris, and Dan Suciu. Communication steps for parallel query processing. Journal of the ACM , 64(6):40:1--40:58, 2017

  4. [4]

    Parallel approximation algorithms for facility-location problems

    Guy E Blelloch and Kanat Tangwongsan. Parallel approximation algorithms for facility-location problems. In Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures , pages 315--324, 2010

  5. [5]

    Round compression for parallel matching algorithms

    Artur Czumaj, Jakub a cki, Aleksander M a dry, Slobodan Mitrovi\'c, Krzysztof Onak, and Piotr Sankowski. Round compression for parallel matching algorithms. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC) , pages 471--484, 2018

  6. [6]

    Solving k -center clustering (with outliers) in MapReduce and streaming, almost as accurately as sequentially

    Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Solving k -center clustering (with outliers) in MapReduce and streaming, almost as accurately as sequentially. Proceedings of the VLDB Endowment , 12(7):766--778, 2019

  7. [7]

    Woodruff, and Qin Zhang

    Jiecao Chen, He Sun, David P. Woodruff, and Qin Zhang. Communication-optimal distributed clustering. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS) , pages 3720--3728, 2016

  8. [8]

    Dyer and Alan M

    Martin E. Dyer and Alan M. Frieze. A simple heuristic for the p -centre problem. Operations Research Letters , 3(6):285--288, 1985

  9. [9]

    MapReduce : Simplified data processing on large clusters

    Jeffrey Dean and Sanjay Ghemawat. MapReduce : Simplified data processing on large clusters. Communications of the ACM , 51(1):107--113, January 2008

  10. [10]

    Fast clustering using MapReduce

    Alina Ene, Sungjin Im, and Benjamin Moseley. Fast clustering using MapReduce . In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) , pages 681--689, 2011

  11. [11]

    Gonzalez

    Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science , 38:293--306, 1985

  12. [12]

    Goodrich, Nodari Sitchinava, and Qin Zhang

    Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. Sorting, searching, and simulation in the MapReduce framework. In Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC) , pages 374--383, 2011

  13. [13]

    Approximate nearest neighbor: Towards removing the curse of dimensionality

    Sariel Har - Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theoretical Computer Science , 8(1):321--350, 2012

  14. [14]

    Nemhauser

    Wen - Lian Hsu and George L. Nemhauser. Easy and hard bottleneck location problems. Discrete Applied Mathematics , 1(3):209--215, 1979

  15. [15]

    Hochbaum and David B

    Dorit S. Hochbaum and David B. Shmoys. A best possible heuristic for the k -center problem. Mathematics of Operations Research , 10(2):180--184, 1985

  16. [16]

    Dryad: Distributed data-parallel programs from sequential building blocks

    Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. SIGOPS Operating Systems Review , 41(3):59--72, March 2007

  17. [17]

    Karloff, Siddharth Suri, and Sergei Vassilvitskii

    Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce . In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 938--948, 2010

  18. [18]

    Kusner, Wenlin Chen, Kilian Q

    Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, and Benjamin Moseley. Fast distributed k -center clustering with outliers on massive data. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS) , pages 1063--1071, 2015

  19. [19]

    Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale

    Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale . O'Reilly Media, Sebastopol, CA, 4th edition, 2015

  20. [20]

    Efficient query expansion for advertisement search

    Haofen Wang, Yan Liang, Linyun Fu, Gui - Rong Xue, and Yong Yu. Efficient query expansion for advertisement search. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) , pages 51--58, 2009

  21. [21]

    Survey of clustering algorithms

    Rui Xu and Donald Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks , 16(3):645--678, 2005

  22. [22]

    Franklin, Scott Shenker, and Ion Stoica

    Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud) , 2010