pith. sign in

arxiv: 2606.08713 · v1 · pith:KJZ34VSVnew · submitted 2026-06-07 · 💻 cs.DS · cs.CG

The price of incrementality in k-center clustering

Pith reviewed 2026-06-27 17:40 UTC · model grok-4.3

classification 💻 cs.DS cs.CG
keywords k-center clusteringincremental algorithmsapproximation algorithmsmetric spaceslower boundsclustering
0
0 comments X

The pith

No incremental k-center solution can beat a factor-2 approximation even with unlimited computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that any sequence of centers built one by one in a metric space must incur a 2-approximation gap at some prefix length k, no matter the amount of computation used to choose the sequence. This holds because the lower-bound instance forces a simultaneous tradeoff across every possible number of centers. A reader would care because the result separates the geometric cost of incrementality from ordinary computational hardness, showing that the constraint alone requires the ratio of 2. The construction applies to all n levels at once and was developed with the help of an external language model.

Core claim

There exists a metric space in which every incremental sequence of centers has, for some k, a maximum distance to the nearest center at least twice the optimal k-center radius for that instance.

What carries the argument

The lower-bound metric instance that imposes a simultaneous tradeoff across all n levels of the clustering.

If this is right

  • Any incremental algorithm for k-center, regardless of running time, has approximation ratio at least 2.
  • Gonzalez's greedy algorithm matches the best possible ratio among all incremental algorithms.
  • The price of requiring incrementality is exactly the factor 2 and is independent of computational power.
  • Non-incremental solutions can achieve strictly better ratios than 2 on the same instances.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same construction technique may yield lower bounds for other objectives that require incremental output.
  • Relaxing strict incrementality to allow limited backtracking could potentially improve the ratio.
  • The result isolates incrementality as a distinct source of hardness from time-bounded computation.

Load-bearing premise

The lower-bound instance is a valid metric space in which every incremental sequence of centers must suffer a 2-approximation gap at some level.

What would settle it

An explicit incremental sequence of centers on the paper's lower-bound instance that achieves ratio strictly less than 2 at every prefix length.

Figures

Figures reproduced from arXiv: 2606.08713 by L\'aszl\'o Kozma.

Figure 1
Figure 1. Figure 1: Constructions of § 2.1 and § 2.2, identically scaled. Choosing p3 as the first center, the algorithm must take the second center on one side of p3, so the nearest center to the opposite endpoint (p1 or p5) remains p3 itself, at distance ϕ. This yields ALG2 = ϕ, for a ratio R ≥ ALG2 OPT2 ≥ ϕ 1 > 1.618. Thus, in all cases we have an approximation ratio of at least ϕ and no incremental algorithm can achieve R… view at source ↗
read the original abstract

The $k$-center problem is one of the best-studied and most intuitive clustering formulations. It asks, given a set of $n$ points in a metric space, for $k$ of the points to be designated as cluster centers, so that the maximum distance of an input point to its nearest center is minimized. Gonzalez's greedy algorithm from 1985 is a simple and efficient way to find a $2$-approximate solution. The algorithm has the attractive feature of \emph{incrementality}: it outputs the centers one by one, with a guaranteed $2$-approximation for every prefix of the obtained sequence of centers. Incrementality imposes a geometric constraint on how solutions can be built, and it is natural to ask whether this comes at a price in the quality of the solution. It is known that in polynomial time, the approximation ratio of $2$ is best possible, assuming $P \neq NP$. In this paper we show that even with \emph{unlimited} computational power, the factor $2$ cannot be improved, if the solution is required to be built incrementally. The lower bound construction imposes a tradeoff between all $n$ levels of the clustering simultaneously; it was obtained with the help of ChatGPT, an aspect we discuss in Section 3 of the paper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that even with unlimited computational power, the factor 2 cannot be improved upon for k-center if the solution must be built incrementally. This is shown by an explicit adversarial construction of a finite point set (obtained with ChatGPT assistance and discussed in Section 3) in which every incremental sequence of centers must suffer a 2-approximation gap at some level.

Significance. If the construction is valid, the result shows that the 2-approximation is forced by the incremental constraint rather than by computational limits, complementing the known P≠NP hardness for polynomial-time algorithms. The explicit finite construction is a strength, as it makes the lower bound directly verifiable in principle.

major comments (1)
  1. [Section 3] Section 3: The lower-bound instance is asserted to be a metric space whose distances force the claimed gap for every incremental sequence, but the manuscript provides neither an explicit enumeration of all triples nor a proof that the triangle inequality holds for every triple. This verification is load-bearing, as the result applies only if the distances form a valid metric.
minor comments (2)
  1. [Abstract] The abstract clearly states the claim, but Section 3 should separate the mathematical construction from the description of its generation method to avoid any conflation.
  2. Consider including a computational check (e.g., a short script or table of verified triples) for the metric property to support reproducibility of the finite instance.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and for recognizing the significance of the result. The major comment concerns verification that the lower-bound instance forms a valid metric; we address this directly below and agree that additional material is warranted.

read point-by-point responses
  1. Referee: [Section 3] Section 3: The lower-bound instance is asserted to be a metric space whose distances force the claimed gap for every incremental sequence, but the manuscript provides neither an explicit enumeration of all triples nor a proof that the triangle inequality holds for every triple. This verification is load-bearing, as the result applies only if the distances form a valid metric.

    Authors: We agree that a self-contained verification of the triangle inequality is necessary for the claim to be fully rigorous. The distances in the construction are defined explicitly (via a finite set of points and a distance matrix given in Section 3), so in principle the property is checkable; however, the current text does not supply either an exhaustive case analysis or a compact proof. In the revision we will add a short appendix that proves the triangle inequality holds for every triple, either by direct verification (the instance is small) or by showing that the distances coincide with shortest-path distances in an auxiliary graph whose edges already satisfy the inequality. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit adversarial construction for unconditional lower bound

full rationale

The paper's central result is an explicit finite point-set construction establishing that no incremental sequence of centers can achieve better than factor 2 at every prefix, even with unlimited computation. This is a direct combinatorial argument resting on the asserted distances of one instance; it does not reduce any claimed prediction or theorem to a fitted parameter, self-citation, or definitional renaming. The abstract and available text contain no equations that equate a derived quantity to its own input by construction, nor any load-bearing appeal to prior self-authored uniqueness results. The construction is presented as independent evidence, consistent with the reader's assessment of score 0.0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the existence of a metric-space instance that forces simultaneous approximation gaps at every prefix length; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption The input is a finite metric space satisfying the triangle inequality.
    Standard assumption for the k-center problem stated in the abstract.

pith-pipeline@v0.9.1-grok · 5761 in / 1085 out tokens · 16674 ms · 2026-06-27T17:40:20.739857+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

134 extracted references · 51 canonical work pages

  1. [1]

    Papadimitriou and Mihalis Yannakakis , title =

    Christos H. Papadimitriou and Mihalis Yannakakis , title =. J. Comput. Syst. Sci. , volume =. 1991 , timestamp =

  2. [2]

    Hai Du and Yinfeng Xu and Binhai Zhu , title =. J. Comb. Optim. , volume =. 2015 , doi =

  3. [3]

    The Price of Hierarchical Clustering , journal =

    Anna Arutyunova and Heiko R. The Price of Hierarchical Clustering , journal =. 2025 , doi =

  4. [4]

    Incremental Clustering and Dynamic Information Retrieval , journal =

    Moses Charikar and Chandra Chekuri and Tom. Incremental Clustering and Dynamic Information Retrieval , journal =. 2004 , url =. doi:10.1137/S0097539702418498 , timestamp =

  5. [5]

    Mettu and C

    Ramgopal R. Mettu and C. Greg Plaxton , title =. 2003 , doi =

  6. [6]

    Williamson , title =

    Guolong Lin and Chandrashekhar Nagarajan and Rajmohan Rajaraman and David P. Williamson , title =. Proceedings of the Seventeenth Annual. 2006 , url =

  7. [7]

    Greg Plaxton , title =

    C. Greg Plaxton , title =. J. Comput. Syst. Sci. , volume =. 2006 , doi =

  8. [8]

    Networks , volume =

    Ashwin Arulselvan and Olaf Maurer and Martin Skutella , title =. Networks , volume =. 2015 , doi =

  9. [9]

    Manuscript , url =

    Approximability of metric clustering problems , author=. Manuscript , url =

  10. [10]

    , author=

    On Mentzer’s Hardness of the k-Center Problem on the Euclidean Plane. , author=

  11. [11]

    Frederickson , editor =

    Greg N. Frederickson , editor =. Parametric Search and Locating Supply Centers in Trees , booktitle =. 1991 , doi =

  12. [12]

    Nimrod Megiddo , title =. J. Symb. Comput. , volume =. 1990 , url =. doi:10.1016/S0747-7171(08)80067-3 , timestamp =

  13. [13]

    Fowler and Mike Paterson and Steven L

    Robert J. Fowler and Mike Paterson and Steven L. Tanimoto , title =. Inf. Process. Lett. , volume =. 1981 , url =. doi:10.1016/0020-0190(81)90111-3 , timestamp =

  14. [14]

    Dynamic algorithms for

    Emilio Cruciani and Sebastian Forster and Gramoz Goranci and Yasamin Nazari and Antonis Skarlatos , editor =. Dynamic algorithms for. Proceedings of the 2024. 2024 , doi =

  15. [15]

    General bounds for incremental maximization , journal =

    Aaron Bernstein and Yann Disser and Martin Gro. General bounds for incremental maximization , journal =. 2022 , doi =

  16. [16]

    Hochbaum and David B

    Dorit S. Hochbaum and David B. Shmoys , title =. Math. Oper. Res. , volume =. 1985 , doi =

  17. [17]

    doi: https://doi.org/10.1016/0304-3975(85)90224-5

    Teofilo F. Gonzalez , title =. Theor. Comput. Sci. , volume =. 1985 , url =. doi:10.1016/0304-3975(85)90224-5 , timestamp =

  18. [18]

    arXiv:2104.03461

    Matthias Englert and Nicolaos Matsakis and Pavel Vesel. Improved approximation guarantees for shortest superstrings using cycle classification by overlap to length ratios , booktitle =. 2022 , url =. doi:10.1145/3519935.3520001 , timestamp =

  19. [19]

    Approximation Guarantees for Shortest Superstrings: Simpler and Better , booktitle =

    Matthias Englert and Nicolaos Matsakis and Pavel Vesel. Approximation Guarantees for Shortest Superstrings: Simpler and Better , booktitle =. 2023 , url =. doi:10.4230/LIPICS.ISAAC.2023.29 , timestamp =

  20. [20]

    Data Compression Conference,

    Isamu Furuya and Takuya Takagi and Yuto Nakashima and Shunsuke Inenaga and Hideo Bannai and Takuya Kida , editor =. Data Compression Conference,. 2019 , timestamp =

  21. [21]

    Tokenizer Choice For

    Ali, Mehdi and Fromm, Michael and Thellmann, Klaudia and Rutmann, Richard and L. Tokenizer Choice For. arXiv preprint arXiv:2310.08754 , year=

  22. [22]

    Blum, Avrim and Jiang, Tao and Li, Ming and Tromp, John and Yannakakis, Mihalis , title =. J. ACM , month = jul, pages =. 1994 , issue_date =

  23. [23]

    Williamson and David B

    David P. Williamson and David B. Shmoys , title =. 2011 , isbn =

  24. [24]

    Symposium on Combinatorial Pattern Matching , pages=

    A really simple approximation of smallest grammar , author=. Symposium on Combinatorial Pattern Matching , pages=. 2014 , organization=

  25. [25]

    Application of

    Rytter, Wojciech , journal=. Application of. 2003 , publisher=

  26. [26]

    Proceedings of the Data Compression Conference , pages=

    Re-pair Achieves High-Order Entropy , author=. Proceedings of the Data Compression Conference , pages=

  27. [27]

    Findings of the Association for Computational Linguistics: ACL 2023 , pages=

    A Formal Perspective on Byte-Pair Encoding , author=. Findings of the Association for Computational Linguistics: ACL 2023 , pages=. 2023 , url =

  28. [28]

    Investigating the effectiveness of

    Gall. Investigating the effectiveness of. Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) , pages=

  29. [29]

    International Conference on Computational Linguistics and Intelligent Text Processing , pages=

    How much does tokenization affect neural machine translation? , author=. International Conference on Computational Linguistics and Intelligent Text Processing , pages=. 2019 , organization=

  30. [30]

    China National Conference on Chinese Computational Linguistics , pages=

    A robustly optimized BERT pre-training approach with post-training , author=. China National Conference on Chinese Computational Linguistics , pages=. 2021 , organization=

  31. [31]

    OpenAI blog , volume=

    Language models are unsupervised multitask learners , author=. OpenAI blog , volume=

  32. [32]

    Languages through the looking glass of

    Gutierrez-Vasques, Ximena and Bentz, Christian and Samard. Languages through the looking glass of. Computational Linguistics , volume=. 2023 , publisher=

  33. [33]

    arXiv preprint arXiv:2302.13971 , year=

    Llama: Open and efficient foundation language models , author=. arXiv preprint arXiv:2302.13971 , year=

  34. [34]

    Vocabulary Learning via Optimal Transport for Neural Machine Translation , author=. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) , pages=

  35. [35]

    arXiv preprint arXiv:2402.18376 , year=

    Tokenization Is More Than Compression , author=. arXiv preprint arXiv:2402.18376 , year=

  36. [36]

    arXiv preprint arXiv:2404.08335 , year=

    Toward a Theory of Tokenization in LLMs , author=. arXiv preprint arXiv:2404.08335 , year=

  37. [37]

    arXiv preprint arXiv:1508.07909 , year=

    Neural machine translation of rare words with subword units , author=. arXiv preprint arXiv:1508.07909 , year=

  38. [38]

    2023 IEEE International Conference on Big Data (BigData) , pages=

    Multimodal large language models: A survey , author=. 2023 IEEE International Conference on Big Data (BigData) , pages=. 2023 , organization=

  39. [39]

    arXiv preprint arXiv:2112.10508 , year=

    Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP , author=. arXiv preprint arXiv:2112.10508 , year=

  40. [40]

    Bloom: A 176b-parameter open-access multilingual language model , author=

  41. [41]

    IEEE Transactions on Information Theory , volume=

    Grammar-based codes: A new class of universal lossless source codes , author=. IEEE Transactions on Information Theory , volume=. 2000 , publisher=

  42. [42]

    Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert. Language Models are Few-Shot Learners , booktitle =. 2020 , timestamp =

  43. [43]

    Findings of the Association for Computational Linguistics: EMNLP 2020 , pages=

    Byte Pair Encoding is Suboptimal for Language Model Pretraining , author=. Findings of the Association for Computational Linguistics: EMNLP 2020 , pages=

  44. [44]

    Groups-Complexity-Cryptology , volume=

    Algorithmics on SLP-compressed strings: A survey , author=. Groups-Complexity-Cryptology , volume=. 2012 , publisher=

  45. [45]

    Improvements on

    Ga. Improvements on. 2017 Data Compression Conference (DCC) , pages=. 2017 , organization=

  46. [46]

    Rpair: rescaling

    Gagie, Travis and I, Tomohiro and Manzini, Giovanni and Navarro, Gonzalo and Sakamoto, Hiroshi and Takabatake, Yoshimasa , booktitle=. Rpair: rescaling. 2019 , organization=

  47. [47]

    Recursive

    Kim, Justin and Varki, Rahul and Oliva, Marco and Boucher, Christina , journal=. Recursive. 2024 , publisher=

  48. [48]

    IEEE Transactions on Information Theory , volume=

    The smallest grammar problem revisited , author=. IEEE Transactions on Information Theory , volume=. 2020 , publisher=

  49. [49]

    Proceedings of the IEEE , volume=

    Off-line dictionary-based compression , author=. Proceedings of the IEEE , volume=. 2000 , publisher=

  50. [50]

    2005 , timestamp =

    Moses Charikar and Eric Lehman and Ding Liu and Rina Panigrahy and Manoj Prabhakaran and Amit Sahai and Abhi Shelat , title =. 2005 , timestamp =

  51. [51]

    Journal of the ACM (JACM) , volume=

    Data compression via textual substitution , author=. Journal of the ACM (JACM) , volume=. 1982 , publisher=

  52. [52]

    1999 , journal=

    Byte pair encoding: A text compression scheme that accelerates pattern matching , author=. 1999 , journal=

  53. [53]

    The C Users Journal , volume=

    A new algorithm for data compression , author=. The C Users Journal , volume=. 1994 , publisher=

  54. [54]

    Paola Alimonti and Viggo Kann , title =. Theor. Comput. Sci. , volume =. 2000 , timestamp =

  55. [55]

    Karp , editor =

    Richard M. Karp , editor =. Reducibility Among Combinatorial Problems , booktitle =. 1972 , timestamp =

  56. [56]

    Chung, F. R. K. and Graham, R. L. , title =. Annals of the New York Academy of Sciences , year =

  57. [57]

    M. R. Garey and David S. Johnson , title =. 1979 , isbn =

  58. [58]

    Permutation patterns , year =

    Brignall, Robert , title =. Permutation patterns , year =. doi:10.1017/CBO9780511902499.003 , publisher =

  59. [59]

    2018 , publisher=

    Forbidden configurations in discrete geometry , author=. 2018 , publisher=

  60. [60]

    Mathematical Proceedings of the Cambridge Philosophical Society , year =

    Beardwood, Jillian and Halton, John H and Hammersley, John Michael , title =. Mathematical Proceedings of the Cambridge Philosophical Society , year =

  61. [61]

    A rounding algorithm for approximating minimum Manhattan networks , journal =

    Victor Chepoi and Karim Nouioua and Yann Vax. A rounding algorithm for approximating minimum Manhattan networks , journal =. 2008 , url =. doi:10.1016/j.tcs.2007.10.013 , timestamp =

  62. [62]

    Zeyu Guo and He Sun and Hong Zhu , title =. Int. J. Comput. Geom. Appl. , volume =. 2011 , url =. doi:10.1142/S0218195911003688 , timestamp =

  63. [63]

    Francis Y. L. Chin and Zeyu Guo and He Sun , title =. Discret. Comput. Geom. , year =. doi:10.1007/s00454-011-9342-z , timestamp =

  64. [64]

    Joachim Gudmundsson and Christos Levcopoulos and Giri Narasimhan , title =. Nord. J. Comput. , volume =. 2001 , url =

  65. [65]

    Proceedings of the 23rd European Workshop on Computational Geometry , pages=

    Small Manhattan networks and algorithmic applications for the earth mover’s distance , author=. Proceedings of the 23rd European Workshop on Computational Geometry , pages=

  66. [66]

    Binary search trees and rectangulations , journal =

    L. Binary search trees and rectangulations , journal =. 2016 , volume =. 1603.08151 , eprinttype =

  67. [67]

    2023 , url =

    Parinya Chalermsook and Manoj Gupta and Wanchote Jiamjitrak and Nidia Obscura Acosta and Akash Pareek and Sorrachai Yingchareonthawornchai , title =. 2023 , url =. doi:10.1137/1.9781611977554.ch22 , timestamp =

  68. [68]

    Levy and Robert E

    Caleb C. Levy and Robert E. Tarjan , title =. Algorithms and Data Structures - 16th International Symposium,. 2019 , url =. doi:10.1007/978-3-030-24766-9\_37 , timestamp =

  69. [69]

    Dion Harmon , title =

  70. [70]

    Wilber , title =

    Robert E. Wilber , title =. 1989 , url =. doi:10.1137/0218004 , timestamp =

  71. [71]

    Thurston , title =

    Daniel Dominic Sleator and Robert Endre Tarjan and William P. Thurston , title =. 1986 , url =. doi:10.1145/12130.12143 , timestamp =

  72. [72]

    Scaling limits of permutation classes with a finite specification: a dichotomy , journal =

    Bassino, Fr. Scaling limits of permutation classes with a finite specification: a dichotomy , journal =. 2022 , volume =. doi:10.1016/j.aim.2022.108513 , publisher =

  73. [73]

    Journal of Combinatorial Theory, Series B , volume =

    Robertson, Neil and Seymour, Paul D , title =. Journal of Combinatorial Theory, Series B , year =. doi:10.1016/J.JCTB.2004.08.001 , publisher =

  74. [74]

    Pattern-Avoiding Access in Binary Search Trees , booktitle =

    Parinya Chalermsook and Mayank Goswami and L. Pattern-Avoiding Access in Binary Search Trees , booktitle =. 2015 , url =. doi:10.1109/FOCS.2015.32 , timestamp =

  75. [75]

    Algorithmica , volume =

    Avrim Blum and Shuchi Chawla and Adam Kalai , title =. Algorithmica , volume =. 2003 , timestamp =

  76. [76]

    30 Zongpeng Li and Baochun Li

    Parinya Chalermsook and Mayank Goswami and L. Greedy Is an Almost Optimal Deque , booktitle =. 2015 , url =. doi:10.1007/978-3-319-21840-3\_13 , timestamp =

  77. [77]

    Demaine and Dion Harmon and John Iacono and Mihai P

    Erik D. Demaine and Dion Harmon and John Iacono and Mihai P. Dynamic Optimality - Almost , journal =. 2007 , url =. doi:10.1137/S0097539705447347 , timestamp =

  78. [78]

    Demaine and Dion Harmon and John Iacono and Daniel M

    Erik D. Demaine and Dion Harmon and John Iacono and Daniel M. Kane and Mihai P. The geometry of binary search trees , booktitle =. 2009 , url =. doi:10.1137/1.9781611973068.55 , timestamp =

  79. [79]

    Daniel Dominic Sleator and Robert Endre Tarjan , title =. J. 1985 , url =. doi:10.1145/3828.3835 , timestamp =

  80. [80]

    SIAM Journal on computing , volume=

    Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems , author=. SIAM Journal on computing , volume=. 1999 , publisher=

Showing first 80 references.