pith. sign in

arxiv: 2606.30348 · v1 · pith:PBOT36GOnew · submitted 2026-06-29 · 💻 cs.DS

Optimal Stable Coresets for Geometric Median via Uniform Sampling

Pith reviewed 2026-06-30 03:32 UTC · model grok-4.3

classification 💻 cs.DS
keywords geometric medianstable coresetuniform samplingcoresetscomputational geometrydimension independenceconstrained optimization
0
0 comments X

The pith

A uniform sample of size O(ε^{-2} log 1/ε) is a stable (ε, O(ε))-coreset for the geometric median with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The geometric median finds the point in R^d that minimizes the sum of Euclidean distances to an input set of points. The paper proves that uniform sampling produces a stable coreset of size O(ε^{-2} log 1/ε) that preserves the cost of near-optimal solutions even when the output point must obey additional unknown constraints. This size is independent of dimension d and tight up to the log factor. The proof adapts prior stable-coreset machinery but applies it iteratively to shrink the sample and drop the O(log d) overhead. A reader would care because many optimization routines use the geometric median as a subroutine under structural constraints, and this preprocessing step works in sublinear time.

Core claim

Our main result is that a uniform sample of size O(ε^{-2} log 1/ε) is a stable (ε, O(ε))-coreset for the geometric median, with high constant probability, and this bound is tight up to the logarithmic factor. Our analysis adapts recent machinery of Carmel and Krauthgamer for constructing stable coresets, which incurs an O(log d) factor. We show an iterative argument that progressively reduces the sample size, and eliminates this dependence on the dimension d. At a high level, this approach resembles the technique of iterative size reduction, which is applicable for strong coresets but not for weak coresets.

What carries the argument

The iterative argument that repeatedly applies the stable-coreset construction to shrink the sample and remove the O(log d) factor while preserving the (ε, O(ε)) stability guarantee.

If this is right

  • Constrained versions of the geometric median can be solved on the coreset without knowing the constraints in advance.
  • The coreset size is independent of the ambient dimension d.
  • The coreset is obtained by uniform sampling and therefore constructed in sublinear time.
  • The sample-size bound is optimal up to the logarithmic factor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The iterative reduction technique may transfer to stable coreset constructions for other geometric or clustering objectives.
  • It suggests a general route for closing the gap between weak and stable coreset sizes when dimension dependence appears in the analysis.
  • Dimension-independent stable coresets could simplify preprocessing pipelines for high-dimensional constrained optimization tasks that repeatedly invoke the geometric median.

Load-bearing premise

The prior stable-coreset construction can be applied iteratively without destroying the stability property at each step.

What would settle it

A concrete input set in high dimension for which every uniform sample of size o(ε^{-2} log 1/ε) fails to be a stable (ε, O(ε))-coreset for the geometric median.

read the original abstract

The geometric median problem asks to find a point in $\mathbb{R}^d$ that minimizes the sum of Euclidean distances to an input set. It is a classical problem in computational geometry and appears as a subroutine in numerous optimization tasks, many of which require the solution to satisfy additional structural constraints. A common approach to reduce the input size is to construct a coreset, which is a small weighted subset that faithfully represents the input for a specific optimization problem. Strong coresets preserve the cost of every candidate solution but require linear time to construct; weak coresets admit sublinear construction, in fact by uniform sampling, but only preserve near-optimal solutions, which is insufficient when the solution is constrained. To address this, we focus instead on the recently introduced intermediate notion of a \emph{stable coreset}, which simultaneously handles all constrained variants. Currently, there is a large gap between the known sample sizes for stable and weak coresets. Our main result is that a uniform sample of size $O(\epsilon^{-2} \log \tfrac{1}{\epsilon})$ is a stable $(\epsilon, O(\epsilon))$-coreset for the geometric median, with high constant probability, and this bound is tight up to the logarithmic factor. Our analysis adapts recent machinery of Carmel and Krauthgamer (ICLR 2026) for constructing stable coresets, which incurs an $O(\log d)$ factor. We show an iterative argument that progressively reduces the sample size, and eliminates this dependence on the dimension $d$. At a high level, this approach resembles the technique of iterative size reduction, which is applicable for strong coresets but not for weak coresets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a uniform sample of size O(ε^{-2} log(1/ε)) forms a stable (ε, O(ε))-coreset for the geometric median problem with high constant probability, and that this sample size is tight up to the logarithmic factor. The analysis adapts the stable-coreset machinery of Carmel and Krauthgamer (ICLR 2026), which initially produces an O(log d) overhead, and removes the dimension dependence via an iterative size-reduction argument that the authors state resembles the technique used for strong coresets.

Significance. If the central claim is correct, the result would close the sample-complexity gap between stable and weak coresets for geometric median, enabling uniform-sampling constructions that simultaneously handle all constrained variants of the problem. The tightness statement and the elimination of the O(log d) factor would be notable strengths.

major comments (2)
  1. [Abstract (iterative argument paragraph) and the section containing the proof of the main theorem] The iterative size-reduction argument (described in the abstract and presumably detailed in the main technical section) is load-bearing for the O(ε^{-2} log(1/ε)) bound and the removal of the O(log d) factor. It is unclear from the high-level description how many iterations are performed, whether the additive O(ε) error terms accumulate across iterations, and whether the stability property continues to hold for every constrained variant after each reduction step without re-introducing dimension dependence.
  2. [Section containing the tightness argument] The claim that the bound is tight up to the logarithmic factor requires an explicit lower-bound construction that applies to stable coresets (rather than only to weak coresets). The manuscript should clarify whether the lower bound is derived directly for the stable-coreset definition or obtained by reduction from a known weak-coreset lower bound.
minor comments (2)
  1. [Abstract and Theorem statement] The abstract states 'high constant probability' without specifying the exact success probability (e.g., 1-δ with δ=1/poly(1/ε)); the formal theorem statement should include this parameter.
  2. [Introduction / Preliminaries] Notation for the stable-coreset parameters (ε, O(ε)) should be defined explicitly on first use, including the precise meaning of the additive O(ε) term relative to the stability requirement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address each major comment below and will revise the manuscript to improve clarity on the points raised.

read point-by-point responses
  1. Referee: [Abstract (iterative argument paragraph) and the section containing the proof of the main theorem] The iterative size-reduction argument (described in the abstract and presumably detailed in the main technical section) is load-bearing for the O(ε^{-2} log(1/ε)) bound and the removal of the O(log d) factor. It is unclear from the high-level description how many iterations are performed, whether the additive O(ε) error terms accumulate across iterations, and whether the stability property continues to hold for every constrained variant after each reduction step without re-introducing dimension dependence.

    Authors: We agree that the high-level description in the abstract leaves some details implicit. The iterative procedure runs for O(log(1/ε)) iterations. Parameters are chosen so that the per-iteration additive error is O(ε / log(1/ε)), ensuring the total accumulated error remains O(ε). Each reduction step preserves the stable-coreset property for all constrained variants simultaneously, and the argument itself is dimension-independent. In the revised manuscript we will expand both the abstract paragraph and the proof section with an explicit iteration count, error-budget calculation, and a short lemma confirming preservation of stability across steps. revision: yes

  2. Referee: [Section containing the tightness argument] The claim that the bound is tight up to the logarithmic factor requires an explicit lower-bound construction that applies to stable coresets (rather than only to weak coresets). The manuscript should clarify whether the lower bound is derived directly for the stable-coreset definition or obtained by reduction from a known weak-coreset lower bound.

    Authors: Any lower bound for weak coresets immediately applies to stable coresets, because every stable coreset is also a weak coreset. Consequently the known Ω(ε^{-2}) lower bound for weak coresets of the geometric median carries over directly. We will add a short clarifying paragraph in the tightness section that states this reduction explicitly and notes that the resulting bound matches our upper bound up to the logarithmic factor. revision: yes

Circularity Check

0 steps flagged

Minor self-citation for base machinery; central iterative reduction supplies independent content

full rationale

The derivation adapts stable-coreset machinery from the authors' own prior work (Carmel and Krauthgamer, ICLR 2026) to obtain an O(log d) sample size, then applies a new iterative size-reduction argument to remove the dimension dependence and reach the claimed O(ε^{-2} log 1/ε) bound. Because the iterative step is presented as an original contribution that preserves the (ε, O(ε)) stability guarantee, the central claim does not reduce by construction to the self-citation. No self-definitional, fitted-input, or renaming patterns appear in the provided text, and the cited machinery is treated as an external building block rather than an unverified load-bearing premise internal to this paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities stated. The result relies on properties of the geometric median and the adapted stable coreset machinery from the cited ICLR 2026 paper.

pith-pipeline@v0.9.1-grok · 5837 in / 983 out tokens · 36118 ms · 2026-06-30T03:32:51.133549+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 15 canonical work pages

  1. [1]

    [AHV05] Pankaj K

    doi:10.1145/1824777.1824779. [AHV05] Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. Geometric ap- proximation via coresets. InCombinatorial and computational geometry, volume 52 of MSRI Publications, chapter 1, pages 1–30. Cambridge University Press,

  2. [2]

    [BBB18] Johannes Bl¨ omer, Sascha Brauer, and Kathrin Bujna

    URL:https://proceedings.mlr.press/v235/ afshani24a.html. [BBB18] Johannes Bl¨ omer, Sascha Brauer, and Kathrin Bujna. Coresets for fuzzy k-means with applications. InISAAC, volume 123 ofLIPIcs, pages 46:1–46:12. Schloss Dagstuhl - Leibniz-Zentrum f¨ ur Informatik, 2018.doi:10.4230/LIPICS.ISAAC.2018.46. [BC08] Mihai B˘ adoiu and Kenneth L. Clarkson. Optima...

  3. [3]

    press/v97/braverman19a.html

    URL:http://proceedings.mlr. press/v97/braverman19a.html. [BJKW21] Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Xuan Wu. Core- sets for clustering in excluded-minor graphs and beyond. InSODA, pages 2679–2696. SIAM, 2021.doi:10.1137/1.9781611976465.159. [BRSS25] Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn, and Sudarshan Shya...

  4. [4]

    [CGJK25] Amir Carmel, Chengzhi Guo, Shaofeng H.-C

    URL:https://proceedings.mlr.press/v151/chhaya22a.html. [CGJK25] Amir Carmel, Chengzhi Guo, Shaofeng H.-C. Jiang, and Robert Krauthgamer. Coresets for 1-center inℓ 1 metrics. InITCS, volume 325 ofLIPIcs, pages 28:1–28:20. Schloss Dagstuhl - Leibniz-Zentrum f¨ ur Informatik, 2025.doi:10.4230/LIPICS.ITCS.2025

  5. [5]

    On ultrametric 1-median selection.Theoretical Computer Science, 828-829:65–69, 2020.doi:10.1016/j.tcs.2020.04.021

    [Cha20] Ching-Lueh Chang. On ultrametric 1-median selection.Theoretical Computer Science, 828-829:65–69, 2020.doi:10.1016/j.tcs.2020.04.021. [Che72] Herman Chernoff.Sequential analysis and optimal design. SIAM,

  6. [6]

    On coresets fork-median andk-means clustering in metric and Euclidean spaces and their applications.SIAM J

    [Che09] Ke Chen. On coresets fork-median andk-means clustering in metric and Euclidean spaces and their applications.SIAM J. Comput., 39(3):923–947, 2009.doi:10.1137/ 070699007. [CK26] Amir Carmel and Robert Krauthgamer. Stable coresets: Unleashing the power of uni- form sampling. In14th International Conference on Learning Representations, ICLR 2026,

  7. [7]

    [CLM+16] Michael B

    URL:https://openreview.net/forum?id=sOpAa8iR0A. [CLM+16] Michael B. Cohen, Yin Tat Lee, Gary L. Miller, Jakub Pachocki, and Aaron Sidford. Geometric median in nearly linear time. InSTOC, pages 9–21. ACM, 2016.doi: 10.1145/2897518.2897647. [CLSS22] Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, and Chris Schwiegelshohn. Towards optimal lower boun...

  8. [8]

    [CSS21a] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn

    URL:https://doi.org/10.1016/j.ipl.2022.106250,doi: 10.1016/J.IPL.2022.106250. [CSS21a] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. Improved coresets and sublinear algorithms for power means in Euclidean spaces. InNeurIPS, pages 21085–21098. Curran Associates, Inc.,

  9. [9]

    cc/paper/2021/hash/b035d6563a2adac9f822940c145263ce-Abstract.html

    URL:https://proceedings.neurips. cc/paper/2021/hash/b035d6563a2adac9f822940c145263ce-Abstract.html. [CSS21b] Vincent Cohen-Addad, David Saulpic, and Chris Schwiegelshohn. A new coreset frame- work for clustering. InSTOC, pages 169–182. ACM, 2021.doi:10.1145/3406325. 3451022. [Dan21] Matan Danos. Coresets for clustering by uniform sampling and general- ize...

  10. [10]

    Terminal embeddings

    12 [EFN15] Michael Elkin, Arnold Filtser, and Ofer Neiman. Terminal embeddings. InApproxi- mation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, volume 40 ofLIPIcs, pages 242–264. Schloss Dagstuhl - Leibniz- Zentrum f¨ ur Informatik, 2015.doi:10.4230/LIPICS.APPROX-RANDOM.2015.242. [FHY25] Ziyi Fang, Lingxiao Huan...

  11. [11]

    [FL11] Dan Feldman and Michael Langberg

    URL:https://openreview.net/forum? id=Yhfx12Azz1. [FL11] Dan Feldman and Michael Langberg. A unified framework for approximating and clus- tering data. InSTOC, pages 569–578. ACM, 2011.doi:10.1145/1993636.1993712. [Gor88] Yehoram Gordon. Gaussian processes and almost spherical sections of convex bodies. The Annals of Probability, 16(1):180–188,

  12. [12]

    net/forum?id=Nc1ZkRW8Vde

    URL:https://openreview. net/forum?id=Nc1ZkRW8Vde. [HJR21] Sariel Har-Peled, Mitchell Jones, and Saladi Rahul. Active-learning a convex body in low dimensions.Algorithmica, 83(6):1885–1917, 2021.doi:10.1007/ S00453-021-00807-W. [HJV19] Lingxiao Huang, Shaofeng H.-C. Jiang, and Nisheeth K. Vishnoi. Core- sets for clustering with fairness constraints. InNeur...

  13. [13]

    [HLLW25] Lingxiao Huang, Jian Li, Pinyan Lu, and Xuan Wu

    URL:https://proceedings.neurips.cc/paper/2019/hash/ 810dfbbebb17302018ae903e9cb7a483-Abstract.html. [HLLW25] Lingxiao Huang, Jian Li, Pinyan Lu, and Xuan Wu. Coresets for constrained clustering: General assignment constraints and improved size bounds. InSODA, pages 4732–4782. SIAM, 2025.doi:10.1137/1.9781611978322.161. [HLW24] Lingxiao Huang, Jian Li, and...

  14. [14]

    cc/paper/2021/hash/c115ba9e04ab27fbbb664f932112246d-Abstract.html

    URL:https://proceedings.neurips. cc/paper/2021/hash/c115ba9e04ab27fbbb664f932112246d-Abstract.html. [HV20] Lingxiao Huang and Nisheeth K. Vishnoi. Coresets for clustering in Euclidean spaces: importance sampling is nearly optimal. InSTOC, pages 1416–1429. ACM, 2020.doi: 10.1145/3357713.3384296. [JKLZ24] Shaofeng H.-C. Jiang, Robert Krauthgamer, Jianing Lo...

  15. [15]

    Long, and Aravind Srinivasan

    [LLS01] Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning.J. Comput. Syst. Sci., 62(3):516–527, 2001.doi:10.1006/ jcss.2000.1741. [MS18] Alexander Munteanu and Chris Schwiegelshohn. Coresets-methods and history: A theoreticians design pattern for approximation and streaming algorithms.K¨ unstliche Intell.,...

  16. [16]

    [NJ23] Andreas Nienk¨ otter and Xiaoyi Jiang

    URL:http://proceedings.mlr.press/v54/newling17a.html. [NJ23] Andreas Nienk¨ otter and Xiaoyi Jiang. Kernel-based generalized median computation for consensus learning.IEEE Trans. Pattern Anal. Mach. Intell., 45(5):5872–5888, 2023.doi:10.1109/TPAMI.2022.3202565. [PPP21] Aditya Parulekar, Advait Parulekar, and Eric Price. L1 regression with lewis weights su...

  17. [17]

    A remark concerning the dependence onϵin Dvoretzky’s theorem

    [Sch06] Gideon Schechtman. A remark concerning the dependence onϵin Dvoretzky’s theorem. InGeometric Aspects of Functional Analysis: Israel Seminar (GAFA) 1987–88, pages 274–277. Springer,

  18. [18]

    Isometric embedding of finite ultrametric spaces in banach spaces.Topol- ogy and its Applications, 142(1):13–17, 2004.doi:10.1016/j.topol.2003.12.002

    [Shk04] S.A Shkarin. Isometric embedding of finite ultrametric spaces in banach spaces.Topol- ogy and its Applications, 142(1):13–17, 2004.doi:10.1016/j.topol.2003.12.002. 14 [SSS19] Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Fair coresets and streaming algorithms for fair k-means. InWAOA, Lecture Notes in Computer Sci- ence, pages 232–2...