pith. sign in

arxiv: 2402.01061 · v3 · submitted 2024-02-01 · 🧮 math.OC

On the power of linear programming for K-means clustering

Pith reviewed 2026-05-24 04:13 UTC · model grok-4.3

classification 🧮 math.OC
keywords K-means clusteringlinear programming relaxationtightnesscutting plane algorithmcluster recoverystochastic modelinteger programming
0
0 comments X

The pith

A linear programming relaxation for K-means clustering returns exact solutions on nearly all tested instances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies an existing LP relaxation of the nonconvex K-means problem and tests its behavior on synthetic and real data sets. Experiments indicate that the LP optimum satisfies the original integer constraints in almost every case examined. For the special case of two clusters the authors derive explicit sufficient conditions that force the relaxation to be tight, and they prove that these conditions hold with high probability under a standard stochastic data model. They also exhibit a family of inputs where the relaxation is never tight and describe a cutting-plane procedure paired with a first-order GPU solver that handles instances up to four thousand points.

Core claim

The LP relaxation introduced in prior work is tight for K-means clustering under sufficient conditions when K equals two, recovers the correct clusters under a popular stochastic model, and is observed to be tight on the great majority of both synthetic and real-world data sets; the same relaxation can be solved at practical scale by a cutting-plane algorithm that uses the PDLP solver.

What carries the argument

The linear-programming relaxation containing Omega(n cubed) inequalities, whose optimal solution is feasible for the original K-means problem under the derived conditions.

If this is right

  • When the input meets the two-cluster sufficient conditions the LP solves the K-means problem exactly.
  • Under the stochastic model the LP recovers the planted clusters with high probability.
  • A cutting-plane algorithm with the PDLP solver computes the relaxation for up to four thousand points.
  • There exist explicit families of inputs for which the relaxation is never tight.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If similar tightness patterns appear for K greater than two, exact K-means could become practical without combinatorial search.
  • The counterexample family indicates that any general tightness proof must exclude certain geometric configurations of the point set.
  • The same relaxation technique could be tested on related partitioning problems such as balanced graph cuts.

Load-bearing premise

The data points satisfy the sufficient conditions that make the LP solution feasible for the integer K-means formulation.

What would settle it

A data set of moderate size on which the LP optimum violates at least one of the original K-means assignment constraints.

Figures

Figures reproduced from arXiv: 2402.01061 by Aida Khajavirad, Antonio De Rosa, Yakun Wang.

Figure 1
Figure 1. Figure 1: The recovery rate of the LP versus the SDP when the in [PITH_FULL_IMAGE:figures/full_fig_p018_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparing the recovery threshold ∆0 of the LP relaxation versus the SDP relaxation for K-means clustering when the input is generated according to the SBM. 3 A Counter example for the tightness of the LP relaxation As we discussed in the previous section, in all our experiments with both synthetic and real-world data sets, Problem (LP2) is tight; i.e., its optimal solution is a partition matrix. In particu… view at source ↗
Figure 3
Figure 3. Figure 3: Comparing the strength of the LP relaxation versus [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗
read the original abstract

In [SIAM J. Optim., 2022], the authors introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate both theoretical and computational properties of this relaxation. As evident from our numerical experiments with both synthetic real-world data sets, the proposed LP relaxation is almost always tight; i.e. its optimal solution is feasible for the original nonconvex problem. To better understand this unexpected behaviour, on the theoretical side, we focus on K-means clustering with two clusters, and we obtain sufficient conditions under which the LP relaxation is tight. We further analyze the sufficient conditions when the input is generated according to a popular stochastic model and obtain recovery guarantees for the LP relaxation. We conclude our theoretical study by constructing a family of inputs for which the LP relaxation is never tight. Denoting by $n$ the number of data points to be clustered, the LP relaxation contains $\Omega(n^3)$ inequalities making it impractical for large data sets. To address the scalability issue, by building upon a cutting-plane algorithm together with the GPU implementation of PDLP, a first-order method LP solver, we develop an efficient algorithm that solves the proposed LP and hence the K-means clustering problem, for up to $n \leq 4000$ data points.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript examines the LP relaxation for K-means clustering introduced in the 2022 SIAM J. Optim. paper. For K=2 it derives sufficient conditions guaranteeing tightness of the relaxation, obtains recovery guarantees when data follows a standard stochastic model, and constructs an explicit family of instances where the relaxation is never tight. It further develops a cutting-plane algorithm paired with a GPU implementation of the PDLP first-order solver to handle the Ω(n³) inequality formulation up to n=4000. Numerical experiments on synthetic and real-world data are presented to support the claim that the relaxation is almost always tight on the tested instances.

Significance. If the reported empirical tightness generalizes, the work shows that an LP relaxation can recover exact K-means solutions for moderate n, backed by explicit sufficient conditions and recovery results for the two-cluster case. The scalable solver directly addresses the cubic-size barrier identified in the formulation, and the construction of counterexamples clarifies the limits of the approach. These elements together provide both theoretical insight and a practical computational tool for the K-means problem.

major comments (1)
  1. [Numerical experiments / abstract] Numerical experiments (as summarized in the abstract and described in the experimental section): the central claim that the LP relaxation is 'almost always tight' is supported solely by experiments on unspecified synthetic and real-world data sets up to n=4000. No details are given on the precise collection of instances, exclusion rules, number of random trials, or statistical measures such as error bars, which makes it difficult to assess the robustness of the 'almost always' assertion that underpins the practical significance of the relaxation.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'synthetic real-world data sets' is missing the conjunction 'and'.
  2. [Scalability / algorithmic section] The manuscript correctly identifies the Ω(n³) inequality count as the scalability bottleneck and addresses it via cutting planes; however, the description of the cutting-plane implementation could benefit from a brief pseudocode outline or reference to the specific separation routine used.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestion regarding the experimental section. We agree that additional transparency will strengthen the manuscript and will revise accordingly.

read point-by-point responses
  1. Referee: [Numerical experiments / abstract] Numerical experiments (as summarized in the abstract and described in the experimental section): the central claim that the LP relaxation is 'almost always tight' is supported solely by experiments on unspecified synthetic and real-world data sets up to n=4000. No details are given on the precise collection of instances, exclusion rules, number of random trials, or statistical measures such as error bars, which makes it difficult to assess the robustness of the 'almost always' assertion that underpins the practical significance of the relaxation.

    Authors: We agree with the referee that the current description of the numerical experiments lacks sufficient detail. In the revised manuscript we will expand Section 5 (and the abstract if needed) to specify: (i) the exact generation procedure and parameter ranges for all synthetic instances, (ii) the sources and preprocessing steps for the real-world data sets, (iii) any exclusion or filtering rules applied, (iv) the total number of random trials performed for each experiment, and (v) statistical measures (means, standard deviations, or error bars) for the reported tightness frequencies. These additions will allow readers to evaluate the robustness of the empirical claim. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to 2022 LP formulation; all new results derived independently

full rationale

The paper cites its own prior SIAM J. Optim. 2022 work solely to define the LP relaxation being analyzed. All load-bearing contributions—sufficient conditions for tightness when K=2, recovery guarantees under the stochastic model, explicit counterexample family, and the empirical 'almost always tight' observation—are obtained from first-principles derivations or direct numerical experiments on the given instances. No new prediction reduces to a fitted parameter, self-defined quantity, or unverified self-citation chain; the central claims remain externally falsifiable via the stated conditions and tested data sets up to n=4000.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard LP duality and polyhedral theory plus the specific formulation from the 2022 reference; no new free parameters are introduced, no invented entities are postulated, and no ad-hoc axioms beyond convexity of the relaxation are required.

axioms (1)
  • domain assumption The LP relaxation polytope contains the convex hull of feasible K-means assignments.
    Invoked when claiming that an optimal LP vertex yields a feasible clustering solution.

pith-pipeline@v0.9.0 · 5763 in / 1413 out tokens · 20888 ms · 2026-05-24T04:13:31.866622+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Accessible Complexity Bounds for Restarted PDHG on Linear Programs with a Unique Optimizer

    math.OC 2024-10 unverdicted novelty 7.0

    Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.

  2. From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs

    cs.LG 2026-05 unverdicted novelty 6.0

    A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 2 Pith papers

  1. [1]

    http://docs.mosek.com/9.0/faq.pdf

    MOSEK 9.2 , 2019. http://docs.mosek.com/9.0/faq.pdf

  2. [2]

    Abdalla and A

    P. Abdalla and A. S. Bandeira. Community detection with a subsampled semidefinite program. Sampling Theory, Signal Processing, and Data Analysis , 20(1):6, 2022

  3. [3]

    Aloise, A

    D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hard ness of Euclidean sum-of-squares clustering. Machine Learning, 75:245–248, 2009

  4. [4]

    Applegate, M

    D. Applegate, M. D ´ ıaz, O. Hinder, H. Lu, M. Lubin, B. O’Do noghue, and W. Schudy. Practical large-scale linear programming using primal-dual hybrid gradient. Advances in Neural Information Processing Systems , 34:20243–20257, 2021

  5. [5]

    Awasthi, A

    P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy , S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 165:191–200, 2015

  6. [6]

    Czyzyk, M

    J. Czyzyk, M. P. Mesnier, and J. J. More. The NEOS server. IEEE Journal on Computational Science and Engineering, 5(3):68 –75, 1998

  7. [7]

    Dagum and R

    L. Dagum and R. Menon. OpenMP: an industry standard api fo r shared-memory programming. IEEE compu- tational science and engineering , 5(1):46–55, 1998

  8. [8]

    De Rosa and A

    A. De Rosa and A. Khajavirad. The ratio-cut polytope and K -means clustering. SIAM Journal on Optimization , 32(1):173–203, 2022

  9. [9]

    D. Dua, C. Graff, et al. UCI machine learning repository. URL http://archive. ics. uci. edu/ml , 2017

  10. [10]

    Friggstad, M

    Z. Friggstad, M. Rezapour, and Salavatipour M. R. Local search yields a PTAS for K-means in doubling metrics. SIAM Journal on Computing , 48(2):452–480, 2019

  11. [11]

    Gurobi Optimizer Reference Manual, 2021

    Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2021. URL: https://www.gurobi.com

  12. [12]

    Iguchi, D

    T. Iguchi, D. G. Mixon, J. Peterson, and S. Villar. Proba bly certifiably correct K-means clustering. Mathematical Programming, 165:605–642, 2017

  13. [13]

    Kanungo, D.M

    T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu. A local search approxi- mation algorithm for K-means clustering. Proceedings of the 18th Annual ACM Symposium on Computation al Geometry, pages 10–18, 2002

  14. [14]

    Khajavirad and N

    A. Khajavirad and N. V. Sahinidis. A hybrid LP/NLP parad igm for global optimization relaxations. Mathe- matical Programming Computation , 10:383–421, 2018. doi:10.1007/s12532-018-0138-5

  15. [15]

    X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do bird s of a feather flock together? K-means, proximity, and conic programming. Mathematical Programming, 179:295–341, 2020

  16. [16]

    S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory , 28(2):129 –137, 1982

  17. [17]

    Lu and J

    H. Lu and J. Yang. cuPDLP.jl: A GPU implementation of res tarted primal-dual hybrid gradient for linear programming in julia. arXiv preprint arXiv:2311.12180 , 2023

  18. [18]

    H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Z hang, and D. Ge. cuPDLP-C: A strengthened implementation of cupdlp for linear programming by c langua ge. arXiv preprint arXiv:2312.14832 , 2023

  19. [19]

    Mahajan, P

    M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The plan ar K-means problem is NP-hard. In WALCOM: Algorithms and Computation , pages 274–285. Springer Berlin Heidelberg, 2009

  20. [20]

    Mangasarian

    O.L. Mangasarian. Uniqueness of solution in linear pro gramming. Linear Algebra and its Applications , 25:151– 162, 1979

  21. [21]

    Marzi, F

    F. Marzi, F. Rossi, and S. Smriglio. Computational stud y of separation algorithms for clique inequalities. Soft Computing, 23(9):3013–3027, 2019

  22. [22]

    D. G. Mixon, S. Villar, and R. Ward. Clustering subgauss ian mixtures by semidefinite programming. Information and Inference: A Journal of the IMA , 6(4):389–415, 2017

  23. [23]

    D. G. Mixon and K. Xie. Sketching semidefinite programs f or faster clustering. IEEE Transactions on Informa- tion Theory , 67(10):6832–6840, 2021

  24. [24]

    Neumaier and O

    A. Neumaier and O. Shcherbina. Safe bounds in linear and mixed-integer linear programming. Mathematical Programming, 99:283–296, 2004. 35

  25. [25]

    Approximating K-means-type clusterin g via semidefinite programming

    J Peng and Y Wei. Approximating K-means-type clusterin g via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007

  26. [26]

    Peng and Y

    J. Peng and Y. Xia. A New Theoretical Framework for K-Means-Type Clustering , pages 79–96. Springer Berlin Heidelberg, 2005

  27. [27]

    Piccialli, A

    V. Piccialli, A. M. Sudoso, and A. Wiegele. SOS-SDP: An e xact solver for minimum sum-of-squares clustering. INFORMS Journal on Computing , 34(4):2144–2162, 2022

  28. [28]

    M. N. Prasad and G. A. Hanasusanto. Improved conic refor mulations for K-means clustering. SIAM Journal on Optimization , 28(4):3105–3126, 2018

  29. [29]

    Vershynin

    R. Vershynin. High-dimensional probability: An introduction with appli cations in data science , volume 47. Cambridge University Press, 2018. 36