On the power of linear programming for K-means clustering

Aida Khajavirad; Antonio De Rosa; Yakun Wang

arxiv: 2402.01061 · v3 · submitted 2024-02-01 · 🧮 math.OC

On the power of linear programming for K-means clustering

Antonio De Rosa , Aida Khajavirad , Yakun Wang This is my paper

Pith reviewed 2026-05-24 04:13 UTC · model grok-4.3

classification 🧮 math.OC

keywords K-means clusteringlinear programming relaxationtightnesscutting plane algorithmcluster recoverystochastic modelinteger programming

0 comments

The pith

A linear programming relaxation for K-means clustering returns exact solutions on nearly all tested instances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies an existing LP relaxation of the nonconvex K-means problem and tests its behavior on synthetic and real data sets. Experiments indicate that the LP optimum satisfies the original integer constraints in almost every case examined. For the special case of two clusters the authors derive explicit sufficient conditions that force the relaxation to be tight, and they prove that these conditions hold with high probability under a standard stochastic data model. They also exhibit a family of inputs where the relaxation is never tight and describe a cutting-plane procedure paired with a first-order GPU solver that handles instances up to four thousand points.

Core claim

The LP relaxation introduced in prior work is tight for K-means clustering under sufficient conditions when K equals two, recovers the correct clusters under a popular stochastic model, and is observed to be tight on the great majority of both synthetic and real-world data sets; the same relaxation can be solved at practical scale by a cutting-plane algorithm that uses the PDLP solver.

What carries the argument

The linear-programming relaxation containing Omega(n cubed) inequalities, whose optimal solution is feasible for the original K-means problem under the derived conditions.

If this is right

When the input meets the two-cluster sufficient conditions the LP solves the K-means problem exactly.
Under the stochastic model the LP recovers the planted clusters with high probability.
A cutting-plane algorithm with the PDLP solver computes the relaxation for up to four thousand points.
There exist explicit families of inputs for which the relaxation is never tight.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If similar tightness patterns appear for K greater than two, exact K-means could become practical without combinatorial search.
The counterexample family indicates that any general tightness proof must exclude certain geometric configurations of the point set.
The same relaxation technique could be tested on related partitioning problems such as balanced graph cuts.

Load-bearing premise

The data points satisfy the sufficient conditions that make the LP solution feasible for the integer K-means formulation.

What would settle it

A data set of moderate size on which the LP optimum violates at least one of the original K-means assignment constraints.

Figures

Figures reproduced from arXiv: 2402.01061 by Aida Khajavirad, Antonio De Rosa, Yakun Wang.

**Figure 2.** Figure 2: Comparing the recovery threshold ∆0 of the LP relaxation versus the SDP relaxation for K-means clustering when the input is generated according to the SBM. 3 A Counter example for the tightness of the LP relaxation As we discussed in the previous section, in all our experiments with both synthetic and real-world data sets, Problem (LP2) is tight; i.e., its optimal solution is a partition matrix. In particu… view at source ↗

**Figure 3.** Figure 3: Comparing the strength of the LP relaxation versus [PITH_FULL_IMAGE:figures/full_fig_p025_3.png] view at source ↗

read the original abstract

In [SIAM J. Optim., 2022], the authors introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate both theoretical and computational properties of this relaxation. As evident from our numerical experiments with both synthetic real-world data sets, the proposed LP relaxation is almost always tight; i.e. its optimal solution is feasible for the original nonconvex problem. To better understand this unexpected behaviour, on the theoretical side, we focus on K-means clustering with two clusters, and we obtain sufficient conditions under which the LP relaxation is tight. We further analyze the sufficient conditions when the input is generated according to a popular stochastic model and obtain recovery guarantees for the LP relaxation. We conclude our theoretical study by constructing a family of inputs for which the LP relaxation is never tight. Denoting by $n$ the number of data points to be clustered, the LP relaxation contains $\Omega(n^3)$ inequalities making it impractical for large data sets. To address the scalability issue, by building upon a cutting-plane algorithm together with the GPU implementation of PDLP, a first-order method LP solver, we develop an efficient algorithm that solves the proposed LP and hence the K-means clustering problem, for up to $n \leq 4000$ data points.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends the 2022 LP relaxation with new K=2 tightness conditions, stochastic recovery results, explicit counterexamples, and a cutting-plane GPU solver up to n=4000, but the 'almost always tight' claim rests on experiments without full data details or public artifacts.

read the letter

The punchline is that the paper delivers concrete new theory for when the LP is tight on two clusters and a practical way to solve it at moderate scale, while also showing cases where it fails. These are real additions to the 2022 relaxation rather than minor tweaks. The sufficient conditions and recovery guarantees under the stochastic model are derived directly from the formulation, and the counterexample family is explicit. The solver work using cutting planes plus PDLP on GPU is a useful engineering step that moves the cubic-size barrier from theoretical to something that runs on n up to 4000. That part is worth crediting because it addresses the obvious practicality issue head-on. The theoretical sections read cleanly and stay within what the math supports. The experiments claim near-universal tightness on synthetic and real sets, yet the data selection rules, exact instances, and variance measures are not reported in enough detail to let a reader reproduce or stress-test the claim independently. No code or data release is mentioned, which limits how far the empirical observation can travel. The work stays focused on K=2 for the new guarantees, so readers should not expect the same level of insight for larger K. This paper is aimed at researchers working on convex relaxations for clustering or exact methods for moderate-sized instances. Someone building or comparing LP-based clustering tools would get direct value from the conditions and the solver implementation. It deserves a serious referee because the new derivations and the counterexamples are grounded, the scalability fix is verifiable in principle, and the stated limitations match the evidence presented. The empirical transparency could be improved in revision, but that does not make the core contributions unsound. Recommendation: send it to peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript examines the LP relaxation for K-means clustering introduced in the 2022 SIAM J. Optim. paper. For K=2 it derives sufficient conditions guaranteeing tightness of the relaxation, obtains recovery guarantees when data follows a standard stochastic model, and constructs an explicit family of instances where the relaxation is never tight. It further develops a cutting-plane algorithm paired with a GPU implementation of the PDLP first-order solver to handle the Ω(n³) inequality formulation up to n=4000. Numerical experiments on synthetic and real-world data are presented to support the claim that the relaxation is almost always tight on the tested instances.

Significance. If the reported empirical tightness generalizes, the work shows that an LP relaxation can recover exact K-means solutions for moderate n, backed by explicit sufficient conditions and recovery results for the two-cluster case. The scalable solver directly addresses the cubic-size barrier identified in the formulation, and the construction of counterexamples clarifies the limits of the approach. These elements together provide both theoretical insight and a practical computational tool for the K-means problem.

major comments (1)

[Numerical experiments / abstract] Numerical experiments (as summarized in the abstract and described in the experimental section): the central claim that the LP relaxation is 'almost always tight' is supported solely by experiments on unspecified synthetic and real-world data sets up to n=4000. No details are given on the precise collection of instances, exclusion rules, number of random trials, or statistical measures such as error bars, which makes it difficult to assess the robustness of the 'almost always' assertion that underpins the practical significance of the relaxation.

minor comments (2)

[Abstract] Abstract: the phrase 'synthetic real-world data sets' is missing the conjunction 'and'.
[Scalability / algorithmic section] The manuscript correctly identifies the Ω(n³) inequality count as the scalability bottleneck and addresses it via cutting planes; however, the description of the cutting-plane implementation could benefit from a brief pseudocode outline or reference to the specific separation routine used.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestion regarding the experimental section. We agree that additional transparency will strengthen the manuscript and will revise accordingly.

read point-by-point responses

Referee: [Numerical experiments / abstract] Numerical experiments (as summarized in the abstract and described in the experimental section): the central claim that the LP relaxation is 'almost always tight' is supported solely by experiments on unspecified synthetic and real-world data sets up to n=4000. No details are given on the precise collection of instances, exclusion rules, number of random trials, or statistical measures such as error bars, which makes it difficult to assess the robustness of the 'almost always' assertion that underpins the practical significance of the relaxation.

Authors: We agree with the referee that the current description of the numerical experiments lacks sufficient detail. In the revised manuscript we will expand Section 5 (and the abstract if needed) to specify: (i) the exact generation procedure and parameter ranges for all synthetic instances, (ii) the sources and preprocessing steps for the real-world data sets, (iii) any exclusion or filtering rules applied, (iv) the total number of random trials performed for each experiment, and (v) statistical measures (means, standard deviations, or error bars) for the reported tightness frequencies. These additions will allow readers to evaluate the robustness of the empirical claim. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to 2022 LP formulation; all new results derived independently

full rationale

The paper cites its own prior SIAM J. Optim. 2022 work solely to define the LP relaxation being analyzed. All load-bearing contributions—sufficient conditions for tightness when K=2, recovery guarantees under the stochastic model, explicit counterexample family, and the empirical 'almost always tight' observation—are obtained from first-principles derivations or direct numerical experiments on the given instances. No new prediction reduces to a fitted parameter, self-defined quantity, or unverified self-citation chain; the central claims remain externally falsifiable via the stated conditions and tested data sets up to n=4000.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard LP duality and polyhedral theory plus the specific formulation from the 2022 reference; no new free parameters are introduced, no invented entities are postulated, and no ad-hoc axioms beyond convexity of the relaxation are required.

axioms (1)

domain assumption The LP relaxation polytope contains the convex hull of feasible K-means assignments.
Invoked when claiming that an optimal LP vertex yields a feasible clustering solution.

pith-pipeline@v0.9.0 · 5763 in / 1413 out tokens · 20888 ms · 2026-05-24T04:13:31.866622+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Accessible Complexity Bounds for Restarted PDHG on Linear Programs with a Unique Optimizer
math.OC 2024-10 unverdicted novelty 7.0

Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.
From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs
cs.LG 2026-05 unverdicted novelty 6.0

A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · cited by 2 Pith papers

[1]

http://docs.mosek.com/9.0/faq.pdf

MOSEK 9.2 , 2019. http://docs.mosek.com/9.0/faq.pdf

work page 2019
[2]

Abdalla and A

P. Abdalla and A. S. Bandeira. Community detection with a subsampled semideﬁnite program. Sampling Theory, Signal Processing, and Data Analysis , 20(1):6, 2022

work page 2022
[3]

Aloise, A

D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hard ness of Euclidean sum-of-squares clustering. Machine Learning, 75:245–248, 2009

work page 2009
[4]

Applegate, M

D. Applegate, M. D ´ ıaz, O. Hinder, H. Lu, M. Lubin, B. O’Do noghue, and W. Schudy. Practical large-scale linear programming using primal-dual hybrid gradient. Advances in Neural Information Processing Systems , 34:20243–20257, 2021

work page 2021
[5]

Awasthi, A

P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy , S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 165:191–200, 2015

work page 2015
[6]

Czyzyk, M

J. Czyzyk, M. P. Mesnier, and J. J. More. The NEOS server. IEEE Journal on Computational Science and Engineering, 5(3):68 –75, 1998

work page 1998
[7]

Dagum and R

L. Dagum and R. Menon. OpenMP: an industry standard api fo r shared-memory programming. IEEE compu- tational science and engineering , 5(1):46–55, 1998

work page 1998
[8]

De Rosa and A

A. De Rosa and A. Khajavirad. The ratio-cut polytope and K -means clustering. SIAM Journal on Optimization , 32(1):173–203, 2022

work page 2022
[9]

D. Dua, C. Graﬀ, et al. UCI machine learning repository. URL http://archive. ics. uci. edu/ml , 2017

work page 2017
[10]

Friggstad, M

Z. Friggstad, M. Rezapour, and Salavatipour M. R. Local search yields a PTAS for K-means in doubling metrics. SIAM Journal on Computing , 48(2):452–480, 2019

work page 2019
[11]

Gurobi Optimizer Reference Manual, 2021

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2021. URL: https://www.gurobi.com

work page 2021
[12]

Iguchi, D

T. Iguchi, D. G. Mixon, J. Peterson, and S. Villar. Proba bly certiﬁably correct K-means clustering. Mathematical Programming, 165:605–642, 2017

work page 2017
[13]

Kanungo, D.M

T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu. A local search approxi- mation algorithm for K-means clustering. Proceedings of the 18th Annual ACM Symposium on Computation al Geometry, pages 10–18, 2002

work page 2002
[14]

Khajavirad and N

A. Khajavirad and N. V. Sahinidis. A hybrid LP/NLP parad igm for global optimization relaxations. Mathe- matical Programming Computation , 10:383–421, 2018. doi:10.1007/s12532-018-0138-5

work page doi:10.1007/s12532-018-0138-5 2018
[15]

X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do bird s of a feather ﬂock together? K-means, proximity, and conic programming. Mathematical Programming, 179:295–341, 2020

work page 2020
[16]

S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory , 28(2):129 –137, 1982

work page 1982
[17]

Lu and J

H. Lu and J. Yang. cuPDLP.jl: A GPU implementation of res tarted primal-dual hybrid gradient for linear programming in julia. arXiv preprint arXiv:2311.12180 , 2023

work page arXiv 2023
[18]

H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Z hang, and D. Ge. cuPDLP-C: A strengthened implementation of cupdlp for linear programming by c langua ge. arXiv preprint arXiv:2312.14832 , 2023

work page arXiv 2023
[19]

Mahajan, P

M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The plan ar K-means problem is NP-hard. In WALCOM: Algorithms and Computation , pages 274–285. Springer Berlin Heidelberg, 2009

work page 2009
[20]

Mangasarian

O.L. Mangasarian. Uniqueness of solution in linear pro gramming. Linear Algebra and its Applications , 25:151– 162, 1979

work page 1979
[21]

Marzi, F

F. Marzi, F. Rossi, and S. Smriglio. Computational stud y of separation algorithms for clique inequalities. Soft Computing, 23(9):3013–3027, 2019

work page 2019
[22]

D. G. Mixon, S. Villar, and R. Ward. Clustering subgauss ian mixtures by semideﬁnite programming. Information and Inference: A Journal of the IMA , 6(4):389–415, 2017

work page 2017
[23]

D. G. Mixon and K. Xie. Sketching semideﬁnite programs f or faster clustering. IEEE Transactions on Informa- tion Theory , 67(10):6832–6840, 2021

work page 2021
[24]

Neumaier and O

A. Neumaier and O. Shcherbina. Safe bounds in linear and mixed-integer linear programming. Mathematical Programming, 99:283–296, 2004. 35

work page 2004
[25]

Approximating K-means-type clusterin g via semideﬁnite programming

J Peng and Y Wei. Approximating K-means-type clusterin g via semideﬁnite programming. SIAM Journal on Optimization, 18(1):186–205, 2007

work page 2007
[26]

Peng and Y

J. Peng and Y. Xia. A New Theoretical Framework for K-Means-Type Clustering , pages 79–96. Springer Berlin Heidelberg, 2005

work page 2005
[27]

Piccialli, A

V. Piccialli, A. M. Sudoso, and A. Wiegele. SOS-SDP: An e xact solver for minimum sum-of-squares clustering. INFORMS Journal on Computing , 34(4):2144–2162, 2022

work page 2022
[28]

M. N. Prasad and G. A. Hanasusanto. Improved conic refor mulations for K-means clustering. SIAM Journal on Optimization , 28(4):3105–3126, 2018

work page 2018
[29]

Vershynin

R. Vershynin. High-dimensional probability: An introduction with appli cations in data science , volume 47. Cambridge University Press, 2018. 36

work page 2018

[1] [1]

http://docs.mosek.com/9.0/faq.pdf

MOSEK 9.2 , 2019. http://docs.mosek.com/9.0/faq.pdf

work page 2019

[2] [2]

Abdalla and A

P. Abdalla and A. S. Bandeira. Community detection with a subsampled semideﬁnite program. Sampling Theory, Signal Processing, and Data Analysis , 20(1):6, 2022

work page 2022

[3] [3]

Aloise, A

D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hard ness of Euclidean sum-of-squares clustering. Machine Learning, 75:245–248, 2009

work page 2009

[4] [4]

Applegate, M

D. Applegate, M. D ´ ıaz, O. Hinder, H. Lu, M. Lubin, B. O’Do noghue, and W. Schudy. Practical large-scale linear programming using primal-dual hybrid gradient. Advances in Neural Information Processing Systems , 34:20243–20257, 2021

work page 2021

[5] [5]

Awasthi, A

P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy , S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, 165:191–200, 2015

work page 2015

[6] [6]

Czyzyk, M

J. Czyzyk, M. P. Mesnier, and J. J. More. The NEOS server. IEEE Journal on Computational Science and Engineering, 5(3):68 –75, 1998

work page 1998

[7] [7]

Dagum and R

L. Dagum and R. Menon. OpenMP: an industry standard api fo r shared-memory programming. IEEE compu- tational science and engineering , 5(1):46–55, 1998

work page 1998

[8] [8]

De Rosa and A

A. De Rosa and A. Khajavirad. The ratio-cut polytope and K -means clustering. SIAM Journal on Optimization , 32(1):173–203, 2022

work page 2022

[9] [9]

D. Dua, C. Graﬀ, et al. UCI machine learning repository. URL http://archive. ics. uci. edu/ml , 2017

work page 2017

[10] [10]

Friggstad, M

Z. Friggstad, M. Rezapour, and Salavatipour M. R. Local search yields a PTAS for K-means in doubling metrics. SIAM Journal on Computing , 48(2):452–480, 2019

work page 2019

[11] [11]

Gurobi Optimizer Reference Manual, 2021

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2021. URL: https://www.gurobi.com

work page 2021

[12] [12]

Iguchi, D

T. Iguchi, D. G. Mixon, J. Peterson, and S. Villar. Proba bly certiﬁably correct K-means clustering. Mathematical Programming, 165:605–642, 2017

work page 2017

[13] [13]

Kanungo, D.M

T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu. A local search approxi- mation algorithm for K-means clustering. Proceedings of the 18th Annual ACM Symposium on Computation al Geometry, pages 10–18, 2002

work page 2002

[14] [14]

Khajavirad and N

A. Khajavirad and N. V. Sahinidis. A hybrid LP/NLP parad igm for global optimization relaxations. Mathe- matical Programming Computation , 10:383–421, 2018. doi:10.1007/s12532-018-0138-5

work page doi:10.1007/s12532-018-0138-5 2018

[15] [15]

X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do bird s of a feather ﬂock together? K-means, proximity, and conic programming. Mathematical Programming, 179:295–341, 2020

work page 2020

[16] [16]

S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory , 28(2):129 –137, 1982

work page 1982

[17] [17]

Lu and J

H. Lu and J. Yang. cuPDLP.jl: A GPU implementation of res tarted primal-dual hybrid gradient for linear programming in julia. arXiv preprint arXiv:2311.12180 , 2023

work page arXiv 2023

[18] [18]

H. Lu, J. Yang, H. Hu, Q. Huangfu, J. Liu, T. Liu, Y. Ye, C. Z hang, and D. Ge. cuPDLP-C: A strengthened implementation of cupdlp for linear programming by c langua ge. arXiv preprint arXiv:2312.14832 , 2023

work page arXiv 2023

[19] [19]

Mahajan, P

M. Mahajan, P. Nimbhorkar, and K. Varadarajan. The plan ar K-means problem is NP-hard. In WALCOM: Algorithms and Computation , pages 274–285. Springer Berlin Heidelberg, 2009

work page 2009

[20] [20]

Mangasarian

O.L. Mangasarian. Uniqueness of solution in linear pro gramming. Linear Algebra and its Applications , 25:151– 162, 1979

work page 1979

[21] [21]

Marzi, F

F. Marzi, F. Rossi, and S. Smriglio. Computational stud y of separation algorithms for clique inequalities. Soft Computing, 23(9):3013–3027, 2019

work page 2019

[22] [22]

D. G. Mixon, S. Villar, and R. Ward. Clustering subgauss ian mixtures by semideﬁnite programming. Information and Inference: A Journal of the IMA , 6(4):389–415, 2017

work page 2017

[23] [23]

D. G. Mixon and K. Xie. Sketching semideﬁnite programs f or faster clustering. IEEE Transactions on Informa- tion Theory , 67(10):6832–6840, 2021

work page 2021

[24] [24]

Neumaier and O

A. Neumaier and O. Shcherbina. Safe bounds in linear and mixed-integer linear programming. Mathematical Programming, 99:283–296, 2004. 35

work page 2004

[25] [25]

Approximating K-means-type clusterin g via semideﬁnite programming

J Peng and Y Wei. Approximating K-means-type clusterin g via semideﬁnite programming. SIAM Journal on Optimization, 18(1):186–205, 2007

work page 2007

[26] [26]

Peng and Y

J. Peng and Y. Xia. A New Theoretical Framework for K-Means-Type Clustering , pages 79–96. Springer Berlin Heidelberg, 2005

work page 2005

[27] [27]

Piccialli, A

V. Piccialli, A. M. Sudoso, and A. Wiegele. SOS-SDP: An e xact solver for minimum sum-of-squares clustering. INFORMS Journal on Computing , 34(4):2144–2162, 2022

work page 2022

[28] [28]

M. N. Prasad and G. A. Hanasusanto. Improved conic refor mulations for K-means clustering. SIAM Journal on Optimization , 28(4):3105–3126, 2018

work page 2018

[29] [29]

Vershynin

R. Vershynin. High-dimensional probability: An introduction with appli cations in data science , volume 47. Cambridge University Press, 2018. 36

work page 2018