A Simple Algorithm for Clustering Discrete Distributions

Pradipta Mitra

arxiv: 2604.23512 · v1 · submitted 2026-04-26 · 💻 cs.DS

A Simple Algorithm for Clustering Discrete Distributions

Pradipta Mitra This is my paper

Pith reviewed 2026-05-08 05:40 UTC · model grok-4.3

classification 💻 cs.DS

keywords clusteringdiscrete distributionsBernoulli mixturesprojection algorithmlow-rank approximationk-meansmixture models

0 comments

The pith

A projection-based algorithm clusters mixtures of discrete distributions by applying k-means to their best low-rank approximation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a simple algorithm that projects data points onto approximate cluster centers derived from k-means on the rank-k approximation of the data matrix. This approach is rotationally invariant, unlike earlier coordinate-specific methods, and applies to both discrete Bernoulli distributions and continuous Gaussians. It succeeds when cluster centers satisfy a natural separation condition, resolving a long-standing conjecture about the existence of geometric clustering algorithms for discrete distributions. A sympathetic reader would care because it offers a unified, straightforward method for clustering high-dimensional mixture data without relying on combinatorial projections.

Core claim

The central discovery is that for mixtures of discrete distributions, projecting samples onto centers obtained from k-means on the best rank-k approximation of the data matrix correctly recovers the clusters under a separation condition on the centers. The same procedure works for Gaussian mixtures, providing a unified geometric approach.

What carries the argument

The projection onto approximate centers computed via k-means on the rank-k singular value decomposition approximation of the data matrix.

If this is right

The algorithm is rotationally invariant and does not depend on coordinate-specific combinatorial projections.
It resolves McSherry's conjecture on the existence of such geometric algorithms for discrete distributions.
The method applies equally to continuous distributions like high-dimensional Gaussians.
Clustering succeeds whenever the mixture components have sufficiently separated centers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This geometric approach could extend naturally to other mixture models where low-rank structure captures the centers.
In practice, the method might allow clustering in regimes where dimension is much larger than the number of samples.
If separation is only marginally satisfied, adaptive choice of the rank k could improve robustness.

Load-bearing premise

The cluster centers must be separated by a sufficient distance relative to the noise level so that the low-rank approximation and k-means yield projections that distinguish the components correctly.

What would settle it

A concrete counterexample where the separation condition holds yet the projections lead to incorrect clustering, or an experiment showing misclustering when the rank-k approximation distorts the centers.

Figures

Figures reproduced from arXiv: 2604.23512 by Pradipta Mitra.

**Figure 1.** Figure 1: If v ∈ Tr, the projection of v − µr on µs − µr will be closer to µr than it is to µs. Related Work. Let us discuss another example to further illustrate the dichotomy between discrete and continuous distributions in the literature. Two related papers, [6] and [1] employ a very natural linkage algorithm to perform the clean-up phase, and share much of their algorithm. However, these papers respectively addr… view at source ↗

read the original abstract

We propose a simple, projection-based algorithm for clustering mixtures of discrete (Bernoulli) distributions. Unlike previous approaches that rely on coordinate-specific ``combinatorial projections,'' our algorithm is rotationally invariant and works by projecting samples onto approximate centers obtained via a $k$-means computation on the best rank-$k$ approximation of the data matrix. This resolves a conjecture of McSherry on the existence of such geometric algorithms for discrete distributions. The same algorithm also applies to continuous distributions such as high-dimensional Gaussians, providing a unified approach across distribution types. We prove that the algorithm succeeds under a natural separation condition on the cluster centers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

read the letter

This paper gives a clean geometric algorithm for clustering Bernoulli mixtures that resolves McSherry's conjecture by using rank-k approximation plus k-means for projections instead of coordinate tricks. The same procedure is claimed to handle high-dimensional Gaussians under the same framework. That unification and the rotational invariance are the main points worth noting. Earlier methods leaned on combinatorial projections tied to specific coordinates, which this approach sidesteps by working directly with the low-rank structure of the data matrix. The algorithm itself stays simple: take the best rank-k approximation, run k-means to get rough centers, then project the samples onto those centers for the final clustering step. If the analysis goes through, this is a practical improvement for anyone who wants a method that does not depend on axis alignment. The proof relies on a separation condition between the cluster centers, which is standard for these mixture results and appears sufficient to control the error from the SVD step and the subsequent k-means. No obvious circularity or extra fitted parameters show up; everything builds from SVD and k-means, both of which have known guarantees. One soft spot is the exact tightness of the separation needed for the discrete case. Bernoulli mixtures have higher variance than Gaussians in the same dimension, so the error bounds on the projected centers might require a stronger gap than the abstract suggests. It would be useful to see how the constants compare to prior work on the same conjecture. The paper is aimed at theorists who study clustering algorithms for mixtures in high dimensions. Anyone looking for a geometric alternative to combinatorial methods will find the core idea useful. It deserves peer review because settling the conjecture with a simple, unified procedure is a concrete step forward, even if the presentation of the bounds could use tightening.

Referee Report

2 major / 3 minor

Summary. The paper proposes a simple, rotationally invariant projection-based algorithm for clustering mixtures of discrete Bernoulli distributions (and extends it to Gaussians). The algorithm computes the best rank-k approximation of the data matrix, runs k-means on the resulting low-rank matrix to obtain approximate centers, and projects samples onto these centers to assign clusters. It claims to resolve McSherry's conjecture on the existence of such geometric algorithms for discrete distributions and proves correctness under a natural separation condition on the cluster centers.

Significance. If the proof holds, the result supplies a unified geometric method that avoids coordinate-specific combinatorial projections, offering a simpler and more invariant approach to mixture clustering across discrete and continuous settings. This could streamline both theoretical analysis and practical implementation for high-dimensional data.

major comments (2)

Main theorem (proof of success under separation): the argument that the rank-k SVD approximation plus k-means yields centers accurate enough for the subsequent projection step to recover the clusters relies on error bounds that are not fully derived or quantified in the provided analysis; explicit comparison of the projection error to the separation distance is needed to confirm the guarantee holds for Bernoulli mixtures.
Algorithm section (projection onto approximate centers): the precise definition of how samples are projected onto the k-means centers obtained from the rank-k matrix, and why this step is rotationally invariant and free of coordinate bias for discrete distributions, requires a supporting lemma or explicit calculation showing invariance under orthogonal transformations.

minor comments (3)

Add pseudocode for the full algorithm, including the rank-k approximation, k-means, and projection steps, to improve reproducibility and clarity.
Include a reference to the exact statement of McSherry's conjecture being resolved, along with a brief discussion of how the separation condition compares to prior assumptions in the literature.
Clarify notation for the data matrix and its rank-k approximation at first use; ensure all parameters (e.g., k, separation threshold) are defined before appearing in the theorem statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of our manuscript and for providing constructive feedback. We address each of the major comments below and will incorporate the suggested improvements in the revised version.

read point-by-point responses

Referee: Main theorem (proof of success under separation): the argument that the rank-k SVD approximation plus k-means yields centers accurate enough for the subsequent projection step to recover the clusters relies on error bounds that are not fully derived or quantified in the provided analysis; explicit comparison of the projection error to the separation distance is needed to confirm the guarantee holds for Bernoulli mixtures.

Authors: We agree that the error bounds require more explicit quantification to fully establish the main theorem. In the revised manuscript, we will expand the proof section to derive the error bounds from the rank-k SVD approximation and the k-means clustering step in detail. We will also include an explicit comparison showing that the projection error is smaller than half the minimum separation distance between centers, thereby confirming the correctness of the clustering for Bernoulli mixtures under the stated separation condition. revision: yes
Referee: Algorithm section (projection onto approximate centers): the precise definition of how samples are projected onto the k-means centers obtained from the rank-k matrix, and why this step is rotationally invariant and free of coordinate bias for discrete distributions, requires a supporting lemma or explicit calculation showing invariance under orthogonal transformations.

Authors: Thank you for this suggestion. The projection is performed by assigning each data point to the nearest of the approximate centers obtained from k-means on the rank-k approximation. To clarify the invariance, we will add a new lemma in the algorithm section that proves the rotational invariance of the SVD-based rank-k approximation and the k-means step under orthogonal transformations. This will demonstrate that the projection step is free of coordinate bias, distinguishing it from combinatorial projection methods that depend on specific coordinates for discrete distributions. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a projection-based clustering algorithm relying on standard rank-k SVD approximation followed by k-means, which are external, well-established techniques not derived from the paper's own fitted parameters or definitions. The abstract and available description present the method as resolving an external conjecture of McSherry via a geometrically invariant approach under a stated separation condition on centers. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The derivation chain remains independent of its target clustering outputs and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract; the method uses established linear algebra and clustering primitives.

pith-pipeline@v0.9.0 · 5385 in / 1046 out tokens · 45819 ms · 2026-05-08T05:40:01.746545+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Achlioptas and F

D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. InConference on Learning Theory (COLT), pages 458–469, 2005

work page 2005
[2]

Alon and N

N. Alon and N. Kahale. A spectral technique for coloring random 3-colorable graphs.SIAM J. Comput., 26(6):1733–1748, 1997

work page 1997
[3]

N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 594–598, 1998

work page 1998
[4]

R. Boppana. Eigenvalues and graph bisection: an average case analysis. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 280–285, 1987

work page 1987
[5]

Dasgupta, J

A. Dasgupta, J. Hopcroft, R. Kannan, and P. Mitra. Spectral clustering with limited indepen- dence. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1036–1045, 2007. 11

work page 2007
[6]

Dasgupta, J

A. Dasgupta, J. Hopcroft, and F. McSherry. Spectral analysis of random graphs with skewed degree distributions. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 602–610, 2004

work page 2004
[7]

Drineas, A

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 291–299, 1999

work page 1999
[8]

Jain and V

K. Jain and V. Vazirani. Approximation algorithms for metric facility location and median problems using the primal-dual schema and lagrangian relaxation.J. ACM, 48(2):274–296, 2001

work page 2001
[9]

Kannan, H

R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. InConference on Learning Theory (COLT), pages 444–457, 2005

work page 2005
[10]

Kanungo, D

T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu. A local search approximation algorithm for k-means clustering.Comput. Geom., 28(2-3):89–112, 2004

work page 2004
[11]

McSherry

F. McSherry. Spectral partitioning of random graphs. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 529–537, 2001

work page 2001
[12]

Petrov.Sums of independent random variables

V. Petrov.Sums of independent random variables. Springer, 1975

work page 1975
[13]

Rudelson

M. Rudelson. Random vectors in the isotropic position.J. of Functional Analysis, 168(1):60– 72, 1999

work page 1999
[14]

Schmidt, A

J. Schmidt, A. Siegel, and A. Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence.SIAM J. Discret. Math., 8(2):223–250, 1995

work page 1995
[15]

Vempala and G

S. Vempala and G. Wang. A spectral algorithm for learning mixture models.Journal of Com- puter and System Sciences, 68(4):841–860, 2004

work page 2004
[16]

V. Vu. Spectral norm of random matrices. InACM Symposium on Theory of computing (STOC), pages 619–626, 2005

work page 2005
[17]

Mitra.Clustering algorithms for random and pseudo-random structures

P. Mitra.Clustering algorithms for random and pseudo-random structures. PhD thesis, Yale University, 2008

work page 2008
[18]

S. Neumann. Bipartite stochastic block models with tiny clusters. InAdvances in Neural In- formation Processing Systems (NeurIPS), 2018

work page 2018
[19]

Seif and Y

M. Seif and Y. Chen. Clustering mixtures of discrete distributions: a note on Mitra’s algorithm. arXiv preprint arXiv:2405.19559, 2024. 12

work page arXiv 2024

[1] [1]

Achlioptas and F

D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. InConference on Learning Theory (COLT), pages 458–469, 2005

work page 2005

[2] [2]

Alon and N

N. Alon and N. Kahale. A spectral technique for coloring random 3-colorable graphs.SIAM J. Comput., 26(6):1733–1748, 1997

work page 1997

[3] [3]

N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 594–598, 1998

work page 1998

[4] [4]

R. Boppana. Eigenvalues and graph bisection: an average case analysis. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 280–285, 1987

work page 1987

[5] [5]

Dasgupta, J

A. Dasgupta, J. Hopcroft, R. Kannan, and P. Mitra. Spectral clustering with limited indepen- dence. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1036–1045, 2007. 11

work page 2007

[6] [6]

Dasgupta, J

A. Dasgupta, J. Hopcroft, and F. McSherry. Spectral analysis of random graphs with skewed degree distributions. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 602–610, 2004

work page 2004

[7] [7]

Drineas, A

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. InAnnual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 291–299, 1999

work page 1999

[8] [8]

Jain and V

K. Jain and V. Vazirani. Approximation algorithms for metric facility location and median problems using the primal-dual schema and lagrangian relaxation.J. ACM, 48(2):274–296, 2001

work page 2001

[9] [9]

Kannan, H

R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. InConference on Learning Theory (COLT), pages 444–457, 2005

work page 2005

[10] [10]

Kanungo, D

T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu. A local search approximation algorithm for k-means clustering.Comput. Geom., 28(2-3):89–112, 2004

work page 2004

[11] [11]

McSherry

F. McSherry. Spectral partitioning of random graphs. InIEEE Symposium on Foundations of Computer Science (FOCS), pages 529–537, 2001

work page 2001

[12] [12]

Petrov.Sums of independent random variables

V. Petrov.Sums of independent random variables. Springer, 1975

work page 1975

[13] [13]

Rudelson

M. Rudelson. Random vectors in the isotropic position.J. of Functional Analysis, 168(1):60– 72, 1999

work page 1999

[14] [14]

Schmidt, A

J. Schmidt, A. Siegel, and A. Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence.SIAM J. Discret. Math., 8(2):223–250, 1995

work page 1995

[15] [15]

Vempala and G

S. Vempala and G. Wang. A spectral algorithm for learning mixture models.Journal of Com- puter and System Sciences, 68(4):841–860, 2004

work page 2004

[16] [16]

V. Vu. Spectral norm of random matrices. InACM Symposium on Theory of computing (STOC), pages 619–626, 2005

work page 2005

[17] [17]

Mitra.Clustering algorithms for random and pseudo-random structures

P. Mitra.Clustering algorithms for random and pseudo-random structures. PhD thesis, Yale University, 2008

work page 2008

[18] [18]

S. Neumann. Bipartite stochastic block models with tiny clusters. InAdvances in Neural In- formation Processing Systems (NeurIPS), 2018

work page 2018

[19] [19]

Seif and Y

M. Seif and Y. Chen. Clustering mixtures of discrete distributions: a note on Mitra’s algorithm. arXiv preprint arXiv:2405.19559, 2024. 12

work page arXiv 2024