pith. sign in

arxiv: 2512.08376 · v2 · pith:7KVVID73new · submitted 2025-12-09 · 💻 cs.DS · cs.IT· math.IT· math.ST· stat.ML· stat.TH

A Distribution Testing Approach to Clustering Distributions

classification 💻 cs.DS cs.ITmath.ITmath.STstat.MLstat.TH
keywords distributionsvarepsilonboundsclusteringclusterscomplexitydistributionlower
0
0 comments X
read the original abstract

We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one of the cluster's distributions is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, number of distributions $k$, size $r$ of one of the clusters, and distance $\varepsilon$. In particular, we achieve tightness with respect to $(n,k,r,\varepsilon)$ (up to an $O(\log k)$ factor) for all regimes.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.