pith. sign in

arxiv: 1612.03375 · v2 · pith:2YMCOWJOnew · submitted 2016-12-11 · 🧮 math.ST · stat.TH

Sample complexity of the distinct elements problem

classification 🧮 math.ST stat.TH
keywords distinctsamplecomplexityelementsestimatorfactorsproblemsize
0
0 comments X
read the original abstract

We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $ k $ balls based on $n$ samples drawn with replacements. Based on discrete polynomial approximation and interpolation, we propose an estimator with additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for an accurate estimation. The result also applies to sampling without replacement provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.