
arxiv: 1504.01227 · v2 · pith:BZC7KODInew · submitted 2015-04-06 · 🧮 math.ST · stat.TH

Chebyshev polynomials, moment matching, and optimal estimation of the unseen

keywords: epsilon, sample, chebyshev, complexity, estimator, least, size

We consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least $\frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, i.e., the minimal sample size needed to achieve an additive error of $\epsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^2\frac{1}{\epsilon}$, which improves on the state-of-the-art result of $\frac{k}{\epsilon^2 \log k}$ in [VV13]. A similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties; it can be evaluated in $O(n+\log^2 k)$ time and attains the sample complexity within a factor of six asymptotically. The superiority of the proposed estimator in terms of accuracy, computational efficiency, and scalability is demonstrated on a variety of synthetic and real datasets.
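To get a feel for the size of the improvement claimed in the abstract, the two sample-complexity rates can be compared numerically. The snippet below is only an illustrative sketch: it drops all universal constants (which the abstract does not specify), uses natural logarithms, and the function names `rate_chebyshev`, `rate_prior`, and `rate_ratio` are hypothetical labels chosen here, not names from the paper.

```python
import math


def rate_chebyshev(k: float, eps: float) -> float:
    """Rate (k / log k) * log^2(1/eps) from the abstract, constants dropped."""
    if k <= 1 or not (0 < eps < 1):
        raise ValueError("need k > 1 and 0 < eps < 1")
    return k / math.log(k) * math.log(1.0 / eps) ** 2


def rate_prior(k: float, eps: float) -> float:
    """Earlier rate k / (eps^2 * log k) cited in the abstract, constants dropped."""
    if k <= 1 or not (0 < eps < 1):
        raise ValueError("need k > 1 and 0 < eps < 1")
    return k / (eps ** 2 * math.log(k))


def rate_ratio(eps: float) -> float:
    """Ratio prior / new = 1 / (eps * log(1/eps))^2; k cancels out."""
    if not (0 < eps < 1):
        raise ValueError("need 0 < eps < 1")
    return 1.0 / (eps * math.log(1.0 / eps)) ** 2


if __name__ == "__main__":
    k = 10 ** 6
    for eps in (0.5, 0.1, 0.01):
        new, old = rate_chebyshev(k, eps), rate_prior(k, eps)
        print(f"eps={eps}: new={new:.3e}, prior={old:.3e}, ratio={old / new:.1f}")
```

Because $\epsilon \log\frac{1}{\epsilon} \le \frac{1}{e} < 1$ on $(0,1)$, the ratio always exceeds one, and it grows as $\epsilon$ shrinks; at $\epsilon = 0.1$ it is roughly 19. Since constants are ignored, these figures indicate scaling only, not actual sample sizes.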
