
arxiv: 1203.5829 · v3 · pith:KTEZPEEVnew · submitted 2012-03-26 · 🧮 math.ST · stat.ME · stat.TH

Ensemble estimators for multivariate entropy estimation

keywords: estimators, density, entropy, dimension, rate, weighted, ensemble, estimating

The problem of estimating density functionals such as entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffers from the curse of dimensionality: the mean squared error (MSE) decays increasingly slowly as a function of the sample size $T$ as the dimension $d$ of the samples increases. In particular, the rate is often glacially slow, of order $O(T^{-\gamma/d})$, where $\gamma>0$ is a rate parameter. Examples of such estimators include kernel density estimators, $k$-nearest neighbor ($k$-NN) density estimators, $k$-NN entropy estimators, and intrinsic dimension estimators. In this paper, we propose a weighted affine combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster, dimension-invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem that can be solved offline and does not require training data. We illustrate the superior performance of our weighted estimator in two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample.
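To make the idea of a weighted affine combination concrete, here is a minimal sketch in Python. The abstract does not give the form of the optimization problem, so everything specific here is an assumption for illustration: each base estimator is indexed by a parameter value `l`, its bias is taken to be a sum of terms proportional to `l**(i/d)` for `i = 1, ..., d-1`, and the weights are chosen as the minimum-norm vector that sums to one and cancels those terms. The function name `min_norm_affine_weights` and the synthetic bias coefficients are hypothetical. The weights depend only on the parameter values and the dimension, not on data, which is consistent with the offline computation the abstract describes.

```python
import numpy as np

def min_norm_affine_weights(params, exponents):
    """Minimum l2-norm weights w with sum(w) = 1 and sum(w * params**e) = 0 for each e.

    Assumption: the bias of the estimator with parameter l is a linear
    combination of l**e over the given exponents.
    """
    params = np.asarray(params, dtype=float)
    # Row 0 enforces the affine constraint; remaining rows cancel the assumed bias terms.
    A = np.vstack([np.ones_like(params)] + [params ** e for e in exponents])
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    # For an underdetermined system, lstsq returns the minimum-norm solution.
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w, A, b

d = 3                                     # sample dimension (illustrative)
params = np.linspace(1.0, 3.0, 10)        # hypothetical ensemble parameter values
exponents = [i / d for i in range(1, d)]  # assumed bias exponents i/d, i = 1..d-1
w, A, b = min_norm_affine_weights(params, exponents)

# Synthetic check: base "estimates" equal to the true value plus the assumed bias.
true_value = 2.5
estimates = true_value + 0.7 * params ** (1 / d) - 0.2 * params ** (2 / d)
combined = float(w @ estimates)
print(combined)
```

In this synthetic setting the weighted combination recovers `true_value` up to floating-point error, because the constraints cancel exactly the bias terms that were assumed. With real estimators the cancellation would only be as good as the assumed bias model, and minimizing the norm of the weights is one way to limit the variance of the combination.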

