Forest Density Estimation

arxiv: 1001.1557 · v2 · pith:KW24WUOA · submitted 2010-01-10 · stat.ML

Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman

classification: stat.ML

keywords: forest, density, estimation, data, prove, risk, tree, algorithm

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
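The forest-selection step described in the abstract can be illustrated with a minimal sketch of Kruskal's algorithm for a maximum weight spanning forest. This is not the authors' implementation. It assumes the pairwise edge weights (for example, mutual information estimates derived from kernel density estimates) have already been computed; the function name `max_weight_forest`, the `max_edges` cap, and the rule of skipping non-positive weights are illustrative assumptions.

```python
def max_weight_forest(weights, max_edges=None):
    """Greedy (Kruskal-style) maximum weight spanning forest.

    weights   : symmetric n x n list of lists of edge weights (assumed given,
                e.g. estimated pairwise mutual information).
    max_edges : optional cap on the number of edges kept (hypothetical knob
                standing in for a complexity parameter chosen on held-out data).
    Returns the selected edges as (i, j) pairs with i < j, in selection order.
    """
    n = len(weights)
    parent = list(range(n))  # union-find forest over the n variables

    def find(i):
        # Find the root of i, halving paths along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate edges sorted from heaviest to lightest.
    candidates = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda edge: edge[0],
        reverse=True,
    )

    forest = []
    for w, i, j in candidates:
        if max_edges is not None and len(forest) >= max_edges:
            break
        if w <= 0:
            # Assumption: edges with non-positive weight add nothing, so stop.
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding the edge does not create a cycle
            parent[root_i] = root_j
            forest.append((i, j))
    return forest


# Made-up weights for four variables; variable 3 is unrelated to the rest.
example_weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.5, 0.0],
    [0.1, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(max_weight_forest(example_weights))
print(max_weight_forest(example_weights, max_edges=1))
```

Because the greedy procedure adds edges one at a time, stopping it early yields a nested sequence of forests, so a cap such as `max_edges` could in principle be tuned by comparing held-out performance across that sequence.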
