Density Estimation in Infinite Dimensional Exponential Families

Aapo Hyv\"arinen; Arthur Gretton; Bharath Sriperumbudur; Kenji Fukumizu; Revant Kumar

arxiv: 1312.3516 · v4 · pith:PERLJ5CEnew · submitted 2013-12-12 · 🧮 math.ST · stat.ME· stat.ML· stat.TH

Density Estimation in Infinite Dimensional Exponential Families

Bharath Sriperumbudur , Kenji Fukumizu , Arthur Gretton , Aapo Hyv\"arinen , Revant Kumar This is my paper

classification 🧮 math.ST stat.MEstat.MLstat.TH

keywords mathcalbetaestimatordivergencedensityproposedvertconvergence

0 comments

read the original abstract

In this paper, we consider an infinite dimensional exponential family, $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space, $H$ and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in $\mathcal{P}$. The main goal of the paper is to estimate an unknown density, $p_0$ through an element in $\mathcal{P}$. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between $p_0$ and $\mathcal{P}$, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. Instead, we propose an estimator, $\hat{p}_n$ based on minimizing the \emph{Fisher divergence}, $J(p_0\Vert p)$ between $p_0$ and $p\in \mathcal{P}$, which involves solving a simple finite-dimensional linear system. When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$. We also investigate the misspecified case of $p_0\notin\mathcal{P}$ and show that $J(p_0\Vert\hat{p}_n)\rightarrow \inf_{p\in\mathcal{P}}J(p_0\Vert p)$ as $n\rightarrow\infty$, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage with the proposed estimator grows as $d$ increases.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Model-Based Clustering With Entropic Optimal Transport
stat.ME 2026-05 unverdicted novelty 6.0

Entropic optimal transport yields a clustering loss with the same global optimum as log-likelihood but a better-behaved optimization surface, outperforming standard EM in experiments.