Neural Estimation of Statistical Divergences

Sreejith Sreekumar; Ziv Goldfeld

arxiv: 2110.03652 · v4 · pith:LYPV5Q4Rnew · submitted 2021-10-07 · 🧮 math.ST · stat.ML · stat.TH

classification 🧮 math.ST stat.ML stat.TH

keywords divergences neural empirical error statistical approximation bounds distributions

Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating those divergences relies on parametrizing an empirical variational form by a neural network (NN) and optimizing over parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular $\mathsf{f}$-divergences -- Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above, with an appropriate NN growth rate, are minimax rate-optimal, achieving the parametric convergence rate.
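To make the idea of "parametrizing an empirical variational form by a neural network" concrete, the following is a minimal sketch for the Kullback-Leibler case only. It assumes one standard variational representation, $D_{\mathsf{KL}}(P\|Q) = 1 + \sup_g \mathbb{E}_P[g] - \mathbb{E}_Q[e^g]$; the abstract does not state which form the paper uses. The tanh activation, scalar inputs, network width, step size, plain gradient ascent, and all function names (`shallow_net`, `kl_objective`, `gradient`, `fit`) are illustrative assumptions, not the authors' construction.

```python
import math
import random


def shallow_net(x, params):
    """Shallow network g(x) = sum_k a_k * tanh(w_k * x + b_k) for scalar x."""
    return sum(a * math.tanh(w * x + b) for a, w, b in params)


def kl_objective(xs_p, xs_q, params):
    """Empirical objective: mean over P of g, minus mean over Q of exp(g), plus 1."""
    mean_p = sum(shallow_net(x, params) for x in xs_p) / len(xs_p)
    mean_q = sum(math.exp(shallow_net(y, params)) for y in xs_q) / len(xs_q)
    return mean_p - mean_q + 1.0


def gradient(xs_p, xs_q, params):
    """Gradient of kl_objective with respect to every (a_k, w_k, b_k)."""
    # Each sample contributes coeff * dg/dtheta: +1/n for P, -exp(g(y))/m for Q.
    weighted = [(x, 1.0 / len(xs_p)) for x in xs_p]
    weighted += [(y, -math.exp(shallow_net(y, params)) / len(xs_q)) for y in xs_q]
    grads = [[0.0, 0.0, 0.0] for _ in params]
    for x, coeff in weighted:
        for k, (a, w, b) in enumerate(params):
            t = math.tanh(w * x + b)
            dt = 1.0 - t * t  # derivative of tanh at w*x + b
            grads[k][0] += coeff * t
            grads[k][1] += coeff * a * dt * x
            grads[k][2] += coeff * a * dt
    return grads


def fit(xs_p, xs_q, width=4, steps=200, lr=0.05, seed=0):
    """Plain gradient ascent on the empirical objective; returns the parameters."""
    rng = random.Random(seed)
    # Outer weights start at zero, so the initial objective is exactly 0.
    params = [(0.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(width)]
    for _ in range(steps):
        grads = gradient(xs_p, xs_q, params)
        params = [
            (a + lr * ga, w + lr * gw, b + lr * gb)
            for (a, w, b), (ga, gw, gb) in zip(params, grads)
        ]
    return params


if __name__ == "__main__":
    rng = random.Random(1)
    xs_p = [rng.gauss(1.0, 1.0) for _ in range(200)]  # samples from P = N(1, 1)
    xs_q = [rng.gauss(0.0, 1.0) for _ in range(200)]  # samples from Q = N(0, 1)
    params = fit(xs_p, xs_q)
    print("neural KL estimate:", kl_objective(xs_p, xs_q, params))
```

For these two Gaussians the true KL divergence is 0.5, which gives a reference point for the printed value. The gap between the two reflects both sources of error named in the abstract: the limited expressiveness of a small network (function approximation) and the use of finite samples (empirical estimation). This toy optimizer carries no guarantee of reaching the supremum.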

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Blind Recovery of Latent Domains via Unsupervised Symmetry Discovery
cs.LG 2026-06 unverdicted novelty 6.0

Unsupervised symmetry discovery via shallow group-convolutional networks recovers latent domains from linear measurements of random fields by learning symmetry actions under stationarity and locality constraints.