
arxiv: 1907.00196 · v1 · pith:HWYCFTGVnew · submitted 2019-06-29 · 🧮 math.ST · stat.TH

Statistical estimation of the Kullback-Leibler divergence

Pith reviewed 2026-05-25 12:55 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords: Kullback-Leibler divergence, k-nearest neighbor estimation, asymptotic unbiasedness, L2-consistency, differential entropy, Gaussian measures, statistical estimation

The pith

k-nearest neighbor statistics from independent samples yield asymptotically unbiased and L2-consistent estimates of Kullback-Leibler divergence under wide conditions on densities in R^d.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs estimates of the Kullback-Leibler divergence using k-nearest neighbor statistics drawn from two separate collections of i.i.d. observations. It supplies conditions that ensure these estimates become asymptotically unbiased and converge in L2 norm as the sample sizes grow. The conditions cover any pair of Gaussian measures on R^d whose covariance matrices are nondegenerate. The same techniques also produce new consistency statements for the Kozachenko-Leonenko estimators of Shannon differential entropy. Readers care because accurate, consistent estimation of divergence is a basic requirement for comparing probability distributions in high dimensions.

Core claim

Wide conditions are provided to guarantee asymptotic unbiasedness and L2-consistency of the introduced estimates of the Kullback-Leibler divergence for probability measures in R^d having densities with respect to the Lebesgue measure. These estimates are constructed by means of two independent collections of i.i.d. observations and involve the specified k-nearest neighbor statistics. In particular, the established results are valid for estimates of the Kullback-Leibler divergence between any two Gaussian measures in R^d with nondegenerate covariance matrices. As a byproduct new statements are obtained concerning the Kozachenko-Leonenko estimators of the Shannon differential entropy.
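
For orientation, the two modes of convergence named in the claim can be written out. The notation is an assumption made here for illustration and is not taken from the paper: \widehat{D}_{n,m} stands for an estimate built from samples of sizes n and m, and D for the true divergence, assumed finite. How n and m are allowed to grow jointly is specified in the paper, not in this sketch.

```latex
\begin{align*}
\text{asymptotic unbiasedness:} \quad
  & \mathbb{E}\,\widehat{D}_{n,m} \to D, \\
L^2\text{-consistency:} \quad
  & \mathbb{E}\bigl(\widehat{D}_{n,m} - D\bigr)^2 \to 0,
    \qquad n, m \to \infty, \\
\text{decomposition:} \quad
  & \mathbb{E}\bigl(\widehat{D}_{n,m} - D\bigr)^2
    = \operatorname{Var}\bigl(\widehat{D}_{n,m}\bigr)
    + \bigl(\mathbb{E}\,\widehat{D}_{n,m} - D\bigr)^2 .
\end{align*}
```

By the decomposition, L2-consistency holds exactly when both the bias and the variance of the estimate vanish in the limit.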

What carries the argument

k-nearest neighbor statistics built from two independent collections of i.i.d. observations
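
To make the construction concrete, here is a minimal sketch of a fixed-k nearest-neighbor divergence estimate in the style of Wang, Kulkarni and Verdú (2009). It is an illustration under stated assumptions, not necessarily the exact estimator analyzed in the paper: the function names, the brute-force neighbor search, and the use of the same k for both samples are choices made here.

```python
import math
import random


def kth_nn_sq_dist(point, sample, k, skip_index=None):
    """Squared Euclidean distance from `point` to its k-th nearest neighbor
    in `sample`, optionally ignoring the element at `skip_index`."""
    sq_dists = sorted(
        sum((a - b) ** 2 for a, b in zip(point, other))
        for j, other in enumerate(sample)
        if j != skip_index
    )
    return sq_dists[k - 1]


def knn_kl_divergence(xs, ys, k=1):
    """Fixed-k nearest-neighbor estimate of D(P || Q), with xs ~ P and ys ~ Q.

    xs and ys are lists of equal-length tuples (points in R^d).
    Assumes no duplicate points, which holds almost surely for densities.
    """
    n, m, d = len(xs), len(ys), len(xs[0])
    total = 0.0
    for i, x in enumerate(xs):
        rho_sq = kth_nn_sq_dist(x, xs, k, skip_index=i)  # within the P-sample
        nu_sq = kth_nn_sq_dist(x, ys, k)                 # within the Q-sample
        # d * log(nu / rho), written with squared distances
        total += 0.5 * d * math.log(nu_sq / rho_sq)
    return total / n + math.log(m / (n - 1))


# Hypothetical demo: P = N(0, 1), Q = N(1, 1) on the real line.
rng = random.Random(0)
demo_xs = [(rng.gauss(0.0, 1.0),) for _ in range(300)]
demo_ys = [(rng.gauss(1.0, 1.0),) for _ in range(300)]
print(knn_kl_divergence(demo_xs, demo_ys, k=5))
```

For this pair of Gaussians the true divergence is 0.5, so the printed value can be compared against it. The brute-force search costs O(n(n + m)) distance evaluations; a spatial index would be the usual replacement for larger samples.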

If this is right

  • The k-nearest neighbor estimates become asymptotically unbiased for the Kullback-Leibler divergence.
  • The estimates converge in L2 to the true divergence value.
  • The consistency statements apply to the divergence between any two nondegenerate Gaussian measures on R^d.
  • New asymptotic unbiasedness and consistency results hold for the Kozachenko-Leonenko estimators of differential entropy.
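
The entropy byproduct in the last bullet concerns the Kozachenko-Leonenko estimator. A minimal sketch of its common k-nearest-neighbor form follows; the digamma normalization, the function names, and the brute-force search are assumptions made for illustration, and the version treated in the paper may be normalized differently.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def unit_ball_log_volume(d):
    """Logarithm of the volume of the Euclidean unit ball in R^d."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def kozachenko_leonenko_entropy(xs, k=1):
    """k-nearest-neighbor estimate of the differential entropy (in nats).

    xs is a list of equal-length tuples (points in R^d).
    Assumes no duplicate points, which holds almost surely for densities.
    """
    n, d = len(xs), len(xs[0])
    log_rho_sum = 0.0
    for i, x in enumerate(xs):
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(x, other))
            for j, other in enumerate(xs)
            if j != i
        )
        # log of the distance to the k-th nearest neighbor of x
        log_rho_sum += 0.5 * math.log(sq_dists[k - 1])
    return (
        digamma_int(n)
        - digamma_int(k)
        + unit_ball_log_volume(d)
        + d * log_rho_sum / n
    )


# Hypothetical demo: standard normal on the real line.
rng = random.Random(0)
demo_xs = [(rng.gauss(0.0, 1.0),) for _ in range(300)]
print(kozachenko_leonenko_entropy(demo_xs, k=5))
```

The true entropy of N(0, 1) is 0.5 * log(2 * pi * e), roughly 1.419 nats, which gives a reference value for the printed estimate.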

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same nearest-neighbor construction may extend to estimation of other f-divergences provided the requisite density assumptions hold.
  • Practical performance in moderate dimensions could be checked by direct Monte Carlo comparison against closed-form KL values for Gaussians.
  • The independence requirement between the two samples could be relaxed in future work while preserving the consistency claims.
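
The Monte Carlo check suggested in the second bullet needs a reference value. For two Gaussian measures on R^d with nondegenerate covariance matrices the divergence has a standard closed form, given here in nats; the symbols \mu_0, \Sigma_0, \mu_1, \Sigma_1 are notation introduced for this note.

```latex
D\bigl(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\bigr)
= \frac{1}{2}\Bigl[
    \operatorname{tr}\bigl(\Sigma_1^{-1}\Sigma_0\bigr)
    + (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
    - d
    + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
  \Bigr]
```

As a quick instance, for d = 1 with unit variances and means 0 and 1 the bracket reduces to 1 + 1 - 1 + 0, so the divergence equals 0.5.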

Load-bearing premise

The probability measures possess densities with respect to Lebesgue measure on R^d and the observations come from two independent i.i.d. collections.

What would settle it

A pair of densities on R^d together with explicit sequences of sample sizes for which the k-nearest neighbor KL estimator fails to converge to the true value in L2 or fails to be asymptotically unbiased.

original abstract

Wide conditions are provided to guarantee asymptotic unbiasedness and L^2-consistency of the introduced estimates of the Kullback-Leibler divergence for probability measures in R^d having densities w.r.t. the Lebesgue measure. These estimates are constructed by means of two independent collections of i.i.d. observations and involve the specified k-nearest neighbor statistics. In particular, the established results are valid for estimates of the Kullback-Leibler divergence between any two Gaussian measures in R^d with nondegenerate covariance matrices. As a byproduct we obtain new statements concerning the Kozachenko-Leonenko estimators of the Shannon differential entropy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper constructs k-nearest-neighbor estimators of the Kullback-Leibler divergence between two probability measures on R^d that admit densities with respect to Lebesgue measure, using two independent i.i.d. samples. It claims to supply wide conditions guaranteeing asymptotic unbiasedness and L^2-consistency of these estimators; the results are asserted to cover any pair of non-degenerate Gaussians in arbitrary dimension. As a byproduct, new consistency statements are derived for the Kozachenko-Leonenko entropy estimator.

Significance. If the stated conditions are indeed broad and the proofs are correct, the work supplies rigorous justification for a class of computationally attractive nonparametric estimators that are already used in practice. The explicit inclusion of the Gaussian case in every dimension and the new entropy results constitute concrete, usable advances in the theory of nearest-neighbor divergence estimation.

minor comments (3)
  1. The abstract asserts the existence of 'wide conditions' without naming them or sketching the proof strategy; while the full manuscript presumably supplies both, a brief indication in the abstract would improve readability.
  2. Notation for the two sample sizes (n and m) and the neighbor order k should be introduced once at the beginning of Section 2 and used consistently thereafter.
  3. The statement that the results hold for 'any two Gaussian measures with nondegenerate covariance matrices' would benefit from an explicit reference to the relevant theorem number.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the paper, the recognition of its significance for nearest-neighbor divergence estimation, and the recommendation of minor revision. No major comments appear in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper establishes asymptotic unbiasedness and L^2-consistency for k-NN KL divergence estimators directly from standard assumptions (i.i.d. samples, densities w.r.t. Lebesgue measure on R^d) via probabilistic analysis, including explicit coverage of nondegenerate Gaussians. No parameters are fitted to data and then relabeled as predictions, no self-citations bear the central load, and no ansatz or uniqueness claim reduces the result to its inputs by construction. The byproduct consistency statements for Kozachenko-Leonenko entropy estimators follow from the same direct arguments without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on standard mathematical assumptions for i.i.d. sampling and the existence of Lebesgue densities; no free parameters, ad-hoc axioms, or invented entities are mentioned.

axioms (2)
  • domain assumption: Probability measures on R^d possess densities with respect to Lebesgue measure.
    Explicitly required for the k-NN statistics and consistency statements to be defined (abstract).
  • domain assumption: Two independent collections of i.i.d. observations are available.
    Stated as the data source for the estimators (abstract).


