
arxiv: 2502.12738 · v1 · pith:AKDWLZSBnew · submitted 2025-02-18 · 🧮 math.ST · stat.TH

Existence of Direct Density Ratio Estimators

classification 🧮 math.ST stat.TH
keywords existence · KLIEP · direct · estimation · parameter · problems · sufficient · average

Many two-sample problems call for a comparison of two distributions from an exponential family. Density ratio estimation methods provide ways to solve such problems through direct estimation of the differences in natural parameters. The term "direct" indicates that one avoids estimating both marginal distributions. In this context, we consider the Kullback-Leibler Importance Estimation Procedure (KLIEP), which has been the subject of recent work on differential networks. Our main result shows that the existence of the KLIEP estimator is characterized by whether the average sufficient statistic for one sample belongs to the convex hull of the set of all sufficient statistics for data points in the second sample. For high-dimensional problems, it is customary to regularize the KLIEP loss by adding the product of a tuning parameter and a norm of the vector of parameter differences. We show that the existence of the regularized KLIEP estimator requires the tuning parameter to be no less than the dual norm-based distance between the average sufficient statistic and the convex hull. The implications of these existence issues are explored in applications to differential network analysis.
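
The existence condition in the abstract can be illustrated with a toy sketch for a scalar sufficient statistic. It assumes the commonly used form of the unregularized KLIEP loss, loss(theta) = -theta * mean(t_x) + log(mean(exp(theta * t_y))), where t_x and t_y hold the sufficient statistics of the two samples. The function names, the sample values, and the choice of which sample is averaged are illustrative assumptions, not taken from the paper. In one dimension the convex hull of t_y is the interval [min(t_y), max(t_y)]; the sketch tests the strict interior, since the abstract does not spell out how boundary cases are treated.

```python
import math


def kliep_loss(theta, t_x, t_y):
    """Unregularized KLIEP loss for a scalar sufficient statistic (assumed form).

    loss(theta) = -theta * mean(t_x) + log(mean(exp(theta * t_y)))
    """
    mean_x = sum(t_x) / len(t_x)
    scaled = [theta * t for t in t_y]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    log_mean_exp = shift + math.log(
        sum(math.exp(s - shift) for s in scaled) / len(scaled)
    )
    return -theta * mean_x + log_mean_exp


def mean_strictly_inside_hull(t_x, t_y):
    """1-D check: is mean(t_x) strictly inside [min(t_y), max(t_y)]?"""
    mean_x = sum(t_x) / len(t_x)
    return min(t_y) < mean_x < max(t_y)


# Hypothetical toy data: sufficient statistics of the second sample.
t_y = [0.0, 1.0, 2.0]
inside = [0.5, 1.5]   # mean 1.0 lies inside the hull [0, 2]
outside = [3.0, 4.0]  # mean 3.5 lies outside the hull

for name, t_x in [("inside", inside), ("outside", outside)]:
    losses = [kliep_loss(th, t_x, t_y) for th in (0.0, 1.0, 10.0, 100.0)]
    print(name, mean_strictly_inside_hull(t_x, t_y), [round(v, 3) for v in losses])
```

When the mean lies outside the hull, the loss keeps decreasing as theta grows, so no minimizer exists; when it lies inside, the loss grows in both directions and a minimizer is attained. In higher dimensions the membership check would need a linear program rather than a min/max comparison, which this sketch does not attempt.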


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Regularized Variational and Spectral Log-Density-Ratio Estimation in the Gaussian Location Model

    cs.LG · 2026-07 · unverdicted · novelty 6.0

    Derives asymptotic risk characterizations for regularized variational and spectral log-density-ratio estimators in the Gaussian location model and compares their performance across observation regimes.