pith. machine review for the scientific record. sign in

arxiv: 2508.03059 · v4 · submitted 2025-08-05 · 📊 stat.ME · stat.CO· stat.ML

Two-sample comparison through additive tree models for density ratios

classification 📊 stat.ME stat.COstat.ML
keywords lossratiodensitymodelsadditivebalancingbayesianuncertainty
0
0 comments X
read the original abstract

The ratio of two densities provides a direct characterization of their differences. We consider the two-sample comparison problem by estimating this ratio given i.i.d. observations from two distributions. To this end, we propose additive tree models for density ratio estimation along with efficient algorithms using a new loss function, the balancing loss. The loss allows tree-based models to be trained using several algorithms originally designed for supervised learning, such as forward-stagewise optimization and gradient boosting. Moreover, the balancing loss resembles an exponential family kernel, and it can serve as a pseudo-likelihood with conjugate priors. This property enables generalized Bayesian inference on the density ratio using backfitting samplers designed for Bayesian additive regression trees (BART). Our Bayesian strategy provides uncertainty quantification for the inferred density ratio, which is critical for applications involving high-dimensional and data-limited distributions with potentially substantial uncertainty. We further show connections of the balancing loss to the exponential loss in binary classification and to the variational form of f-divergence, particularly the squared Hellinger distance. Numerical experiments demonstrate that our method achieves both accuracy and computational efficiency, while uniquely providing uncertainty quantification. Finally, we demonstrate its application to assessing the quality of generative models for microbiome compositional data.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.