Two-sample comparison through additive tree models for density ratios

Li Ma; Naoki Awaya; Yuliang Xu

arXiv: 2508.03059 · v4 · submitted 2025-08-05 · 📊 stat.ME · stat.CO · stat.ML


Naoki Awaya, Yuliang Xu, Li Ma

Pith reviewed 2026-05-19 01:13 UTC · model grok-4.3

classification: 📊 stat.ME · stat.CO · stat.ML

keywords: density ratio estimation · two-sample comparison · additive trees · Bayesian inference · balancing loss · uncertainty quantification · generative model assessment


The pith

Additive tree models with a balancing loss estimate density ratios and quantify uncertainty in the estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops additive tree models to estimate the ratio of two densities from separate samples of observations. It introduces a balancing loss that supports both efficient training through standard supervised learning algorithms and generalized Bayesian inference using backfitting samplers. This matters for applications where data may be limited or high-dimensional, because the method supplies not only a point estimate of the density ratio but also a quantification of its uncertainty. The work also links the loss to binary classification and to certain divergence measures, and demonstrates its use in evaluating generative models on compositional data.
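To see what such a loss estimates, take the form l(w) = E_p[w^{-1}] + E_q[w] that is quoted later on this page. This is an assumption for illustration; the paper's exact parameterization and sample-size weighting are not reproduced here. Because p/w + q w is strictly convex for w > 0, the loss can be minimized pointwise, which gives the square root of the density ratio; writing w = e^f turns the loss into twice the exponential loss of a balanced binary classifier.

```latex
\ell(w) = \int \Big( \frac{p(x)}{w(x)} + q(x)\,w(x) \Big)\,dx,
\qquad
-\frac{p(x)}{w(x)^{2}} + q(x) = 0
\;\Longrightarrow\;
w^{*}(x) = \sqrt{\frac{p(x)}{q(x)}}, \qquad r(x) = \frac{p(x)}{q(x)} = w^{*}(x)^{2}.
% Labels Y = +1 for draws from p, Y = -1 for draws from q, with P(Y = +1) = P(Y = -1) = 1/2:
\ell(e^{f}) = \mathbb{E}_p\big[e^{-f(X)}\big] + \mathbb{E}_q\big[e^{f(X)}\big]
            = 2\,\mathbb{E}\big[e^{-Y f(X)}\big],
\qquad f^{*}(x) = \tfrac{1}{2}\log\frac{p(x)}{q(x)}.
```

The last expression is the exponential loss familiar from AdaBoost, whose population minimizer is half the log-odds; with equal class probabilities that log-odds is log(p/q).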

Core claim

Additive tree models trained with the balancing loss accurately estimate the density ratio between two distributions. Because the loss resembles an exponential-family kernel, it functions as a pseudo-likelihood with conjugate priors, permitting direct generalized Bayesian inference on the ratio through BART-style backfitting samplers.

What carries the argument

The balancing loss, which serves as a pseudo-likelihood with conjugate priors for additive tree models and enables both optimization and Bayesian sampling.
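A sketch of how such conjugacy can arise, under assumptions that are not confirmed here as the paper's own. First, the loss has the form l(w) = E_p[w^{-1}] + E_q[w] quoted later on this page, written with sums over samples x_1, ..., x_{n_p} from p and y_1, ..., y_{n_q} from q. Second, one tree leaf with region R multiplies the current fit w by a parameter λ > 0. Third, λ gets a generalized inverse Gaussian (GIG) prior. The pseudo-likelihood exp(−loss) is then a GIG kernel in λ, so the prior updates in closed form:

```latex
L(\lambda) = \frac{A}{\lambda} + B\,\lambda + \text{const},
\qquad
A = \sum_{i:\,x_i \in R} \frac{1}{w(x_i)}, \qquad
B = \sum_{j:\,y_j \in R} w(y_j)
% GIG(\nu, a_0, b_0) prior on the leaf multiplier:
\pi(\lambda) \propto \lambda^{\nu-1} \exp\!\Big(-\tfrac{1}{2}\big(a_0\lambda + b_0/\lambda\big)\Big)
% Posterior = prior times pseudo-likelihood e^{-L(\lambda)}:
\pi(\lambda \mid \text{data})
  \propto \lambda^{\nu-1} \exp\!\Big(-\tfrac{1}{2}\big[(a_0 + 2B)\,\lambda + (b_0 + 2A)/\lambda\big]\Big)
\;\Longrightarrow\;
\lambda \mid \text{data} \sim \mathrm{GIG}(\nu,\; a_0 + 2B,\; b_0 + 2A)
```

Because the GIG distribution is supported on λ > 0, positivity of the fitted ratio holds without a separate constraint. A backfitting sampler would redraw each leaf's λ from this conditional given the rest of the fit.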

If this is right

Tree models can be trained for density ratios using forward-stagewise optimization and gradient boosting.
Generalized Bayesian inference supplies uncertainty quantification for the estimated density ratio.
The balancing loss connects to the exponential loss used in binary classification and to variational expressions for f-divergences such as squared Hellinger distance.
The method can be applied to assess the quality of generative models on microbiome compositional data.
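The forward-stagewise route can be illustrated with a toy fit on one-dimensional data. This is a sketch under assumptions made for illustration, not the paper's algorithm. The model is w = exp(f) with f a sum of depth-one trees (stumps), and the loss takes the form l(w) = E_p[w^{-1}] + E_q[w] quoted later on this page, with sample averages. Each leaf gets the closed-form value θ = ½ log(A/B), which minimizes A e^(−θ) + B e^(θ); here A averages 1/w over p-sample points in the leaf and B averages w over q-sample points. θ is damped by a shrinkage factor and smoothed by a small pseudo-count `eps`, and splits are chosen by the leaf-wise loss reduction (√A − √B)². All function names are hypothetical. Under this loss form the fitted w estimates √(p/q), so the ratio estimate is exp(2f).

```python
import math
import random

def balancing_loss(fp, fq):
    """Empirical loss with w = exp(f): mean of 1/w over the p-sample plus mean of w over the q-sample."""
    return (sum(math.exp(-f) for f in fp) / len(fp)
            + sum(math.exp(f) for f in fq) / len(fq))

def best_split(xp, xq, fp, fq):
    """Threshold t whose stump 'x <= t' gives the largest loss reduction (None if no split exists)."""
    pts = sorted([(x, math.exp(-f) / len(xp), 0.0) for x, f in zip(xp, fp)]
                 + [(x, 0.0, math.exp(f) / len(xq)) for x, f in zip(xq, fq)])
    a_tot = sum(pt[1] for pt in pts)
    b_tot = sum(pt[2] for pt in pts)
    a_left = b_left = 0.0
    best_gain, best_t = -1.0, None
    for i in range(len(pts) - 1):
        x, da, db = pts[i]
        a_left += da
        b_left += db
        if pts[i + 1][0] == x:
            continue  # split only between distinct x values
        a_right = max(a_tot - a_left, 0.0)  # clamp round-off below zero
        b_right = max(b_tot - b_left, 0.0)
        gain = ((math.sqrt(a_left) - math.sqrt(b_left)) ** 2
                + (math.sqrt(a_right) - math.sqrt(b_right)) ** 2)
        if gain > best_gain:
            best_gain, best_t = gain, x
    return best_t

def fit_stumps(xp, xq, n_steps=30, shrinkage=0.5, eps=1e-3):
    """Forward-stagewise fit of f = log w as a list of stumps (t, theta_left, theta_right)."""
    fp, fq = [0.0] * len(xp), [0.0] * len(xq)
    model = []
    for _ in range(n_steps):
        t = best_split(xp, xq, fp, fq)
        if t is None:
            break
        thetas = []
        for left in (True, False):
            a = sum(math.exp(-f) for x, f in zip(xp, fp) if (x <= t) == left) / len(xp)
            b = sum(math.exp(f) for x, f in zip(xq, fq) if (x <= t) == left) / len(xq)
            # Closed-form leaf value 0.5*log(a/b), smoothed by eps and damped by shrinkage.
            thetas.append(shrinkage * 0.5 * math.log((a + eps) / (b + eps)))
        th_l, th_r = thetas
        fp = [f + (th_l if x <= t else th_r) for x, f in zip(xp, fp)]
        fq = [f + (th_l if x <= t else th_r) for x, f in zip(xq, fq)]
        model.append((t, th_l, th_r))
    return model

def log_w(model, x):
    """f(x) = log w(x), the sum of the stump contributions."""
    return sum(th_l if x <= t else th_r for t, th_l, th_r in model)

def predict_ratio(model, x):
    """Ratio estimate p(x)/q(x) = w(x)**2 = exp(2 f(x)) under the assumed loss form."""
    return math.exp(2.0 * log_w(model, x))

# Usage: p = N(1, 1) and q = N(0, 1), so the true ratio is exp(x - 0.5).
random.seed(0)
xp = [random.gauss(1.0, 1.0) for _ in range(300)]
xq = [random.gauss(0.0, 1.0) for _ in range(300)]
model = fit_stumps(xp, xq)
```

Each leaf's loss A e^(−θ) + B e^(θ) is convex in θ, and the damped, smoothed update lies between 0 and the exact minimizer, so no step increases the training loss. Deeper trees and a stopping rule based on held-out data would matter in practice.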

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same loss and sampling approach could be tested on other two-sample problems where uncertainty in the ratio directly affects downstream decisions.
Links to classification losses suggest the framework might adapt to settings with class imbalance or unequal sample sizes.
Further scaling experiments could check whether the Bayesian uncertainty remains reliable as dimension grows.

Load-bearing premise

The balancing loss functions as a pseudo-likelihood with conjugate priors so that standard backfitting samplers can be applied directly with only limited approximation error.

What would settle it

A benchmark experiment in which the Bayesian credible intervals for the density ratio fail to achieve nominal coverage rates on held-out data or in which the point estimates are less accurate than existing density-ratio methods.
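A coverage check of this kind needs a simulation in which the true ratio r(x) is known at a set of evaluation points, together with posterior draws of r(x) at each point from whatever sampler is being tested. The helpers below are hypothetical. They form pointwise equal-tailed 95% intervals with Python's `statistics.quantiles` and report the fraction of points covered, which should sit near 0.95 if the uncertainty quantification is calibrated.

```python
import statistics

def interval_95(draws):
    """Equal-tailed 95% interval from posterior draws of r(x) at a single point."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

def empirical_coverage(intervals, truth):
    """Fraction of evaluation points whose true ratio r(x) falls inside its interval."""
    if len(intervals) != len(truth) or not truth:
        raise ValueError("need one interval per evaluation point")
    hits = sum(lo <= r <= hi for (lo, hi), r in zip(intervals, truth))
    return hits / len(truth)

# Usage with hypothetical draws at one point and a fixed interval at another.
intervals = [interval_95([float(i) for i in range(41)]), (0.5, 1.5)]  # first is (1.0, 39.0)
coverage = empirical_coverage(intervals, [20.0, 2.0])                 # 0.5
```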

Original abstract

The ratio of two densities provides a direct characterization of their differences. We consider the two-sample comparison problem by estimating this ratio given i.i.d. observations from two distributions. To this end, we propose additive tree models for density ratio estimation along with efficient algorithms using a new loss function, the balancing loss. The loss allows tree-based models to be trained using several algorithms originally designed for supervised learning, such as forward-stagewise optimization and gradient boosting. Moreover, the balancing loss resembles an exponential family kernel, and it can serve as a pseudo-likelihood with conjugate priors. This property enables generalized Bayesian inference on the density ratio using backfitting samplers designed for Bayesian additive regression trees (BART). Our Bayesian strategy provides uncertainty quantification for the inferred density ratio, which is critical for applications involving high-dimensional and data-limited distributions with potentially substantial uncertainty. We further show connections of the balancing loss to the exponential loss in binary classification and to the variational form of f-divergence, particularly the squared Hellinger distance. Numerical experiments demonstrate that our method achieves both accuracy and computational efficiency, while uniquely providing uncertainty quantification. Finally, we demonstrate its application to assessing the quality of generative models for microbiome compositional data.
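The Hellinger connection stated in the abstract can be checked numerically on a three-point sample space. The sketch assumes the loss form l(w) = E_p[w^{-1}] + E_q[w] quoted later on this page and the convention H²(p, q) = 1 − Σ_k √(p_k q_k). Under these assumptions the minimized loss is 2 Σ_k √(p_k q_k), so H² = 1 − ½ min_w l(w), a variational expression. The distributions and function names are illustrative only.

```python
import math

def population_balancing_loss(p, q, w):
    """E_p[1/w] + E_q[w] for distributions p, q on a finite set and weights w > 0."""
    return (sum(pk / wk for pk, wk in zip(p, w))
            + sum(qk * wk for qk, wk in zip(q, w)))

def hellinger_sq(p, q):
    """Squared Hellinger distance with the convention H^2 = 1 - sum_k sqrt(p_k q_k)."""
    return 1.0 - sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))

# Illustrative distributions on three points (made up for this check).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
w_star = [math.sqrt(pk / qk) for pk, qk in zip(p, q)]  # pointwise minimizer sqrt(p/q)

min_loss = population_balancing_loss(p, q, w_star)     # ≈ 1.8649
h2_from_loss = 1.0 - 0.5 * min_loss                    # ≈ 0.0675, equals hellinger_sq(p, q)
```

Scaling w* by any factor c ≠ 1 multiplies the minimized loss by (c + 1/c)/2 > 1, which is why every such perturbation scores worse.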

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The balancing loss for additive trees on density ratios is a clean new device that reuses supervised training tricks and BART samplers, but the conjugacy needed for reliable UQ looks like it may require approximations not fully spelled out.


The paper's core move is a balancing loss that lets additive tree models be fit to density ratios using forward-stagewise or gradient-boosting steps, then extends the same loss to a generalized Bayesian setup via BART-style backfitting. That combination is new enough on its own terms. The experiments report solid accuracy and speed on simulated and real two-sample tasks, and the microbiome generative-model check is a useful concrete application. The links they draw to exponential classification loss and to the variational form of squared Hellinger distance are also cleanly stated and help situate the work.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes additive tree models for estimating the density ratio r(x) = p(x)/q(x) between two distributions given i.i.d. samples from each. A new balancing loss is introduced that permits training of the tree ensemble via forward-stagewise optimization and gradient boosting. The loss is shown to resemble an exponential-family kernel, allowing it to function as a pseudo-likelihood with conjugate priors; this enables generalized Bayesian inference on the density ratio via existing BART backfitting samplers, thereby supplying uncertainty quantification. Theoretical connections are drawn to the exponential loss for binary classification and to the variational representation of the squared Hellinger distance. Numerical experiments and an application to assessing generative models on microbiome compositional data are presented to illustrate accuracy, efficiency, and practical utility.

Significance. If the conjugacy claim holds without material approximation, the work would deliver a flexible, computationally efficient density-ratio estimator that uniquely supplies posterior uncertainty quantification for high-dimensional or data-limited regimes. The explicit links to classification losses and f-divergences provide useful theoretical context, and the microbiome application demonstrates relevance to modern statistical problems. Reproducible code and the use of established BART machinery are positive features that would strengthen the contribution if the central technical claims are verified.

major comments (2)

[§3.2] Balancing loss and pseudo-likelihood: the manuscript asserts that the balancing loss 'resembles an exponential family kernel' and thereby admits conjugate priors for direct application of BART backfitting. An explicit derivation of the resulting posterior (or the precise form of the conditional updates) is required to confirm that the positivity constraint on the ratio and the two-sample likelihood do not introduce non-conjugacy or necessitate additional variational approximations. Without this, the claimed uncertainty quantification may not correspond to the asserted generalized posterior.
[§4] Numerical experiments: the reported accuracy gains are presented without accompanying standard errors or formal statistical comparisons against the strongest existing density-ratio baselines (e.g., KLIEP, uLSIF, or recent tree-based competitors). In addition, the experiments do not isolate the contribution of the balancing loss versus the tree architecture, making it difficult to attribute performance to the proposed loss.

minor comments (2)

[§2] Notation for the density ratio and the two-sample indicator should be introduced once and used consistently; occasional switches between r(x) and exp(f(x)) are momentarily confusing.
[Figure 3] Microbiome application: the caption should explicitly state whether the plotted intervals are pointwise credible intervals or simultaneous bands.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and technical clarity.


Referee [§3.2], balancing loss and pseudo-likelihood: requests an explicit derivation of the conditional posterior updates, to confirm that the positivity constraint and the two-sample likelihood introduce no non-conjugacy or variational approximation.

Authors: We appreciate the referee's request for greater explicitness. Section 3.2 establishes that the balancing loss takes the kernel form of an exponential family, permitting conjugate priors and direct use of existing BART backfitting samplers. To address the concern, the revised manuscript will include a dedicated derivation subsection showing the conditional posterior updates under the two-sample pseudo-likelihood. The exponential form of the loss ensures the positivity constraint is satisfied automatically and does not break conjugacy; no variational approximation is introduced. We will also clarify that the generalized posterior is obtained exactly via the standard BART sampler applied to the pseudo-likelihood. revision: yes
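The conditional update described in this response can be sketched in a few lines, under assumptions made for this illustration rather than taken from the manuscript. A leaf multiplies the current fit w by λ > 0, and the pseudo-likelihood for that leaf is exp(−A/λ − Bλ), with A the sum of 1/w over p-sample points in the leaf and B the sum of w over q-sample points. λ has a GIG(ν, a₀, b₀) prior with density proportional to λ^(ν−1) exp(−(a₀λ + b₀/λ)/2), which makes the posterior GIG(ν, a₀ + 2B, b₀ + 2A). Function names and prior defaults are hypothetical. Drawing from a GIG needs a dedicated sampler (not shown), so the sketch returns the posterior parameters and mode.

```python
import math

def leaf_stats(w_p_in_leaf, w_q_in_leaf):
    """Hypothetical helper: leaf statistics from current fitted values w(x) > 0.

    A = sum of 1/w over p-sample points in the leaf; B = sum of w over q-sample points.
    """
    A = sum(1.0 / w for w in w_p_in_leaf)
    B = float(sum(w_q_in_leaf))
    return A, B

def gig_leaf_posterior(A, B, nu0=1.0, a0=1.0, b0=1.0):
    """Conjugate update: GIG(nu0, a0, b0) prior times exp(-(A/lam + B*lam)) -> GIG(nu0, a0 + 2B, b0 + 2A)."""
    return nu0, a0 + 2.0 * B, b0 + 2.0 * A

def gig_mode(nu, a, b):
    """Mode of GIG(nu, a, b) with density proportional to lam**(nu - 1) * exp(-(a*lam + b/lam) / 2).

    Setting the derivative of the log density to zero gives a*lam**2 - 2*(nu - 1)*lam - b = 0.
    """
    return ((nu - 1.0) + math.sqrt((nu - 1.0) ** 2 + a * b)) / a

# Usage: two p-points and three q-points fall in the leaf.
A, B = leaf_stats([0.5, 2.0], [1.0, 1.0, 2.0])   # A = 2.5, B = 4.0
post = gig_leaf_posterior(A, B)                   # (1.0, 9.0, 6.0)
lam_mode = gig_mode(*post)                        # sqrt(54) / 9 ≈ 0.8165
```

With ν = 1 and a₀, b₀ → 0 the mode reduces to √(A/B), the closed-form leaf value a stagewise optimizer would use for the same loss.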
Referee [§4], numerical experiments: requests standard errors, formal comparisons against the strongest density-ratio baselines (e.g., KLIEP, uLSIF, recent tree-based competitors), and experiments isolating the balancing loss from the tree architecture.

Authors: We agree that standard errors and formal comparisons would improve interpretability. In the revision we will report standard errors across replications and add formal pairwise comparisons (e.g., Wilcoxon signed-rank tests) against KLIEP, uLSIF, and the strongest tree-based baselines. To isolate the balancing loss, we will include an ablation experiment that replaces the balancing loss with a standard squared-error or logistic loss while keeping the additive-tree architecture fixed, thereby quantifying the incremental benefit of the proposed loss. revision: yes
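The reporting proposed in this response can be sketched with the standard library, assuming per-replication errors are stored for each method (values and names below are hypothetical). The paired test here is an exact sign-flip permutation test on the per-replication differences, swapped in for the Wilcoxon signed-rank test named above because it needs no third-party package. It enumerates all 2^n sign patterns, so it suits only small numbers of replications.

```python
import math
import statistics

def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n)) across replications."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def paired_sign_flip_pvalue(errors_a, errors_b):
    """Exact two-sided sign-flip p-value for the mean paired difference errors_a - errors_b."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    observed = abs(sum(d))
    count = 0
    for mask in range(2 ** n):  # each bit flips the sign of one difference
        s = sum(-x if (mask >> i) & 1 else x for i, x in enumerate(d))
        if abs(s) >= observed - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return count / 2 ** n

# Usage: hypothetical errors of two methods over five replications.
errors_a = [2.0, 3.0, 4.0, 5.0, 6.0]
errors_b = [1.0, 1.0, 1.0, 1.0, 1.0]
m, se = mean_and_se(errors_a)                          # 4.0, about 0.707
p_value = paired_sign_flip_pvalue(errors_a, errors_b)  # 2 / 32 = 0.0625
```

With five replications the smallest attainable two-sided p-value is 2/32, so a formal comparison needs enough replications for the test to be able to reject at all.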

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper proposes a new balancing loss for additive tree models in density ratio estimation. It states that this loss resembles an exponential family kernel and thereby serves as a pseudo-likelihood with conjugate priors, enabling direct application of BART backfitting samplers for generalized Bayesian inference and uncertainty quantification. This does not constitute circularity because the conjugacy property is presented as following from the explicit form of the introduced loss rather than being presupposed or reducing to a fitted parameter renamed as a prediction. No self-citation chains, uniqueness theorems imported from prior author work, or self-definitional steps are evident that would force the central claims by construction. The training algorithms (forward-stagewise, gradient boosting) and the Bayesian procedure are independent applications of the loss properties, so the overall derivation is self-contained and can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the balancing loss having exponential-family-like properties that enable both boosting training and conjugate Bayesian inference; tree model hyperparameters are typical free choices.

free parameters (1)

tree ensemble hyperparameters
Number of trees, depth, and learning rate in additive tree models are chosen or tuned and affect the fitted density ratio.

axioms (1)

domain assumption: the balancing loss resembles an exponential family kernel and serves as a pseudo-likelihood with conjugate priors.
Invoked to justify use of backfitting samplers from Bayesian additive regression trees for uncertainty quantification.

pith-pipeline@v0.9.0 · 5743 in / 1284 out tokens · 49508 ms · 2026-05-19T01:13:13.739818+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes


ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

the balancing loss function l(w) = E_p[w^{-1}] + E_q[w] ... resembles an exponential family kernel, and it can serve as a pseudo-likelihood with conjugate priors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.