Exploiting Similarities in A/B Testing with Off-Policy Estimation

Alexandre Gilotte; David Rohde; Otmane Sakhi

arxiv: 2506.10677 · v3 · pith:WKZ3EEVG · submitted 2025-06-12 · 📊 stat.ML · cs.LG

keywords: similarities, systems, testing, propensities, baseline, difference-in-means, estimation, estimator

We study A/B testing, the standard protocol for measuring the performance gain of a new decision system relative to a baseline. Traditional A/B testing treats both systems as black boxes, ignoring potential similarities between them. In practice, however, new and baseline systems are rarely radically different and often share significant structure, which can be captured by their propensities to make similar decisions. We show that in such cases, the commonly used difference-in-means estimator, though unbiased, is statistically suboptimal. Leveraging off-policy estimation, we introduce a family of A/B testing estimators that exploit the propensities of the tested systems to achieve improved concentration properties. This family is flexible enough to be tailored to practical decision-making. The resulting estimators are simple, robust to propensity misspecification, substantially more accurate when the tested systems exhibit similarities, and gracefully fall back to the difference-in-means estimator when such similarities are absent. Our theoretical analysis and empirical studies confirm their efficiency and practicality.
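The contrast between difference-in-means and a propensity-aware estimator can be illustrated with a minimal sketch. This is not the authors' estimator family: it assumes a toy setting (a single context, a few discrete actions, known propensities, 50/50 random assignment) and uses a standard pooled importance-sampling estimator that treats all A/B logs as drawn from the mixture of the two systems. All names (`PI0`, `PI1`, `MEANS`, `run_ab_test`, `mixture_ips_difference`, `compare_estimators`) and all numbers are hypothetical.

```python
import random
from statistics import mean, pvariance

# Hypothetical toy setting (not from the paper): a single context, K discrete
# actions, and each system described by its action probabilities (propensities).
PI0 = [0.50, 0.30, 0.20, 0.00]    # baseline system
PI1 = [0.45, 0.30, 0.20, 0.05]    # new system: differs from PI0 on two actions only
MEANS = [0.10, 0.20, 0.30, 0.90]  # Bernoulli reward mean of each action
TRUE_DELTA = sum((p1 - p0) * m for p0, p1, m in zip(PI0, PI1, MEANS))  # V(pi1) - V(pi0)


def run_ab_test(n, pi0, pi1, means, rng, p_b=0.5):
    """Simulate n units: each is assigned to arm B with probability p_b (else A),
    the assigned system samples an action, and a Bernoulli reward is observed."""
    logs = []
    for _ in range(n):
        arm = "B" if rng.random() < p_b else "A"
        policy = pi1 if arm == "B" else pi0
        action = rng.choices(range(len(policy)), weights=policy)[0]
        reward = 1.0 if rng.random() < means[action] else 0.0
        logs.append((arm, action, reward))
    return logs


def difference_in_means(logs):
    """Standard A/B estimator: mean reward in arm B minus mean reward in arm A.
    Ignores the logged actions entirely (both systems treated as black boxes)."""
    rewards_a = [r for arm, _, r in logs if arm == "A"]
    rewards_b = [r for arm, _, r in logs if arm == "B"]
    return mean(rewards_b) - mean(rewards_a)  # raises StatisticsError if an arm is empty


def mixture_ips_difference(logs, pi0, pi1, p_b=0.5):
    """Importance-weighted estimate of V(pi1) - V(pi0) that pools both arms.

    Marginalising over the random assignment, every logged action is drawn from
    the mixture mu = (1 - p_b) * pi0 + p_b * pi1, so weighting each reward by
    (pi1[a] - pi0[a]) / mu[a] gives an unbiased estimate of the difference.
    Actions that both systems choose with equal probability get weight 0."""
    total = 0.0
    for _, action, reward in logs:
        mu = (1 - p_b) * pi0[action] + p_b * pi1[action]  # > 0 for any loggable action
        total += reward * (pi1[action] - pi0[action]) / mu
    return total / len(logs)


def compare_estimators(reps=200, n=1000, seed=0):
    """Repeat the simulated A/B test and collect both estimates per repetition."""
    rng = random.Random(seed)
    dim, mix = [], []
    for _ in range(reps):
        logs = run_ab_test(n, PI0, PI1, MEANS, rng)
        dim.append(difference_in_means(logs))
        mix.append(mixture_ips_difference(logs, PI0, PI1))
    return dim, mix


if __name__ == "__main__":
    dim, mix = compare_estimators()
    print(f"true delta           : {TRUE_DELTA:.4f}")
    print(f"difference-in-means  : mean {mean(dim):.4f}, variance {pvariance(dim):.2e}")
    print(f"mixture IPS (pooled) : mean {mean(mix):.4f}, variance {pvariance(mix):.2e}")
```

Because the two hypothetical systems differ on only two actions, the pooled estimator gives zero weight to the shared actions, so its spread across repetitions should be noticeably smaller than that of difference-in-means while both stay centred on the true difference. If `PI0` and `PI1` had disjoint supports, every weight would be ±2 and, with equal arm sizes, the pooled estimate would coincide with difference-in-means, which mirrors the fall-back behaviour described above.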

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Logging Policy Design for Off-Policy Evaluation
stat.ML · 2026-05 · unverdicted · novelty 7.0

Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.

Logging Policy Design for Off-Policy Evaluation
stat.ML · 2026-05 · unverdicted · novelty 5.0

Derives optimal logging policies for minimizing off-policy evaluation error under known, unknown, and partially known target policies and reward distributions.