Exploiting Similarities in A/B Testing with Off-Policy Estimation
We study A/B testing, the standard protocol for measuring the performance gain of a new decision system relative to a baseline. Traditional A/B testing treats both systems as black boxes, ignoring potential similarities between them. In practice, however, new and baseline systems are rarely radically different and often share significant structure, which can be captured by their propensities to make similar decisions. We show that in such cases, the commonly used difference-in-means estimator, though unbiased, is statistically suboptimal. Leveraging off-policy estimation, we introduce a family of A/B testing estimators that exploit the propensities of the tested systems to achieve improved concentration properties. This family is flexible enough to be tailored to practical decision-making. The resulting estimators are simple, robust to propensity misspecification, substantially more accurate when the tested systems exhibit similarities, and gracefully fall back to the difference-in-means estimator when such similarities are absent. Our theoretical analysis and empirical studies confirm their efficiency and practicality.
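To make the contrast in the abstract concrete, here is a minimal sketch. It is not the paper's estimator family. It compares the difference-in-means estimator with a textbook pooled inverse-propensity-scoring (IPS) estimator under several assumptions: users are split 50/50 between the two systems, each system picks one of a few discrete actions with known propensities, and rewards are Bernoulli. All function names, propensities, and reward means below are hypothetical.

```python
import numpy as np

def simulate_ab_test(n, pi_a, pi_b, mean_reward, rng):
    """Simulate n users, each assigned to system A (0) or B (1) with probability 1/2."""
    arm = rng.integers(0, 2, size=n)
    k = len(pi_a)
    # Draw an action from the assigned system's propensities.
    actions = np.where(arm == 0,
                       rng.choice(k, size=n, p=pi_a),
                       rng.choice(k, size=n, p=pi_b))
    # Bernoulli reward whose mean depends only on the chosen action (an assumption).
    rewards = rng.binomial(1, mean_reward[actions])
    return arm, actions, rewards

def difference_in_means(arm, rewards):
    """Classical A/B estimate: mean reward under B minus mean reward under A."""
    return rewards[arm == 1].mean() - rewards[arm == 0].mean()

def pooled_ips_difference(actions, rewards, pi_a, pi_b):
    """Pooled IPS estimate of V(B) - V(A).

    Under a 50/50 split, pooled actions follow the mixture mu = (pi_a + pi_b) / 2,
    so weighting each reward by (pi_b - pi_a) / mu is unbiased for the difference.
    """
    mu = 0.5 * (pi_a + pi_b)
    weights = (pi_b[actions] - pi_a[actions]) / mu[actions]
    return np.mean(weights * rewards)

# Hypothetical setup: two similar systems over three actions.
rng = np.random.default_rng(0)
pi_a = np.array([0.50, 0.30, 0.20])
pi_b = np.array([0.45, 0.35, 0.20])
mean_reward = np.array([0.2, 0.5, 0.8])

arm, actions, rewards = simulate_ab_test(2000, pi_a, pi_b, mean_reward, rng)
print("true difference:    ", float(np.sum((pi_b - pi_a) * mean_reward)))
print("difference in means:", difference_in_means(arm, rewards))
print("pooled IPS:         ", pooled_ips_difference(actions, rewards, pi_a, pi_b))
```

In this sketch the IPS weights shrink toward zero as the two systems' propensities get closer, so the estimate concentrates more tightly than the difference in means. When the propensities are identical the weights are exactly zero. The abstract's fallback to difference-in-means for dissimilar systems is a property of the paper's estimators and is not reproduced here.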
Forward citations
Cited by 2 Pith papers
- Logging Policy Design for Off-Policy Evaluation
  Derives optimal logging policies for off-policy evaluation by balancing reward concentration against action coverage in known, unknown, and partially known regimes of target policy and rewards.
- Logging Policy Design for Off-Policy Evaluation
  Derives optimal logging policies for minimizing off-policy evaluation error under known, unknown, and partially known target policies and reward distributions.