arxiv: 1903.08762 · v1 · pith:OYFRPMOWnew · submitted 2019-03-20 · 📊 stat.AP

Large-Scale Online Experimentation with Quantile Metrics

classification 📊 stat.AP
keywords testing · quantile · metrics · bootstrap · methodology · scalable · statistically · valid

Online experimentation (or A/B testing) has been widely adopted in industry as the gold standard for measuring product impact. Despite this wide adoption, little of the literature discusses A/B testing with quantile metrics. Quantile metrics, such as the 90th percentile of page load time, are crucial to A/B testing because many key performance metrics, including site speed and service latency, are defined as quantiles. However, at LinkedIn's data scale, A/B testing with quantile metrics is extremely challenging because no existing variance estimator for the quantile of dependent samples is both statistically valid and scalable: the bootstrap estimator is statistically valid but takes days to compute, while the standard asymptotic variance estimate is scalable but underestimates the variance by an order of magnitude. In this paper, we present a statistically valid and scalable methodology for A/B testing with quantiles that is fully generalizable to other A/B testing platforms. It achieves a speedup of more than 500 times over the bootstrap and has only a $2\%$ chance of differing from the bootstrap estimates. Beyond methodology, we also share the implementation of a data pipeline using this methodology and insights on pipeline optimization.
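
To make the contrast between the two baseline estimators concrete, here is a minimal Python sketch. It is not the methodology proposed in the paper, which the abstract does not describe. It assumes that the dependence comes from each user contributing several correlated page views, and it uses simulated data; the function names, the simulation parameters, and the kernel density step are all hypothetical choices made for illustration. The sketch compares the textbook asymptotic variance of a sample quantile, which treats page views as independent, with a bootstrap that resamples whole users.

```python
import numpy as np

def simulate_page_loads(n_users, rng):
    """Hypothetical data: each user contributes several correlated load times."""
    views_per_user = rng.integers(1, 20, size=n_users)   # 1 to 19 views per user
    user_effect = rng.normal(0.0, 1.0, size=n_users)     # shared within a user
    user_ids = np.repeat(np.arange(n_users), views_per_user)
    noise = rng.normal(0.0, 0.3, size=user_ids.size)
    times = np.exp(user_effect[user_ids] + noise)
    return user_ids, times

def iid_asymptotic_variance(x, p):
    """Textbook p(1-p) / (n f(q)^2), treating every page view as independent."""
    n = x.size
    q = np.quantile(x, p)
    # Density at q from a Gaussian kernel estimate (rule-of-thumb bandwidth).
    h = 1.06 * x.std(ddof=1) * n ** (-0.2)
    f = np.mean(np.exp(-0.5 * ((x - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return p * (1 - p) / (n * f ** 2)

def cluster_bootstrap_variance(user_ids, x, p, n_boot, rng):
    """Bootstrap that resamples users (not page views) with replacement."""
    n_users = user_ids.max() + 1
    order = np.argsort(user_ids, kind="stable")
    counts = np.bincount(user_ids, minlength=n_users)
    groups = np.split(x[order], np.cumsum(counts)[:-1])  # one array per user
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        sampled = rng.integers(0, n_users, size=n_users)
        resample = np.concatenate([groups[u] for u in sampled])
        estimates[b] = np.quantile(resample, p)
    return estimates.var(ddof=1)

rng = np.random.default_rng(0)
user_ids, times = simulate_page_loads(n_users=500, rng=rng)
iid_var = iid_asymptotic_variance(times, p=0.9)
boot_var = cluster_bootstrap_variance(user_ids, times, p=0.9, n_boot=200, rng=rng)
print(f"i.i.d. asymptotic variance: {iid_var:.5f}")
print(f"user-level bootstrap variance: {boot_var:.5f}")
```

In this toy setting the independence-based formula should come out well below the user-level bootstrap, which is the direction of the underestimation the abstract describes. The bootstrap loop also recomputes a quantile over the full data on every replicate, which hints at why it becomes expensive at production scale.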
