Second Order Calibration: A Simple Way to Get Approximate Posteriors

Amir Najmi; Omkar Muralidharan

Many large-scale machine learning problems involve estimating an unknown parameter $\theta_{i}$ for each of many items. For example, a key problem in sponsored search is to estimate the click-through rate (CTR) of each of billions of query-ad pairs. Most common methods, however, give only a point estimate of each $\theta_{i}$. A posterior distribution for each $\theta_{i}$ is usually more useful but harder to obtain. We present a simple post-processing technique that takes point estimates or scores $t_{i}$ (from any method) and estimates an approximate posterior for each $\theta_{i}$. We build on the idea of calibration, a common post-processing technique that estimates $\mathrm{E}\left(\theta_{i} \mid t_{i}\right)$. Our method, second order calibration, uses empirical Bayes methods to estimate the distribution of $\theta_{i} \mid t_{i}$ and uses the estimated distribution as an approximation to the posterior distribution of $\theta_{i}$. We show that this can yield improved point estimates and useful accuracy estimates. The method scales to large problems: our motivating example is a CTR estimation problem involving tens of billions of query-ad pairs.
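
To make the idea concrete, the following is a minimal sketch of one way such a post-processing step could look. It is not the authors' procedure. It assumes a Beta-binomial model for clicks given impressions, groups items into quantile bins of the score $t_{i}$, and fits a Beta prior in each bin by the method of moments; the function names `fit_beta_moments` and `second_order_calibrate`, the binning scheme, and the clipping constants are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_beta_moments(clicks, imps):
    """Method-of-moments fit of a Beta(a, b) prior for the rates behind binomial counts.

    Illustrative assumption: clicks_i ~ Binomial(imps_i, theta_i), theta_i ~ Beta(a, b).
    """
    p = clicks / imps                       # raw per-item rates
    m = float(np.clip(p.mean(), 1e-6, 1 - 1e-6))
    v = p.var()
    h = np.mean(1.0 / imps)                 # average binomial noise factor
    # Under the model: Var(p) = m(1-m) * [h + rho * (1 - h)], with rho = 1 / (a + b + 1)
    rho = (v / (m * (1 - m)) - h) / max(1 - h, 1e-12)
    rho = float(np.clip(rho, 1e-4, 1 - 1e-4))
    s = 1.0 / rho - 1.0                     # prior strength a + b
    return m * s, (1 - m) * s


def second_order_calibrate(scores, clicks, imps, n_bins=10):
    """Return per-item Beta posterior parameters (a_post, b_post).

    Items are grouped into quantile bins of the score; the Beta prior fitted in
    each bin stands in for the distribution of theta given t.
    """
    scores = np.asarray(scores, dtype=float)
    clicks = np.asarray(clicks)
    imps = np.asarray(imps)
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)
    a_post = np.full(scores.shape, np.nan)
    b_post = np.full(scores.shape, np.nan)
    for k in range(n_bins):
        idx = bins == k
        if not idx.any():
            continue
        a, b = fit_beta_moments(clicks[idx], imps[idx])
        # Conjugate update: Beta prior + binomial data gives a Beta posterior
        a_post[idx] = a + clicks[idx]
        b_post[idx] = b + imps[idx] - clicks[idx]
    return a_post, b_post
```

Under these assumptions, the posterior mean `a_post / (a_post + b_post)` serves as a shrunken point estimate of $\theta_{i}$, and the Beta posterior variance gives a per-item accuracy estimate. Ordinary (first order) calibration would keep only the per-bin mean `a / (a + b)`.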
