
arXiv: 1806.06123 · v1 · pith:UP7W2OVZ · submitted 2018-06-15 · cs.LG · stat.ML

On the Relationship between Data Efficiency and Error for Uncertainty Sampling

keywords: data, error, active, efficiency, learning, sampling, uncertainty, classifier

While active learning offers potential cost savings, the actual data efficiency (the reduction in the amount of labeled data needed to obtain the same error rate) observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
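
The abstract rests on two technical ingredients: uncertainty sampling for logistic regression, and data efficiency as a comparison of how many labels each strategy needs. The following is a minimal pure-Python sketch under our own assumptions. It is not the paper's code, and it does not implement the variant the paper analyzes. It implements pool-based uncertainty sampling, which queries the pool point whose predicted probability is closest to 0.5, together with a hypothetical `data_efficiency` helper. That helper assumes data efficiency is the ratio of labels random sampling needs to labels active learning needs to reach the same error rate. The synthetic data, the hyperparameters (`lr`, `steps`, `l2`, `n_seed`, `budget`), and all function names are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Linear score w.x + b; the bias is stored as the last entry of w.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def predict_proba(w, x):
    return sigmoid(score(w, x))


def fit_logistic(xs, ys, lr=0.5, steps=500, l2=1e-3):
    # Full-batch gradient descent on L2-regularized mean log loss, labels in {0, 1}.
    d, n = len(xs[0]), len(xs)
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [l2 * wj for wj in w[:-1]] + [0.0]  # bias is not regularized
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y
            for j in range(d):
                grad[j] += err * x[j] / n
            grad[-1] += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def error_rate(w, xs, ys):
    # Fraction of points whose thresholded prediction disagrees with the label.
    wrong = sum((predict_proba(w, x) >= 0.5) != (y == 1) for x, y in zip(xs, ys))
    return wrong / len(xs)


def most_uncertain(w, pool_x, candidates):
    # Uncertainty sampling for logistic regression: the candidate whose predicted
    # probability is closest to 0.5, i.e. the smallest |w.x + b|.
    return min(candidates, key=lambda i: abs(score(w, pool_x[i])))


def uncertainty_sampling(pool_x, pool_y, n_seed, budget, rng):
    # Label n_seed random points, then query one point at a time until `budget` labels.
    order = list(range(len(pool_x)))
    rng.shuffle(order)
    labeled, unlabeled = order[:n_seed], order[n_seed:]
    w = fit_logistic([pool_x[i] for i in labeled], [pool_y[i] for i in labeled])
    while len(labeled) < budget:
        pick = most_uncertain(w, pool_x, unlabeled)
        unlabeled.remove(pick)
        labeled.append(pick)
        w = fit_logistic([pool_x[i] for i in labeled], [pool_y[i] for i in labeled])
    return w, labeled


def data_efficiency(n_passive, n_active):
    # Hypothetical helper: labels random sampling needed divided by labels
    # active learning needed to reach the same error rate.
    return n_passive / n_active


if __name__ == "__main__":
    rng = random.Random(0)

    def make_split(n):
        # Hypothetical 2-D data with a linear boundary and 10% label noise,
        # so the limiting classifier has a nonzero error rate.
        xs, ys = [], []
        for _ in range(n):
            x = [rng.gauss(0, 1), rng.gauss(0, 1)]
            y = int(x[0] + 0.5 * x[1] > 0)
            if rng.random() < 0.1:
                y = 1 - y
            xs.append(x)
            ys.append(y)
        return xs, ys

    pool_x, pool_y = make_split(200)
    test_x, test_y = make_split(500)
    w_active, _ = uncertainty_sampling(pool_x, pool_y, n_seed=10, budget=30, rng=rng)
    passive_idx = rng.sample(range(len(pool_x)), 30)
    w_passive = fit_logistic([pool_x[i] for i in passive_idx], [pool_y[i] for i in passive_idx])
    print("active error:", error_rate(w_active, test_x, test_y))
    print("passive error:", error_rate(w_passive, test_x, test_y))
```

Comparing the two printed error rates at a fixed budget is only a rough proxy. A proper data-efficiency measurement would fix a target error, count the labels each strategy needs to reach it, and pass those two counts to `data_efficiency`.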

