arxiv: 1706.08866 · v1 · pith:BKM65P4Snew · submitted 2017-06-27 · 💻 cs.HC

Re-Evaluating the Netflix Prize - Human Uncertainty and its Impact on Reliability

classification 💻 cs.HC
keywords different · densities · error · examine · human · netflix · possible · prize

In this paper, we examine the statistical soundness of comparative assessments within the field of recommender systems in terms of reliability and human uncertainty. In a controlled experiment, we find that users give different ratings to the same items when asked repeatedly. This volatility of user ratings justifies using probability densities instead of single rating scores. As a consequence, the well-known accuracy metrics (e.g. MAE, MSE, RMSE) themselves follow a density, which emerges from the convolution of all rating densities. When two systems produce different RMSE distributions that overlap substantially, there is a probability of error for each possible ranking. As an application, we examine possible ranking errors in the Netflix Prize. We show that all top rankings are subject, to varying degrees, to high probabilities of error and that some rankings may be due to mere chance rather than system quality.
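
The idea of a ranking-error probability can be illustrated with a small Monte Carlo sketch. It is not the paper's procedure: it assumes each rating follows a normal density with a known mean and standard deviation, treats ratings as continuous rather than discrete stars, and scores both systems against the same re-drawn ratings in each trial. The function names and all numbers are hypothetical and chosen only for illustration.

```python
import math
import random


def rmse(predictions, ratings):
    """Root mean squared error between predictions and one set of ratings."""
    total = sum((p - r) ** 2 for p, r in zip(predictions, ratings))
    return math.sqrt(total / len(ratings))


def ranking_error_probability(preds_a, preds_b, rating_means, rating_sds,
                              n_draws=2000, seed=0):
    """Estimate how often system A, nominally the better one, fails to beat B.

    Each trial re-draws every rating from its own normal density (an
    assumption of this sketch) and compares the two resulting RMSE values.
    """
    rng = random.Random(seed)
    reversals = 0
    for _ in range(n_draws):
        # One plausible "re-rating" of all items by the same users.
        ratings = [rng.gauss(m, s) for m, s in zip(rating_means, rating_sds)]
        if rmse(preds_a, ratings) >= rmse(preds_b, ratings):
            reversals += 1
    return reversals / n_draws


# Toy data (hypothetical): five ratings with equal uncertainty.
rating_means = [4.0, 3.0, 5.0, 2.0, 4.0]
rating_sds = [0.6] * 5
preds_a = [3.9, 3.1, 4.8, 2.2, 4.1]  # closer to the mean ratings
preds_b = [3.7, 3.3, 4.6, 2.4, 4.3]  # slightly further away

p_error = ranking_error_probability(preds_a, preds_b, rating_means, rating_sds)
print(f"Estimated probability that the ranking A < B is wrong: {p_error:.3f}")
```

With the standard deviations set to zero the ratings collapse to single scores and the estimate is exactly 0, which corresponds to the conventional point-estimate comparison; the estimate moves away from 0 only once rating uncertainty is introduced.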