Is a Data-Driven Approach still Better than Random Choice with Naive Bayes classifiers?

Piotr Szymański; Tomasz Kajdanowicz

arXiv: 1702.04013 · v1 · submitted 2017-02-13 · cs.LG · stat.ML



keywords: better, data-driven, methods, random, approaches, case, priori, accuracy


We study the performance of data-driven, a priori and random approaches to label space partitioning for multi-label classification with a Gaussian Naive Bayes classifier. Experiments were performed on 12 benchmark data sets and evaluated on 5 established measures of classification quality: micro- and macro-averaged F1 score, Subset Accuracy and Hamming loss. Data-driven methods are significantly better than an average run of the random baseline. For F1 scores and Subset Accuracy, data-driven approaches were, in the worst case, more likely than not to perform better than random approaches. There always exists a method that performs better than a priori methods in the worst case. The advantage of data-driven methods over a priori methods is smaller with a weak classifier than when tree classifiers are used.
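
To make the measures named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code. The function names `random_partition` and `evaluate` are hypothetical, cutting a shuffled label list into subsets of size `k` is only one simple way to draw a random partition, and scoring a label with no true or predicted positives as F1 = 0 is an assumed convention.

```python
import random


def random_partition(n_labels, k, seed=0):
    """Shuffle label indices and cut them into disjoint subsets of size k.

    The last subset may be smaller when k does not divide n_labels.
    (Assumed scheme for illustration; the paper's exact procedure may differ.)
    """
    labels = list(range(n_labels))
    random.Random(seed).shuffle(labels)
    return [sorted(labels[i:i + k]) for i in range(0, n_labels, k)]


def _f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
    # (assumed convention for labels that never occur and are never predicted).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred):
    """Compute four multi-label measures from 0/1 indicator rows."""
    n_samples, n_labels = len(y_true), len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    exact_matches = 0   # rows where every label is predicted correctly
    wrong_cells = 0     # individual label decisions that are wrong
    for t_row, p_row in zip(y_true, y_pred):
        exact_matches += list(t_row) == list(p_row)
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
            wrong_cells += t != p
    return {
        # micro: pool the counts over all labels, then compute one F1
        "micro_f1": _f1(sum(tp), sum(fp), sum(fn)),
        # macro: compute F1 per label, then take the unweighted mean
        "macro_f1": sum(_f1(*c) for c in zip(tp, fp, fn)) / n_labels,
        "subset_accuracy": exact_matches / n_samples,
        "hamming_loss": wrong_cells / (n_samples * n_labels),
    }


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
    print(random_partition(n_labels=7, k=3, seed=1))
    print(evaluate(y_true, y_pred))
```

In a full experiment, each subset returned by `random_partition` would get its own classifier (Gaussian Naive Bayes in the paper), and the per-subset predictions would be merged into one indicator matrix before calling `evaluate`. Higher is better for the F1 scores and Subset Accuracy, while lower is better for Hamming loss.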
