Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish

Marcin Możejko; Przemysław Sadownik; Rafał Rolczyński; Renard Korzeniowski; Tomasz Korbak

arXiv: 1906.09325 · v1 · pith:PCJ4ZOH4new · submitted 2019-06-17 · 💻 cs.CL · cs.LG · stat.ML

keywords: model, task, classification, detection, engineering, feature, fine-tuning, hate

This paper presents our contribution to PolEval 2019 Task 6: Hate speech and bullying detection. We describe three parallel approaches that we followed: fine-tuning a pre-trained ULMFiT model to our classification task, fine-tuning a pre-trained BERT model to our classification task, and using the TPOT library to find the optimal pipeline. We present results achieved by these three tools and review their advantages and disadvantages in terms of user experience. Our team placed second in subtask 2 with a shallow model found by TPOT: a logistic regression classifier with non-trivial feature engineering.
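
To make the idea of a "shallow model" concrete, the following is a minimal sketch of a logistic regression classifier over hand-built text features, written in plain Python. It is not the pipeline TPOT found and not the authors' feature engineering, which the abstract does not specify. The character trigram features, the helper names (`char_ngrams`, `predict_proba`, `train`), the hyperparameters, and the four English toy sentences are all assumptions chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, feats):
    """Probability of the positive class for one sparse feature vector."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return sigmoid(z)


def train(texts, labels, epochs=200, lr=0.1, l2=0.001):
    """Fit sparse logistic regression with per-example gradient descent.

    epochs, lr, and l2 are illustrative values, not tuned settings.
    """
    weights, bias = {}, 0.0
    data = [(char_ngrams(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            err = predict_proba(weights, bias, feats) - y
            for f, v in feats.items():
                w = weights.get(f, 0.0)
                # Gradient of the log loss plus a small L2 penalty.
                weights[f] = w - lr * (err * v + l2 * w)
            bias -= lr * err
    return weights, bias


# Hypothetical toy data (1 = offensive, 0 = neutral); not from PolEval.
texts = [
    "you are awful and stupid",
    "what a stupid awful idea",
    "have a lovely day",
    "thank you, lovely work",
]
labels = [1, 1, 0, 0]

weights, bias = train(texts, labels)
for t in texts:
    print(t, round(predict_proba(weights, bias, char_ngrams(t)), 3))
```

A tool such as TPOT searches over many combinations of preprocessing steps and classifiers automatically. The sketch fixes a single such combination by hand to show what the resulting kind of model looks like.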
