Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization

Azucena Montes Rendón; Carlos-Emiliano González-Gallardo; Gerardo Sierra; Juan-Manuel Torres-Moreno

arXiv: 1702.06467 · v1 · pith:CZGYGLRJnew · submitted 2017-02-21 · cs.IR · cs.CL · cs.SI


Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno, Azucena Montes Rendón, Gerardo Sierra


keywords: normalization, character, documents, dynamic, multilingual, n-grams, network, performance


In this paper, we describe a dynamic normalization process applied to multilingual social network documents (Facebook and Twitter) to improve performance on the author profiling task for short texts. After normalization, character n-grams and POS-tag n-grams are extracted to capture the stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). Experiments with an SVM classifier achieved up to 90% performance.
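The pipeline described above (normalize, then extract character and POS n-grams) can be illustrated with a minimal sketch. The abstract does not give the actual normalization rules, so everything here is an assumption made for illustration: the placeholder tokens `<url>`, `<user>`, and `<tag>`, the choice to collapse flooded characters to two repetitions, and the function names `normalize`, `char_ngrams`, and `pos_ngrams`. The POS tags are assumed to come from some external tagger, which is not shown.

```python
import re
from collections import Counter

# Hypothetical patterns and placeholder tokens; the paper's real rules
# are not stated in the abstract.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
FLOOD_RE = re.compile(r"(.)\1{2,}")  # same character repeated 3+ times


def normalize(text):
    """Replace volatile items with fixed tokens and shorten character flooding."""
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = HASHTAG_RE.sub("<tag>", text)
    # Keep two repetitions so flooding stays visible as a stylistic cue
    # (an assumption of this sketch).
    return FLOOD_RE.sub(r"\1\1", text)


def char_ngrams(text, n):
    """Count overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pos_ngrams(tags, n):
    """Count overlapping n-grams over a list of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    doc = normalize("Sooooo cool @bob http://x.co #fun")
    print(doc)
    print(char_ngrams(doc, 3).most_common(3))
    print(pos_ngrams(["DET", "NOUN", "VERB"], 2))
```

The resulting counts could then be turned into feature vectors for a classifier such as an SVM; that step is omitted because the abstract gives no details about the feature weighting or kernel used.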
