Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging

Alham Fikri Aji; Kemal Kurniawan

arXiv: 1809.03391 · v3 · submitted 2018-09-10 · cs.CL

keywords: indonesian, tagging, models, neural, dataset, evaluated, explored, network

Previous work on Indonesian part-of-speech (POS) tagging is hard to compare because it has not been evaluated on a common dataset. Furthermore, despite the success of neural network models for English POS tagging, they are rarely explored for Indonesian. In this paper, we explore various techniques for Indonesian POS tagging, including rule-based, CRF, and neural network-based models. We evaluate our models on the IDN Tagged Corpus. A recurrent neural network achieves a new state-of-the-art F1 score of 97.47. To provide a standard for future work, we publicly release the dataset split that we used.
