Parallelization of Maximum Entropy POS Tagging for Bahasa Indonesia with MapReduce

Arif Nurwidyantoro; Edi Winarko

arxiv: 1208.3047 · v1 · pith:35FFNCDOnew · submitted 2012-08-15 · 💻 cs.DC · cs.CL



keywords: tagging, process, mapreduce, training, showed, time, bahasa, entropy


In this paper, the MapReduce programming model is used to parallelize the training and tagging processes of Maximum Entropy part-of-speech tagging for Bahasa Indonesia. In the training process, MapReduce is used for dictionary, tagtoken, and feature creation. In the tagging process, MapReduce is used to tag the lines of a document in parallel. The training experiments showed that total training time using MapReduce is faster, but the time spent reading its results inside the process slows down the total training time. The tagging experiments, run with different numbers of map and reduce processes, showed that the MapReduce implementation could speed up the tagging process. The fastest tagging result was obtained with a 1,000,000-word corpus and 30 map processes.
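As a rough illustration of how a dictionary-creation step can be phrased in the MapReduce model, the sketch below simulates the map, shuffle, and reduce phases in plain Python. It is not the authors' implementation: the abstract does not describe the dictionary format, so the sketch assumes a simple word-frequency dictionary, and every function name (`map_line`, `shuffle`, `reduce_counts`, `build_dictionary`) and the sample sentences are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map step: emit a (word, 1) pair for each whitespace-separated token."""
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the partial counts for one word."""
    return key, sum(values)


def build_dictionary(lines):
    """Run map, shuffle, and reduce sequentially (assumed word-frequency dictionary)."""
    mapped = chain.from_iterable(map_line(line) for line in lines)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Hypothetical two-line corpus, used only to exercise the sketch.
corpus = ["saya makan nasi", "saya minum air"]
dictionary = build_dictionary(corpus)
```

Because each call to `map_line` depends only on its own line, a real framework can distribute the lines across map processes. The same independence is what makes it possible to tag the lines of a document in parallel.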
