
arxiv: 1608.04434 · v1 · pith:TX672LN3new · submitted 2016-08-15 · 💻 cs.CL

Natural Language Processing using Hadoop and KOSHIK

classification 💻 cs.CL
keywords processing · language · natural · data · koshik · hadoop · many · advantages

Natural language processing, a data analytics technology, is widely used in research areas such as artificial intelligence, human language processing, and translation. The explosive growth of data now poses many challenges for natural language processing. Hadoop is one platform that can process the large amounts of data that natural language processing requires. KOSHIK is a natural language processing architecture that builds on Hadoop and includes language processing components such as Stanford CoreNLP and OpenNLP. This study describes how to build a KOSHIK platform with the relevant tools and gives the steps for analyzing wiki data. Finally, it evaluates and discusses the advantages and disadvantages of the KOSHIK architecture and gives recommendations for improving processing performance.
