Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

Andrew Matteson; Chanhee Lee; Heuiseok Lim; Young-Bum Kim

arxiv: 1806.10771 · v1 · pith:W2JGDFBInew · submitted 2018-06-28 · 💻 cs.CL



classification 💻 cs.CL

keywords morphological · korean · analysis · character-level · tagging · actions · knowledge · known



Because Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically uses sub-character features known as graphemes or relies on comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models were built on the assumption that character-level, dictionary-less morphological analysis is intractable due to the number of actions required. In this study, we present a multi-stage action-based model that performs morphological transformation and part-of-speech tagging on arbitrary units of input, and we apply it to character-level Korean morphological analysis. Among models that do not employ prior linguistic knowledge, our proposed data-driven Bi-LSTM model achieves state-of-the-art word- and sentence-level tagging accuracy on the Sejong Korean corpus.
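To make the idea of per-character actions concrete, here is a minimal Python sketch. It is an illustration only, not the paper's model: the action names `KEEP` and `REPLACE`, the helper `apply_actions`, and the hand-written action sequence are all assumptions made for this example, whereas the paper predicts actions with a Bi-LSTM. The example uses the contracted past-tense form 갔다 ("went"), which is conventionally analyzed as 가 (verb stem) + 았 (past-tense ending) + 다 (final ending).

```python
# Hypothetical action inventory (an assumption for illustration, not the
# paper's actual action set): each input character receives exactly one action.
KEEP = ("KEEP", None)


def apply_actions(chars, actions):
    """Apply one action per input character and return the output characters."""
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per input character")
    out = []
    for ch, (name, payload) in zip(chars, actions):
        if name == "KEEP":
            # The character is already in its base form.
            out.append(ch)
        elif name == "REPLACE":
            # Expand a contracted character into its underlying characters.
            out.extend(payload)
        else:
            raise ValueError(f"unknown action: {name}")
    return out


surface = "갔다"
# Hand-specified here; a trained model would predict these from context.
actions = [("REPLACE", ["가", "았"]), KEEP]
morph_chars = apply_actions(surface, actions)

# A second stage assigns a tag to each unit. In this toy case every output
# character happens to be its own morpheme; the tags are Sejong-style labels.
tags = ["VV", "EP", "EF"]
analysis = "+".join(f"{m}/{t}" for m, t in zip(morph_chars, tags))
print(analysis)  # 가/VV+았/EP+다/EF
```

Splitting transformation and tagging into separate steps mirrors the multi-stage framing in the abstract, but the division of labor between stages in the actual model may differ from this sketch.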
