pith. machine review for the scientific record.

arxiv: cmp-lg/9606012 · v1 · submitted 1996-06-11 · cmp-lg · cs.CL

Recognition: unknown

An Efficient Inductive Unsupervised Semantic Tagger

classification cmp-lg cs.CL
keywords semantic, tagger, efficient, tags, words, conditional, corpus, final

We report our development of a simple but fast and efficient inductive unsupervised semantic tagger for Chinese words. A POS hand-tagged corpus of 348,000 words is used. The corpus is tagged in two steps. First, possible semantic tags are selected using a semantic dictionary (Tong Yi Ci Ci Lin), the POS, and the conditional probability of a semantic tag given the POS, i.e., P(S|P). The final semantic tag is then assigned by considering the semantic tags before and after the current word, together with the semantic-word conditional probability P(S|W) derived from the first step. Semantic bigram probabilities P(S|S) are used in the second step. Final manual checking shows that this simple but efficient algorithm has a hit rate of 91%. The tagger tags 142 words per second on a 120 MHz Pentium running FOXPRO. It runs about 2.3 times faster than a Viterbi tagger.
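The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FOXPRO implementation: the abstract does not say how P(S|W) and the bigram probabilities P(S|S) are combined, so the product used here is an assumption, as are the function names, the `top_k` cutoff, the `eps` floor for unseen events, and the toy tags and probabilities.

```python
# Hypothetical sketch of a two-step semantic tagger. All names, the scoring
# rule (a simple product), and the toy data are assumptions for illustration.

def select_candidates(word, pos, dictionary, p_s_given_p, top_k=3):
    """Step 1: take the dictionary's tags for `word`, ranked by P(S|P)."""
    tags = dictionary.get(word, [])
    ranked = sorted(tags, key=lambda s: p_s_given_p.get((s, pos), 0.0),
                    reverse=True)
    return ranked[:top_k]

def tag_sentence(words, pos_tags, dictionary, p_s_given_p,
                 p_s_given_w, p_bigram, eps=1e-6):
    """Step 2: pick one tag per word using P(S|W) and neighbouring tags."""
    candidates = [select_candidates(w, p, dictionary, p_s_given_p)
                  for w, p in zip(words, pos_tags)]
    # Provisional tags (best step-1 candidate) serve as right-hand context.
    provisional = [c[0] if c else None for c in candidates]
    final = []
    for i, cands in enumerate(candidates):
        if not cands:
            final.append(None)  # word not in the dictionary
            continue
        prev_tag = final[i - 1] if i > 0 else None
        next_tag = provisional[i + 1] if i + 1 < len(words) else None

        def score(s, i=i, prev_tag=prev_tag, next_tag=next_tag):
            value = p_s_given_w.get((s, words[i]), eps)
            if prev_tag is not None:
                value *= p_bigram.get((prev_tag, s), eps)
            if next_tag is not None:
                value *= p_bigram.get((s, next_tag), eps)
            return value

        final.append(max(cands, key=score))
    return final

# Toy data: the tags X, Y, Z and all probabilities are made up.
dictionary = {"a": ["X"], "b": ["Y", "Z"], "c": ["X"]}
p_s_given_p = {("X", "N"): 1.0, ("Y", "V"): 0.6, ("Z", "V"): 0.4}
p_s_given_w = {("X", "a"): 1.0, ("Y", "b"): 0.5, ("Z", "b"): 0.5,
               ("X", "c"): 1.0}
p_bigram = {("X", "Y"): 0.1, ("X", "Z"): 0.8, ("Y", "X"): 0.2, ("Z", "X"): 0.7}

result = tag_sentence(["a", "b", "c"], ["N", "V", "N"], dictionary,
                      p_s_given_p, p_s_given_w, p_bigram)
print(result)
```

In the toy data, step 1 ranks Y above Z for the word "b", but the neighbouring X tags favour Z in step 2, so the context overrides the POS-based ranking. A single left-to-right pass of this kind avoids the full lattice search of a Viterbi tagger, which is one plausible reading of the speed advantage the abstract reports.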

