Tagging accurately -- Don't guess if you know

Pasi Tapanainen (Rank Xerox Research Centre, Grenoble Laboratory); Atro Voutilainen (Research Unit for Computational Linguistics, University of Helsinki)

arxiv: cmp-lg/9408009 · v1 · submitted 1994-08-16 · cmp-lg · cs.CL

keywords: taggers, combining, discuss, text, accuracy, accurately, achieve, combine

We discuss combining knowledge-based (or rule-based) and statistical part-of-speech taggers. We use two mature taggers, ENGCG and Xerox Tagger, to independently tag the same text and combine the results to produce a fully disambiguated text. In a 27,000-word test sample taken from a previously unseen corpus, we achieve 98.5% accuracy. This paper presents the data in detail. We describe the problems we encountered in the course of combining the two taggers and discuss the problem of evaluating taggers.
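
The abstract does not spell out how the two outputs are merged, so the following is only a minimal sketch of one way to read the title's principle, not the authors' procedure. It assumes the rule-based tagger leaves a set of surviving readings per token, the statistical tagger proposes exactly one tag per token, and both outputs have already been mapped to a shared tagset. The function name `combine_readings`, the tag labels, and the fallback policy are all hypothetical.

```python
from typing import List, Set


def combine_readings(rule_based: List[Set[str]], statistical: List[str]) -> List[str]:
    """Pick one tag per token from two independently produced analyses.

    rule_based:  per-token set of readings left by the rule-based tagger
                 (assumed format, not ENGCG's actual output).
    statistical: per-token single tag proposed by the statistical tagger
                 (assumed format, not Xerox Tagger's actual output).
    """
    if len(rule_based) != len(statistical):
        raise ValueError("both taggers must cover the same tokens")

    combined = []
    for remaining, guess in zip(rule_based, statistical):
        if len(remaining) == 1:
            # The rule-based tagger already knows the answer: do not guess.
            combined.append(next(iter(remaining)))
        elif guess in remaining:
            # Ambiguity remains: let the statistical tagger choose
            # among the readings the rules did not discard.
            combined.append(guess)
        else:
            # The guess is not among the surviving readings (or none survive).
            # Hypothetical fallback: a deterministic pick from the survivors,
            # or the statistical guess if the set is empty.
            combined.append(min(remaining) if remaining else guess)
    return combined


# Illustrative input with made-up tag labels.
rule_based = [{"DET"}, {"NOUN", "VERB"}, {"VERB"}, {"ADJ", "ADV"}]
statistical = ["DET", "NOUN", "NOUN", "PREP"]
example_tags = combine_readings(rule_based, statistical)
```

In this sketch the statistical tagger only ever resolves ambiguity that the rules left open; it never overrides a reading the rule-based tagger settled on its own.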
