
arxiv: cs/0007018 · v1 · submitted 2000-07-13 · 💻 cs.CL

Bootstrapping a Tagged Corpus through Combination of Existing Heterogeneous Taggers

classification 💻 cs.CL
keywords: existing, combi-bootstrap, resources, corpus, sample, small, tagger, taggers

This paper describes a new method, Combi-bootstrap, which exploits existing taggers and lexical resources to annotate corpora with new tagsets. Combi-bootstrap uses existing resources as features for a second-level machine learning module that is trained, on a very small sample of annotated corpus material, to make the mapping to the new tagset. Experiments show that Combi-bootstrap: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
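The second-level idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the learner here is a simple overlap-based nearest-neighbour voter chosen for brevity, and every name, tag, and data point (`SecondLevelTagger`, the `"N-SG"` style target tags, the two source taggers) is a hypothetical assumption. Each token is represented by the tags that existing resources assign to it, and the small hand-annotated sample supplies the new-tagset label. The `error_reduction` helper shows how a relative error reduction figure such as the one quoted above is conventionally computed from two accuracies.

```python
from collections import Counter


def overlap(a, b):
    """Count the feature positions at which two feature tuples agree."""
    return sum(x == y for x, y in zip(a, b))


class SecondLevelTagger:
    """Hypothetical second-level learner: nearest neighbours by feature overlap."""

    def __init__(self):
        self.examples = []

    def fit(self, features, labels):
        # Memorise the small annotated sample as (features, new_tag) pairs.
        self.examples = list(zip(features, labels))
        return self

    def predict_one(self, feats):
        # Find the highest overlap with any stored example, then let all
        # examples at that overlap vote on the new-tagset label.
        best = max(overlap(feats, f) for f, _ in self.examples)
        votes = Counter(
            label for f, label in self.examples if overlap(feats, f) == best
        )
        return votes.most_common(1)[0][0]


def error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction in error rate when moving from baseline to new."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Toy data (assumed): each feature tuple holds the outputs of two existing
# taggers with different tagsets; labels come from the new target tagset.
train_features = [("DT", "det"), ("NN", "noun"), ("VBZ", "verb"), ("NNS", "noun")]
train_labels = ["ART", "N-SG", "V-FIN", "N-PL"]

model = SecondLevelTagger().fit(train_features, train_labels)
seen = model.predict_one(("NN", "noun"))    # exact match in the sample
unseen = model.predict_one(("VBD", "verb"))  # only the second tagger agrees
```

Because the learner only sees the source taggers' outputs as opaque symbols, adding another resource (for example a lexicon lookup) would, in this sketch, amount to appending one more element to each feature tuple.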
