
arxiv: cmp-lg/9411022 · v2 · submitted 1994-11-16 · cmp-lg · cs.CL

Adaptive Sentence Boundary Disambiguation

classification cmp-lg cs.CL
keywords sentence, boundaries, efficient, including, marks, method, part-of-speech, adaptable

Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
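
The abstract names two ingredients: a lexicon of part-of-speech probabilities and a feed-forward neural network. The sketch below illustrates one way those pieces could fit together; it is not the authors' implementation. The tag set, the toy lexicon entries, the context width `k`, the hidden-layer size, and the function names (`descriptor`, `context_vector`, `forward`) are all assumptions made for illustration, and the weights are random rather than trained, so the output is only a placeholder score.

```python
import math
import random

# Hypothetical tag set and toy lexicon (assumptions, not from the paper).
TAGS = ["noun", "verb", "abbrev", "proper", "other"]
LEXICON = {
    "dr": {"abbrev": 1.0},
    "smith": {"proper": 1.0},
    "arrived": {"verb": 1.0},
    "left": {"verb": 0.7, "noun": 0.3},
}

def descriptor(word):
    """Part-of-speech probability vector for a word; uniform if unknown."""
    probs = LEXICON.get(word.lower())
    if probs is None:
        return [1.0 / len(TAGS)] * len(TAGS)
    return [probs.get(tag, 0.0) for tag in TAGS]

def context_vector(tokens, i, k=2):
    """Concatenate descriptors of the k tokens on each side of tokens[i].

    tokens[i] is the candidate punctuation mark; positions that fall
    outside the token list are padded with zeros.
    """
    vec = []
    for j in range(i - k, i + k + 1):
        if j == i:
            continue
        if 0 <= j < len(tokens):
            vec.extend(descriptor(tokens[j]))
        else:
            vec.extend([0.0] * len(TAGS))
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """One hidden layer, one sigmoid output: a score in (0, 1)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

# Untrained, randomly initialised weights (placeholder only).
rng = random.Random(0)
n_inputs, n_hidden = 4 * len(TAGS), 3
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

tokens = ["Dr", ".", "Smith", "arrived", "."]
x = context_vector(tokens, 1)          # context around the first period
score = forward(x, w_hidden, w_out)    # would be thresholded after training
```

In a trained system the score would be compared against a threshold to decide whether the punctuation mark ends a sentence; here it only shows the data flow from lexicon lookup to network output.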

