Structural Tags, Annealing and Automatic Word Classification
Tags: classification, cmp-lg, cs.CL
Keywords: classification, system, word, algorithm, annealing, automatic, corpus, current
Abstract
This paper describes an automatic word classification system that uses a locally optimal annealing algorithm and average class mutual information. A new word-class representation, the structural tag, is introduced, and its advantages for statistical language modelling are presented. A summary of results on the one-million-word LOB corpus is given; the algorithm is also shown to discover the vowel-consonant distinction and to cluster words syntactically in a Latin corpus. Finally, the current classification system is compared with several leading alternative systems, and the comparison shows that it performs respectably.
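The abstract names average class mutual information as the quantity the classifier works with. The sketch below is a minimal illustration, assuming the standard definition over adjacent class bigrams, I = Σ P(c1,c2) log2 [P(c1,c2) / (P(c1) P(c2))], where P(c1) and P(c2) are the left and right class marginals. The function names `class_mutual_information` and `greedy_exchange` are hypothetical, and the search shown is a plain greedy exchange (hill climbing) used only as a stand-in for a locally optimal search; it is not the paper's annealing algorithm.

```python
import math
from collections import Counter


def class_mutual_information(tokens, word_class):
    """Average mutual information (bits) between classes of adjacent words.

    Assumes the standard class-bigram definition; tokens is a list of words
    in corpus order and word_class maps each word to a class label.
    """
    class_seq = [word_class[w] for w in tokens]
    bigrams = Counter(zip(class_seq, class_seq[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n   # marginal count of c1 in the left position
        right[c2] += n  # marginal count of c2 in the right position
    ami = 0.0
    for (c1, c2), n in bigrams.items():
        # P(c1,c2) * log2(P(c1,c2) / (P_left(c1) * P_right(c2)))
        ami += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return ami


def greedy_exchange(tokens, num_classes, max_passes=10):
    """Hypothetical hill-climbing stand-in: move single words between classes
    whenever the move strictly increases average class mutual information."""
    vocab = sorted(set(tokens))
    # Assumed initialisation: round-robin assignment of words to classes.
    word_class = {w: i % num_classes for i, w in enumerate(vocab)}
    best = class_mutual_information(tokens, word_class)
    for _ in range(max_passes):
        improved = False
        for w in vocab:
            kept = word_class[w]  # class currently giving the best score
            for c in range(num_classes):
                if c == kept:
                    continue
                word_class[w] = c
                score = class_mutual_information(tokens, word_class)
                if score > best + 1e-12:
                    best, kept, improved = score, c, True
            word_class[w] = kept  # restore the best class found for w
        if not improved:
            break  # local optimum under single-word moves
    return word_class, best
```

For example, on the toy letter sequence `list("badebadebade")`, the consonant/vowel split `{'b': 0, 'd': 0, 'a': 1, 'e': 1}` scores close to 1 bit, while putting every letter in one class scores 0. A real system would update the counts incrementally rather than recomputing them after every trial move.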