Distributional Clustering of English Words

Fernando Pereira (AT&T Bell Laboratories); Lillian Lee (Harvard University); Naftali Tishby (Hebrew University)

arxiv: cmp-lg/9408011 · v1 · submitted 1994-08-22 · cmp-lg · cs.CL




keywords clustering, clusters, annealing, data, models, used, words, according


We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest-distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.
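The abstract does not spell out the procedure, so the following is only a minimal sketch of soft clustering by deterministic annealing under stated assumptions. It assumes that each word is represented by a strictly positive probability distribution over contexts, that distortion is measured by KL divergence (the abstract does not name the measure), and that the number of clusters is fixed rather than grown by hierarchical splitting. The function names (`kl`, `soft_assign`, `update_centroids`, `perturb`, `anneal`), the perturbation scheme, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def soft_assign(dists, centroids, beta):
    """Membership of each word in each cluster, proportional to exp(-beta * distortion)."""
    memberships = []
    for p in dists:
        d = [kl(p, c) for c in centroids]
        d_min = min(d)  # subtracting the minimum keeps exp() from underflowing to all zeros
        w = [math.exp(-beta * (di - d_min)) for di in d]
        z = sum(w)
        memberships.append([wi / z for wi in w])
    return memberships

def update_centroids(dists, memberships, n_clusters):
    """Each centroid is the membership-weighted average of the word distributions."""
    dim = len(dists[0])
    centroids = []
    for c in range(n_clusters):
        total = sum(m[c] for m in memberships)
        centroids.append([
            sum(m[c] * p[j] for m, p in zip(memberships, dists)) / total
            for j in range(dim)
        ])
    return centroids

def perturb(centroids, eps=1e-3):
    """Nudge each centroid in a different coordinate (assumed scheme) so that
    coincident centroids can separate once they become unstable."""
    out = []
    for k, c in enumerate(centroids):
        bumped = list(c)
        bumped[k % len(c)] += eps
        total = sum(bumped)
        out.append([x / total for x in bumped])
    return out

def anneal(dists, centroids, betas, inner_iters=30):
    """Alternate assignment and centroid updates while the annealing parameter rises."""
    memberships = None
    for beta in betas:
        centroids = perturb(centroids)
        for _ in range(inner_iters):
            memberships = soft_assign(dists, centroids, beta)
            centroids = update_centroids(dists, memberships, len(centroids))
    return centroids, memberships

# Toy data (made up): four "words", each a distribution over three contexts.
words = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
start = [[1 / 3] * 3, [1 / 3] * 3]  # both centroids start at the same point
centroids, memberships = anneal(words, start, betas=[0.5, 1, 2, 5, 10, 20])
# Index of the most probable cluster for each word.
hard = [max(range(len(m)), key=m.__getitem__) for m in memberships]
```

At small values of the annealing parameter both centroids stay close to the overall mean distribution and memberships are nearly uniform. As the parameter grows the centroids separate and memberships sharpen, which is the qualitative behaviour the abstract describes. A hierarchical version would presumably add clusters as existing ones become unstable, which this sketch does not attempt.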
