
arxiv: 1303.3664 · v2 · submitted 2013-03-15 · 📊 stat.ML · cs.LG


Topic Discovery through Data Dependent and Random Projections

keywords: topic, patterns, projections, random, words, algorithms, associated, complexity

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and their associated topics. We also discuss the statistical guarantees of the data-dependent projections method under two mild assumptions on the prior density of the topic-document matrix. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
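The key insight in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes noiseless expected word frequencies, one novel word per topic, and row-normalized frequency patterns, and it ignores the sampling noise that the paper's guarantees address. The function names `make_separable_corpus` and `novel_word_candidates`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def make_separable_corpus(num_topics=3, num_plain_words=20, num_docs=200, seed=1):
    """Build a noiseless (words x documents) frequency matrix X = beta @ theta.

    Assumption for illustration: the first `num_topics` words are novel words,
    each carrying mass on exactly one topic; all other words mix every topic.
    """
    rng = np.random.default_rng(seed)
    novel = np.eye(num_topics)
    plain = rng.uniform(0.2, 1.0, size=(num_plain_words, num_topics))
    beta = np.vstack([novel, plain])
    beta = beta / beta.sum(axis=0, keepdims=True)  # each topic is a distribution over words
    theta = rng.dirichlet(np.ones(num_topics), size=num_docs).T  # topics x documents
    return beta @ theta, list(range(num_topics))

def novel_word_candidates(X, num_projections=50, seed=0):
    """Collect the words attaining the max or min along random directions."""
    rng = np.random.default_rng(seed)
    # Row-normalize so each word's cross-document pattern sums to one.
    X_bar = X / X.sum(axis=1, keepdims=True)
    candidates = set()
    for _ in range(num_projections):
        d = rng.standard_normal(X_bar.shape[1])  # random projection direction
        proj = X_bar @ d
        candidates.add(int(np.argmax(proj)))
        candidates.add(int(np.argmin(proj)))
    return sorted(candidates)

X, novel_idx = make_separable_corpus()
found = novel_word_candidates(X)
print("novel words:", novel_idx, "extremes found:", found)
```

In this idealized setting every normalized word pattern is a convex combination of the normalized topic patterns, so the extremes of any projection fall on novel words. Each projection costs one matrix-vector product, which is consistent with the linear scaling in the number of documents mentioned in the abstract.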
