
arxiv: 1603.04747 · v1 · pith:RXQ65Q3Enew · submitted 2016-03-15 · 💻 cs.CL

Topic Modeling Using Distributed Word Embeddings

classification 💻 cs.CL
keywords topics, modeling, topic, content, corpus, distributed, embeddings, find

We propose a new algorithm for topic modeling, Vec2Topic, that identifies the main topics in a corpus using semantic information captured via high-dimensional distributed word embeddings. Our technique is unsupervised and generates a list of topics ranked by importance. We find that it works better than existing topic modeling techniques such as Latent Dirichlet Allocation for identifying key topics in user-generated content, such as emails and chats, where topics are diffused across the corpus. We also find that Vec2Topic works equally well for non-user-generated content, such as papers and reports, and for small corpora such as a single document.

