arxiv: 1507.07998 · v1 · pith:O6FK3U6Unew · submitted 2015-07-29 · cs.CL · cs.AI · cs.LG

Document Embedding with Paragraph Vectors

classification cs.CL cs.AI cs.LG
keywords paragraph, method, vectors, document, embedding, other, analysis, sentiment

Paragraph Vectors was recently proposed as an unsupervised method for learning distributed representations of pieces of text. In the original work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a more thorough comparison of Paragraph Vectors to other document modelling algorithms such as Latent Dirichlet Allocation, and evaluate the performance of the method as we vary the dimensionality of the learned representation. We benchmarked the models on two document similarity data sets, one from Wikipedia and one from arXiv. We observe that the Paragraph Vector method performs significantly better than other methods, and propose a simple improvement to enhance embedding quality. Somewhat surprisingly, we also show that, much like word embeddings, vector operations on Paragraph Vectors can produce useful semantic results.
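
The abstract's last claim, that arithmetic on document embeddings can yield semantically meaningful neighbours, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors, the document names, and the helper functions `cosine` and `nearest` are hypothetical and are not taken from the paper, whose embeddings are learned from text and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, vectors, exclude=()):
    """Name of the stored vector most similar to `query`, skipping `exclude`."""
    candidates = {k: v for k, v in vectors.items() if k not in exclude}
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Hypothetical toy embeddings (hand-made, NOT learned by Paragraph Vectors).
# Informal reading of the axes: [pop-music, US-related, Japan-related].
doc_vectors = {
    "pop_singer_us": [0.9, 0.8, 0.1],
    "pop_singer_jp": [0.9, 0.1, 0.8],
    "physicist_us": [0.1, 0.8, 0.1],
    "physicist_jp": [0.1, 0.1, 0.8],
    "chef_us": [0.5, 0.8, 0.1],
    "chef_jp": [0.5, 0.1, 0.8],
}

# Analogy-style query: pop_singer_us - physicist_us + physicist_jp.
inputs = ("pop_singer_us", "physicist_us", "physicist_jp")
a, b, c = (doc_vectors[name] for name in inputs)
query = [x - y + z for x, y, z in zip(a, b, c)]

# Search the remaining documents for the closest match by cosine similarity.
result = nearest(query, doc_vectors, exclude=inputs)
print(result)
```

With these hand-picked values the query lands closest to `pop_singer_jp`. The same cosine-similarity ranking, without the arithmetic step, is one common way to score document similarity between two embeddings.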

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Low-supervision urgency detection and transfer in short crisis messages

    cs.CL 2019-07 unverdicted novelty 4.0

    Presents a low-supervision urgency detection system using ensembles and transfer learning that outperforms baselines on multiple disaster datasets.