
arXiv: 1702.01520 · v1 · pith:OCO6CA62 · submitted 2017-02-06 · cs.IR

Document Visualization using Topic Clouds

classification: cs.IR
keywords: topic, document, cloud, representation, topics, words, derived, important

Traditionally, a document is visualized as a word cloud. Recently, distributed representation methods for documents have been developed, which map a document to a set of topic embeddings. Visualizing such a representation is useful for presenting the semantics of a document at a finer granularity. It is also challenging, because there are multiple topics, each containing multiple words. We propose to visualize a set of topics using a Topic Cloud, which is a pie chart consisting of topic slices, where each slice contains the important words of that topic. To make important topics and words visually prominent, the sizes of topic slices and word fonts are proportional to their importance in the document. A topic cloud can help the user quickly evaluate the quality of derived document representations. NLP practitioners can use it to qualitatively compare the topic quality of different document representation algorithms, or to inspect how model parameters affect the derived representations.
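The sizing rule described above (slice size and font size proportional to importance) can be illustrated with a small layout sketch. This is not the authors' implementation. It assumes topic importances and per-topic word importances are already given as inputs, since the abstract does not say how they are computed. The function `topic_cloud_layout`, the `TopicSlice` type, and the parameters `words_per_slice`, `min_font` and `max_font` are hypothetical choices made for illustration, as is clamping small fonts for legibility.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSlice:
    topic: int
    start_deg: float  # angle where the slice begins on the pie, in degrees
    end_deg: float    # angle where it ends; end - start is proportional to topic weight
    words: list = field(default_factory=list)  # (word, font_size) pairs


def topic_cloud_layout(topic_weights, topic_words, words_per_slice=5,
                       min_font=8.0, max_font=36.0):
    """Hypothetical sketch of a Topic Cloud layout (not the paper's code).

    topic_weights: assumed importance of each topic in the document (non-negative).
    topic_words:   for each topic, a list of (word, importance) pairs.
    Slice angles are proportional to topic weight. Font sizes are proportional
    to word importance, scaled so the most important displayed word gets
    max_font, and clamped below at min_font (an assumed legibility tweak).
    """
    total = float(sum(topic_weights))
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")

    # Keep only the top words of each topic, most important first.
    top = [sorted(words, key=lambda p: p[1], reverse=True)[:words_per_slice]
           for words in topic_words]
    # Largest displayed word importance across all slices (guard against zero).
    w_max = max((w for words in top for _, w in words), default=1.0) or 1.0

    slices, start = [], 0.0
    for k, weight in enumerate(topic_weights):
        end = start + 360.0 * weight / total
        sized = [(word, max(min_font, max_font * w / w_max)) for word, w in top[k]]
        slices.append(TopicSlice(k, start, end, sized))
        start = end
    return slices


if __name__ == "__main__":
    cloud = topic_cloud_layout(
        [0.6, 0.3, 0.1],
        [[("neural", 0.09), ("network", 0.07), ("layer", 0.03)],
         [("topic", 0.08), ("word", 0.04)],
         [("cloud", 0.05)]],
    )
    for s in cloud:
        print(s.topic, round(s.end_deg - s.start_deg, 1),
              [(w, round(f, 1)) for w, f in s.words])
```

Scaling font sizes against a single cloud-wide maximum, rather than per slice, keeps word sizes comparable across topics. Whether the paper normalizes this way is not stated in the abstract.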
