
arxiv: 1703.00565 · v3 · pith:FES5ZBLQ · submitted 2017-03-02 · 💻 cs.CL · cs.IR

Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

classification 💻 cs.CL cs.IR
keywords: tool, scattertext, categories, document, visualization, visualizing, able, axis

Scattertext is an open-source tool for visualizing linguistic variation between document categories in a language-independent way. The tool presents a scatterplot in which each axis corresponds to the frequency rank of a term within one category of documents. Through a tie-breaking strategy, the tool can display thousands of visible term-representing points and find space to legibly label hundreds of them. Scattertext also lends itself to a query-based visualization of how the use of terms with similar embeddings differs between document categories, as well as a visualization for comparing the importance scores of bag-of-words features with univariate metrics.
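The idea of rank-based axes can be illustrated with a small, self-contained Python sketch. It is not Scattertext's code and does not use the scattertext library. It assumes whitespace tokenization, places each term on each axis by its frequency rank within one category (scaled to [0, 1]), and breaks frequency ties alphabetically so that no two terms share a coordinate. The helper names (`category_term_counts`, `rank_positions`, `scatter_coordinates`) are hypothetical, and the alphabetical tie-breaking rule is an assumption that may differ from the tool's actual strategy.

```python
from collections import Counter


def category_term_counts(docs, labels, category):
    """Sum term counts over every document labeled `category`.

    Assumption: lowercase + whitespace tokenization, for illustration only.
    """
    counts = Counter()
    for doc, label in zip(docs, labels):
        if label == category:
            counts.update(doc.lower().split())
    return counts


def rank_positions(counts, vocab):
    """Map each term in `vocab` to a position in [0, 1] by frequency rank.

    Terms are sorted by ascending count; equal counts are ordered
    alphabetically (assumed tie-breaking rule), so every term gets a
    distinct rank and therefore a distinct coordinate on this axis.
    """
    ordered = sorted(vocab, key=lambda term: (counts[term], term))
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {term: i / (n - 1) for i, term in enumerate(ordered)}


def scatter_coordinates(docs, labels, cat_x, cat_y):
    """Return {term: (x, y)} with x/y = rank positions in cat_x/cat_y."""
    x_counts = category_term_counts(docs, labels, cat_x)
    y_counts = category_term_counts(docs, labels, cat_y)
    vocab = set(x_counts) | set(y_counts)  # terms seen in either category
    xs = rank_positions(x_counts, vocab)
    ys = rank_positions(y_counts, vocab)
    return {term: (xs[term], ys[term]) for term in vocab}


if __name__ == "__main__":
    docs = ["tax cuts jobs jobs", "jobs growth", "health care care", "care for families"]
    labels = ["rep", "rep", "dem", "dem"]
    coords = scatter_coordinates(docs, labels, "dem", "rep")
    print(coords["care"])  # (1.0, 0.0): most frequent "dem" term, absent from "rep"
```

Because ties are broken rather than shared, terms that occur equally often in a category still land at different positions, which is one way a scatterplot can keep thousands of points visible instead of stacking them on the same spot.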



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Context-Aware Explanations for Spatialized Document Layouts

    cs.HC 2026-06 unverdicted novelty 6.0

    CAPE produces spatially grounded natural-language explanations for document layouts using pattern detection and multi-level context, rated more helpful than content-only baselines in a user study.