arxiv: 1508.02297 · v1 · pith:MG3VM7H7new · submitted 2015-08-10 · 💻 cs.CL

Measuring Word Significance using Distributed Representations of Words

classification 💻 cs.CL
keywords significance, vectors, word, words, corpora, corpus, distributed, mikolov

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et al., 2013a; Mikolov et al., 2013b), was shown to encode semantic information in the direction of the word vectors. In this brief report, it is proposed to use the length of the vectors, together with the term frequency, as a measure of word significance in a corpus. Experimental evidence using a domain-specific corpus of abstracts is presented to support this proposal. A useful visualization technique for text corpora emerges, where words are mapped onto a two-dimensional plane and automatically ranked by significance.
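
The two quantities named in the abstract, term frequency and vector length, can be computed per word with a few lines of Python. The sketch below is illustrative only: the abstract does not say how the two are combined, so it simply pairs them and sorts by vector length. The function name `significance_points`, the toy tokens, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def significance_points(tokens, vectors):
    """Pair each word's term frequency with the Euclidean length of its vector.

    tokens:  list of word strings (the corpus, already tokenized)
    vectors: dict mapping word -> list of floats (e.g. trained word2vec vectors)

    Returns a list of (word, term_frequency, vector_length) tuples, sorted by
    vector length in descending order. Sorting by length alone is an
    assumption made for this sketch, not the paper's stated ranking rule.
    """
    counts = Counter(tokens)
    points = []
    for word, tf in counts.items():
        vec = vectors.get(word)
        if vec is None:
            continue  # skip words that have no vector
        length = math.sqrt(sum(x * x for x in vec))
        points.append((word, tf, length))
    points.sort(key=lambda p: (-p[2], p[0]))
    return points


# Hypothetical toy data: a frequent function word with a short vector,
# and rarer content words with longer vectors.
tokens = ["the", "neutrino", "the", "mass", "the", "neutrino"]
vectors = {"the": [0.3, 0.4], "neutrino": [3.0, 4.0], "mass": [0.6, 0.8]}

points = significance_points(tokens, vectors)
for word, tf, length in points:
    print(f"{word}\ttf={tf}\tlength={length:.2f}")
```

Each tuple gives one point on a plane whose axes are term frequency and vector length, which is one plausible reading of the two-dimensional mapping the abstract mentions.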

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

    cs.LG 2026-06 unverdicted novelty 3.0

    N-GRPO enhances GRPO via Semantic Neighbor Mixing of token embeddings to improve diversity and consistency in LLM math reasoning rollouts.