Distributed Representations of Words and Phrases and their Compositionality

Greg Corrado; Ilya Sutskever; Jeffrey Dean; Kai Chen; Tomas Mikolov

arxiv: 1310.4546 · v1 · pith:W2DEKMNWnew · submitted 2013-10-16 · 💻 cs.CL · cs.LG · stat.ML

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean

classification 💻 cs.CL · cs.LG · stat.ML

keywords representations · phrases · word · canada · distributed · example · learning · method

Abstract

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
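The abstract names two of its techniques without giving their formulas. As a rough illustration only, the sketch below uses the forms given in the body of the paper: a frequent word with relative frequency f is discarded with probability 1 - sqrt(t / f), and a candidate two-word phrase is scored as (count(a b) - delta) / (count(a) * count(b)). The function names, the toy sentence, and the values of `threshold` and `delta` are assumptions chosen for this example, not settings taken from this page.

```python
import math
from collections import Counter


def discard_probability(word_freq, threshold=1e-5):
    """Chance of dropping one occurrence of a word during subsampling.

    word_freq is the word's relative frequency in the corpus.
    threshold is an illustrative value (assumption for this sketch).
    """
    return max(0.0, 1.0 - math.sqrt(threshold / word_freq))


def phrase_scores(tokens, delta=1.0):
    """Score each adjacent word pair; higher scores suggest a phrase.

    delta discounts pairs built from very infrequent words
    (the value used here is an assumption for this sketch).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair: (count - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
        for pair, count in bigrams.items()
    }


# Toy corpus (hypothetical) in which "air canada" recurs as a unit.
tokens = (
    "i flew air canada to toronto and air canada to ottawa "
    "then air canada home"
).split()
scores = phrase_scores(tokens)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
print(discard_probability(0.01), discard_probability(1e-6))
```

In practice, pairs whose score exceeds a chosen cutoff would be merged into a single token before training, and the pass can be repeated to build longer phrases.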

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
cs.CL 2026-05 unverdicted novelty 8.0

REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
SimCSE: Simple Contrastive Learning of Sentence Embeddings
cs.CL 2021-04 conditional novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving on the prior best results by 4.2% and 2.2%, respectively, via simple contrastive learning.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
cs.CL 2026-05 unverdicted novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic onlin...
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
cs.CL 2019-09 unverdicted novelty 7.0

Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.
Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
cs.LG 2026-05 unverdicted novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
MIPIC: Matryoshka Representation Learning via Self-Distilled Intra-Relational and Progressive Information Chaining
cs.CL 2026-04 unverdicted novelty 6.0

MIPIC trains nested Matryoshka representations via self-distilled intra-relational alignment with top-k CKA and progressive information chaining across depths, yielding competitive performance especially at extreme lo...
Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs
cs.LG 2026-04 unverdicted novelty 6.0

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and...
A Multi-head-based architecture for effective morphological tagging in Russian with open dictionary
cs.CL 2026-04 unverdicted novelty 6.0

A multi-head attention model for Russian morphological tagging supports open dictionaries via subtoken splitting and reports 98-99% accuracy on grammatical categories while running efficiently on consumer hardware.
ASTRA: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering
cs.DL 2026-03 accept novelty 6.0

ASTRA combines an eight-axis conceptual framework with text embeddings and unsupervised clustering to map and group 78 art-technology institutions into coherent thematic clusters.
Semantic knowledge guides innovation and drives cultural evolution
cs.MA 2025-10 unverdicted novelty 6.0

Semantic knowledge directs exploration to meaningful innovations, improves success rates, enables generalization, and synergizes with social learning to accelerate cumulative culture, as tested in agent-based models a...
Revisiting Feature Prediction for Learning Visual Representations from Video
cs.CV 2024-02 conditional novelty 6.0

V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
Demystifying CLIP Data
cs.CV 2023-09 accept novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
Realised Volatility Forecasting: Machine Learning via Financial Word Embedding
q-fin.CP 2021-08 unverdicted novelty 6.0

News embeddings from financial text improve out-of-sample realized volatility forecasts for stocks, with stronger effects for stock-specific news and high-volatility periods, and yield gains when combined with benchmarks.
Toward General and Robust LLM-enhanced Text-attributed Graph Learning
cs.LG 2025-04 unverdicted novelty 5.0

UltraTAG organizes LLM-GNN methods for text-attributed graphs; UltraTAG-S adds LLM text propagation, augmentation, PageRank node selection, and edge reconfiguration to improve robustness on sparse data, with reported ...
Utilizing Pre-trained and Large Language Models for 10-K Items Segmentation
q-fin.GN 2025-02 unverdicted novelty 5.0

BERT4ItemSeg reaches macro-F1 of 0.9825 on core 10-K items across 3,737 annotated reports, outperforming GPT4ItemSeg (0.9567) and baselines.
From Node2Vec to GPT-based GraphRAG: scientific impact prediction across graph and language models
cs.DL 2026-05 unverdicted novelty 4.0

Directed citation graphs plus textual embeddings reach 0.84-0.85 AUC for top-P% impact classification while GPT-5.5/5.4 Nano prompts hit 0.87 but show no consistent gain from retrieved graph neighborhoods over target-...
Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task
cs.CL 2026-04 unverdicted novelty 4.0

Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance ...
DeepTrax: Embedding Graphs of Financial Transactions
cs.LG 2019-07 unverdicted novelty 4.0

DeepTrax learns embeddings for accounts and merchants in financial transaction graphs via methods inspired by standard graph embedding techniques, reporting strong link prediction performance and utility in fraud dete...