word2vec Parameter Learning Explained

Xin Rong

arxiv: 1411.2738 · v4 · pith:SZPN35RVnew · submitted 2014-11-11 · 💻 cs.CL

classification 💻 cs.CL

keywords models · word2vec · parameter · derivations · equations · including · intuitive · learning

The word2vec model and application by Mikolov et al. have attracted a great deal of attention over the past two years. The vector representations of words learned by word2vec models have been shown to carry semantic meaning and are useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec or similar techniques, I notice that no available material comprehensively explains the parameter learning process of word embedding models in detail, which prevents researchers who are not experts in neural networks from understanding how such models work. This note provides detailed derivations and explanations of the parameter update equations of the word2vec models, including the original continuous bag-of-words (CBOW) and skip-gram (SG) models, as well as advanced optimization techniques, including hierarchical softmax and negative sampling. Intuitive interpretations of the gradient equations are also provided alongside the mathematical derivations. The appendix reviews the basics of neural networks and backpropagation. I also created an interactive demo, wevi, to facilitate an intuitive understanding of the model.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Duplicate Bug Report Detection: How Far Are We?
cs.SE 2022-12 unverdicted novelty 6.0

A new bias-aware benchmark for duplicate bug report detection shows simpler techniques outperform recent sophisticated methods on most projects and match industry tools.

TabTransformer: Tabular Data Modeling Using Contextual Embeddings
cs.LG 2020-12 unverdicted novelty 6.0

TabTransformer uses Transformer self-attention to generate contextual embeddings from categorical features in tabular data, outperforming prior deep learning methods by at least 1% mean AUC and matching tree-based ens...

To Use AI as Dice of Possibilities with Timing Computation
cs.AI 2026-05 unverdicted novelty 5.0

Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.