An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

arXiv: 1607.05368 · v1 · pith:P2Y4C3AN · submitted 2016-07-19 · 💻 cs.CL

Jey Han Lau, Timothy Baldwin

classification 💻 cs.CL

keywords doc2vec, document, embeddings, embedding, empirical, evaluation, mikolov, models

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be further improved by using pre-trained word embeddings. We also provide recommendations on hyper-parameter settings for general purpose applications, and release source code to induce document embeddings using our trained doc2vec models.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

No Data? No Problem: Synthesizing Security Graphs for Better Intrusion Detection
cs.CR 2025-06 unverdicted novelty 5.0

PROVSYN synthesizes high-fidelity security provenance graphs via graph generation and LLMs to augment imbalanced datasets, improving downstream APT detection accuracy by up to 38% on benchmarks.

Optimising for the long game: methodological challenges in energy system optimisation pathways
physics.soc-ph 2025-12 accept novelty 4.0

A systematic review of energy system optimization pathways identifies foresight choices, end effects, resolution trade-offs, and investment dynamics as key methodological challenges and recommends improvements to avoi...