Efficient Estimation of Word Representations in Vector Space
Pith reviewed 2026-05-11 02:16 UTC · model grok-4.3
The pith
Two new neural network architectures learn continuous vector representations of words from massive text data with higher accuracy and far lower training cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
What carries the argument
The continuous bag-of-words (CBOW) and skip-gram architectures: shallow neural networks trained, respectively, to predict a target word from its context and to predict the surrounding words from a target word, yielding dense vector representations as a by-product.
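As a rough illustration of how the two architectures differ in what they predict, the sketch below builds training examples from a tokenized sentence. The function names and the fixed symmetric window are assumptions made for this example; the original paper does not ship this code, and its skip-gram variant samples the context distance rather than fixing it.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW direction: (context words) -> target word."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        # Words before and after position i, up to `window` on each side.
        context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram direction: target word -> each surrounding word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        for context_word in tokens[lo:i] + tokens[i + 1:i + 1 + window]:
            pairs.append((target, context_word))
    return pairs


# Hypothetical toy sentence, for illustration only.
sentence = "the quick brown fox".split()
cbow = cbow_examples(sentence, window=1)
sg = skipgram_examples(sentence, window=1)
print(cbow[1])  # context around "quick"
print(sg[:3])
```

CBOW emits one example per position with the whole context as input, whereas skip-gram emits one pair per (target, neighbor) combination, so it produces more training examples from the same text.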
Load-bearing premise
That performance on the chosen word similarity and analogy test sets reliably indicates that the vectors capture general syntactic and semantic relationships rather than dataset-specific patterns.
What would settle it
Training the models on the 1.6 billion word dataset and finding either no accuracy gain on the syntactic and semantic test sets relative to prior neural network methods, or that substantially more computation time is needed to reach comparable performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes two novel neural network architectures (Continuous Bag-of-Words and Skip-gram) for learning continuous vector representations of words from very large corpora. It evaluates these representations on word similarity tasks against prior neural methods and introduces an analogy task for syntactic and semantic relations, claiming substantially higher accuracy at far lower computational cost, including training high-quality vectors on a 1.6 billion word dataset in less than a day.
Significance. If the reported accuracy gains and training-time reductions hold under scrutiny, the work is significant for establishing practical, scalable methods to produce high-quality word embeddings. The efficiency stems from the architectural simplifications and use of hierarchical softmax, enabling training on billion-word scales that were previously prohibitive. This has provided a foundation for subsequent embedding techniques and downstream NLP improvements.
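For orientation, the efficiency argument can be summarized with the per-example training complexities as we understand them from the original paper. Here N is the number of context words, D the vector dimensionality, H the hidden layer size, V the vocabulary size, and C the maximum context distance; the subscripts on Q are labels added for this summary and should be checked against the paper.

```latex
\begin{align*}
Q_{\text{NNLM}}      &= N \times D + N \times D \times H + H \times V \\
Q_{\text{CBOW}}      &= N \times D + D \times \log_2 V \\
Q_{\text{Skip-gram}} &= C \times (D + D \times \log_2 V)
\end{align*}
```

Hierarchical softmax replaces the factor of V in the output term with roughly log2(V), and removing the non-linear hidden layer eliminates the N × D × H term that otherwise dominates the feedforward language model.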
major comments (2)
- [§4] §4 (Experimental results): The central efficiency claim rests on the reported training time (<1 day on 1.6B words) and accuracy improvements versus prior neural baselines, but the section provides insufficient detail on exact baseline re-implementations, hyperparameter search procedures, and whether the same hardware/resources were used for all methods. This makes it difficult to confirm the comparisons are free of post-hoc tuning.
- [§4.2] §4.2 (Evaluation on word analogy task): The state-of-the-art claim is made on a test set introduced by the authors themselves. While the task is a useful contribution, the manuscript does not include results on independent downstream tasks (e.g., named entity recognition or machine translation) or cross-corpus validation to support the broader interpretation that the vectors capture general syntactic and semantic relationships.
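For readers unfamiliar with the analogy task discussed above, the following minimal sketch shows the vector-offset procedure: form vec(b) - vec(a) + vec(c) and return the nearest remaining word by cosine similarity. The three-dimensional vectors are hypothetical values chosen by hand for illustration, not trained embeddings, and the helper names are assumptions of this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset vec(b) - vec(a) + vec(c)."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The question words themselves are excluded from the search.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# Hypothetical hand-made vectors; real embeddings have hundreds of dimensions.
toy_vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
    "apple": [0.9, 0.1, -0.8],
}
answer = solve_analogy(toy_vectors, "man", "woman", "king")
print(answer)
```

Because a question counts as correct only when the single nearest word matches exactly, accuracy on such a test set depends on the vocabulary searched as well as on the vectors, which is part of why independent downstream evaluation is a reasonable request.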
minor comments (3)
- [§2] §2 (Model architectures): The notation for the input/output layers and context window could be clarified with an explicit equation for the CBOW averaging operation to avoid ambiguity in implementation.
- [Table 1, Figure 2] Table 1 and Figure 2: The reported accuracy numbers and training times would benefit from error bars or multiple runs to indicate variability, especially given the stochastic nature of the training.
- [References] References: The comparison to prior work (e.g., neural language models by Bengio et al.) could include a more explicit discussion of why the proposed models avoid the computational bottlenecks of those approaches.
Simulated Author's Rebuttal
We thank the referee for the positive review and constructive comments. We address each major comment below and will make the indicated revisions to improve clarity and transparency.
- Referee: [§4] §4 (Experimental results): The central efficiency claim rests on the reported training time (<1 day on 1.6B words) and accuracy improvements versus prior neural baselines, but the section provides insufficient detail on exact baseline re-implementations, hyperparameter search procedures, and whether the same hardware/resources were used for all methods. This makes it difficult to confirm the comparisons are free of post-hoc tuning.
  Authors: We agree that greater detail on the experimental setup would strengthen the comparisons. In the revised manuscript we will expand §4 with additional information on the re-implementations of the prior neural baselines, the hyperparameter ranges explored for each method, and explicit confirmation that all timing and accuracy measurements were performed under comparable hardware and resource constraints.
  Revision: yes
- Referee: [§4.2] §4.2 (Evaluation on word analogy task): The state-of-the-art claim is made on a test set introduced by the authors themselves. While the task is a useful contribution, the manuscript does not include results on independent downstream tasks (e.g., named entity recognition or machine translation) or cross-corpus validation to support the broader interpretation that the vectors capture general syntactic and semantic relationships.
  Authors: The analogy task was introduced in this work precisely to probe syntactic and semantic relations in a controlled manner. While we recognize that evaluations on downstream tasks would provide further support, the scope of the paper centers on efficient learning of high-quality vectors and direct assessment via the new task. We will add a short discussion in the revised version acknowledging this limitation and outlining how the vectors could be applied to downstream problems.
  Revision: partial
Circularity Check
No circularity; empirical claims rest on independent external benchmarks
The paper specifies the CBOW and Skip-gram models through their architectures and training-complexity expressions, trains them on raw text corpora, and then measures vector quality on a newly constructed analogy test set and on the Microsoft Research Sentence Completion Challenge. These evaluation sets are not constructed from the fitted parameters or training objective, nor do any central claims reduce to self-citation or renaming of inputs. The reported accuracy gains and computational savings are direct empirical outcomes against external references, satisfying the self-contained benchmark criterion.
Axiom & Free-Parameter Ledger
free parameters (2)
- vector dimensionality
- context window size
axioms (2)
- domain assumption: Back-propagation through a single hidden layer produces useful word vectors when trained on next-word or context prediction.
- domain assumption: Word similarity and analogy test sets are valid proxies for syntactic and semantic understanding.
Forward citations
Cited by 60 Pith papers
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...
-
Language Models are Few-Shot Learners
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
-
REALM: Retrieval-Augmented Language Model Pre-Training
REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.
-
Intriguing properties of neural networks
Deep neural networks exhibit distributed high-level semantic representations and discontinuous input-output mappings vulnerable to transferable adversarial perturbations.
-
Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning
Proposes latent analogies and analogy transduction to enable compositional generalization to unseen goal-context pairs in offline GCRL, outperforming trajectory-stitching baselines on manipulation tasks.
-
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic onlin...
-
Differentially Private Sampling from Distributions via Wasserstein Projection
Proposes Wasserstein Projection Mechanism for differentially private sampling that optimizes Wasserstein distance utility and provides convergence guarantees for approximate computation.
-
OZ-TAL: Online Zero-Shot Temporal Action Localization
Defines OZ-TAL task and presents a training-free VLM-based method that outperforms prior approaches for online and offline zero-shot temporal action localization on THUMOS14 and ActivityNet-1.3.
-
An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies
Hybrid human-AI networks in 5x5 grids reached lower final polarization than human-only networks after eight rounds of opinion revision on polarizing topics.
-
EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement
EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.
-
Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation
Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.
-
Rational Communication Shapes Morphological Composition
Using historical corpora and the Rational Speech Act framework, attested English morphological compositions are ranked higher than plausible alternatives from the same time period when both semantic recoverability and...
-
Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders
EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.
-
Identifying and Characterizing Semantic Clones of Solidity Functions
A code-and-comment analysis method detects semantic clones in Solidity functions with 59% overall precision (84% for same-name functions) and 97% recall on 300k contracts, plus LLM summaries for uncommented code.
-
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
TabGRAA enables self-improving tabular language models through iterative group-relative advantage alignment using modular automated quality signals like distinguishability classifiers.
-
Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection
PROVFUSION fuses three complementary views of provenance data with lightweight schemes and voting to achieve higher detection accuracy and lower false positives than node- or edge-only baselines on nine benchmarks.
-
Learning to Discover at Test Time
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
-
GRAB: A Risk Taxonomy--Grounded Benchmark for Unsupervised Topic Discovery in Financial Disclosures
GRAB is a benchmark dataset of 1.61M sentences from 8,247 10-K filings with taxonomy-anchored weak supervision labels for standardized evaluation of unsupervised topic models on financial risk disclosures.
-
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
-
Adversarial Video Promotion Against Text-to-Video Retrieval
Pioneers ViPro, the first attack to adversarially promote videos in text-to-video retrieval, using Modal Refinement to improve black-box transferability across multiple targets.
-
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
VLM2Vec converts state-of-the-art vision-language models into universal multimodal embedders via contrastive training on the new MMEB benchmark, delivering 10-20% absolute gains over prior models on both in-distributi...
-
A Simple Framework for Contrastive Learning of Visual Representations
SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.
-
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.
-
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
-
Language Models as Knowledge Bases?
BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.
-
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
The paper releases the first multimodal English-Hindi machine translation dataset of 31,525 segments with images and a challenge test set of 1,400 segments selected via embedding similarity for image-resolvable ambiguities.
-
Tight Sensitivity Bounds For Smaller Coresets
New algorithms compute provably tight sensitivity bounds for matrix rows, yielding smaller coresets for LMS approximation of affine k-subspaces via an iterative exact method and a dimensionality-reduction trick.
-
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
-
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
DEL is a new loss for LLM numerical learning that applies supervised digit entropy optimization and extends to floating-point numbers, showing improved accuracy and distance metrics over prior methods on math benchmarks.
-
A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation
MAFIG is a multi-agent framework that uses LLM agents and evaluators to generate reading comprehension items with significantly higher adherence to specified feature constraints than single-agent baselines.
-
PipeANN-Filter: An Efficient Filtered Vector Search System on SSD
PipeANN-Filter improves filtered vector search latency and throughput on SSD by exploring a superset of valid vectors identified via probabilistic filters and verifying attributes only after selecting top-k candidates.
-
Multi-agent AI systems outperform human teams in creativity
Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.
-
Polar probe linearly decodes semantic structures from LLMs
LLMs represent semantic relations geometrically via embedding distance and direction; a linear Polar Probe decodes these structures from middle-layer activations and generalizes to new entities.
-
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via inte...
-
FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.
-
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs
Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation w...
-
Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes
Fixed 16-bit binary token codes can replace trainable input embeddings in 32-layer decoder-only models while maintaining comparable held-out perplexity on 17B tokens.
-
Semantic Smoothing for Language Models via Distribution Estimation and Embeddings
Semantic smoothing formulates next-word distribution estimation under KL loss with embedding-based KL-proximity side information, yielding an interpolation estimator with worst-case risk O(min{Δ, d/n}) that empiricall...
-
TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts
TAS-LoRA attaches a mixture of LoRA experts to a supernet and uses a dynamic router plus group-wise initialization to let different architecture subnets learn distinct features, yielding higher accuracy than prior TAS...
-
Query-efficient model evaluation using cached responses
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
-
The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks
Gradient descent in deep networks implicitly drives features toward target-linear structure as captured by the weight Gram matrix and a derived virtual covariance.
-
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge
Post-2015 AI adoption in science grew exponentially across domains but stayed limited to CS-linked topics, carried citation premiums, higher retractions, and showed rising Asian middle-income country involvement.
-
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge
AI adoption in science has shown exponential growth since 2015 across domains but stays confined to few CS-linked topics, carries citation premiums, higher retraction rates, and uneven geographic spread, leaving its t...
-
When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge
AI use in science has grown exponentially since 2015 but stays confined to computer science and statistics topics, shows higher retraction rates and citations, and follows distinct global adoption patterns.
-
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks
A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised no...
-
Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch
Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for constant c<1, with UGC-hardness preventing better polynomial-time sol...
-
Deep Kernel Learning for Stratifying Glaucoma Trajectories
A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current ...
-
The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text
TEA Nets extracts agents, events, and targets from text to reveal emotional and semantic patterns in conspiracy theories and psychotherapy transcripts from humans and LLMs.
-
ImproBR: Bug Report Improver Using LLMs
ImproBR combines a hybrid detector with GPT-4o mini and RAG to raise bug report structural completeness from 7.9% to 96.4% and executable steps from 28.8% to 67.6% on 139 Mojira reports.
-
ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models
ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while...
-
Self-supervised pretraining for an iterative image size agnostic vision transformer
A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.
-
Context-Aware Search and Retrieval Under Token Erasure
Assigning higher redundancy to semantically important query features reduces retrieval error probability under token erasures, via multivariate Gaussian approximations of similarity margins and supporting numerical results.
-
Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
Embedding Arithmetic performs vector operations in the embedding space of T2I models to mitigate bias at inference time, outperforming baselines on diversity while preserving coherence via a new Concept Coherence Score.
-
Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation
LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.
-
REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning
REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.
-
SIMMER: Cross-Modal Food Image--Recipe Retrieval via MLLM-Based Embedding
SIMMER uses a single multimodal LLM (VLM2Vec) with custom prompts and partial-recipe augmentation to embed food images and recipes, achieving new state-of-the-art retrieval accuracy on Recipe1M.
-
AFGNN: API Misuse Detection using Graph Neural Networks and Clustering
AFGNN detects API misuses in Java code more effectively than prior methods by representing usage as graphs and clustering learned embeddings from self-supervised training.
-
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...
-
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.
-
Detecting RAG Advertisements Across Advertising Styles
Entity recognition models detect ads in RAG responses effectively and stay robust when advertisers switch styles, while lightweight models like random forests and SVMs become brittle under the same changes.
Reference graph
Works this paper leans on
- [1]
- [2]
- [3]
- [4] R. Collobert and J. Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In International Conference on Machine Learning, ICML, 2008
- [5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493-2537, 2011
- [6]
- [7]
- [8] J. Elman. Finding Structure in Time. Cognitive Science, 14, 179-211, 1990
- [9]
- [10] G.E. Hinton, J.L. McClelland, D.E. Rumelhart. Distributed representations. In: Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations, MIT Press, 1986
- [11] D.A. Jurgens, S.M. Mohammad, P.D. Turney, K.J. Holyoak. Semeval-2012 task 2: Measuring degrees of relational similarity. In: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval 2012), 2012
- [12]
- [13] T. Mikolov. Language Modeling for Speech Recognition in Czech, Master's thesis, Brno University of Technology, 2007
- [14] T. Mikolov, J. Kopecký, L. Burget, O. Glembek and J. Černocký. Neural network based language models for highly inflective languages, In: Proc. ICASSP 2009
- [15] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur. Recurrent neural network based language model, In: Proceedings of Interspeech, 2010
- [16] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, S. Khudanpur. Extensions of recurrent neural network language model, In: Proceedings of ICASSP 2011
- [17] T. Mikolov, A. Deoras, S. Kombrink, L. Burget, J. Černocký. Empirical Evaluation and Combination of Advanced Language Modeling Techniques, In: Proceedings of Interspeech, 2011
  (Footnote 4 of the paper: The code is available at https://code.google.com/p/word2vec/)
- [18] T. Mikolov, A. Deoras, D. Povey, L. Burget, J. Černocký. Strategies for Training Large Scale Neural Network Language Models, In: Proc. Automatic Speech Recognition and Understanding, 2011
- [19] T. Mikolov. Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012
- [20] T. Mikolov, W.T. Yih, G. Zweig. Linguistic Regularities in Continuous Space Word Representations. NAACL HLT 2013
- [21] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. Accepted to NIPS 2013
- [22] A. Mnih, G. Hinton. Three new graphical models for statistical language modelling. ICML, 2007
- [23] A. Mnih, G. Hinton. A Scalable Hierarchical Distributed Language Model. Advances in Neural Information Processing Systems 21, MIT Press, 2009
- [24] A. Mnih, Y.W. Teh. A fast and simple algorithm for training neural probabilistic language models. ICML, 2012
- [25]
- [26] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by back-propagating errors. Nature, 323:533-536, 1986
- [27] H. Schwenk. Continuous space language models. Computer Speech and Language, vol. 21, 2007
- [28] R. Socher, E.H. Huang, J. Pennington, A.Y. Ng, and C.D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In NIPS, 2011
- [29]
- [30] P. D. Turney. Measuring Semantic Similarity by Latent Relational Analysis. In: Proc. International Joint Conference on Artificial Intelligence, 2005
- [31] A. Zhila, W.T. Yih, C. Meek, G. Zweig, T. Mikolov. Combining Heterogeneous Models for Measuring Relational Similarity. NAACL HLT 2013
- [32] G. Zweig, C.J.C. Burges. The Microsoft Research Sentence Completion Challenge, Microsoft Research Technical Report MSR-TR-2011-129, 2011