Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

Kim Anh Nguyen; Ngoc Thang Vu; Sabine Schulte im Walde

arXiv: 1804.05388 · v2 · submitted 2018-04-15 · cs.CL

Abstract

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity. ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to those obtained on the respective English datasets.
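The abstract states that both datasets are verified with co-occurrence and neural models but does not spell out the scoring. The following is a minimal sketch, assuming the protocol commonly used for SimLex-style resources. Model similarity is taken as the cosine between word vectors. For ViSim-400-style graded pairs, it is compared to human ratings with Spearman's rank correlation. For ViCon-style pairs, average precision measures how well synonyms are ranked above antonyms. The vectors, ratings, and labels are hypothetical toy values, not data from either dataset, and all function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of the positive class (label 1) when pairs are sorted by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for position, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


# Hypothetical toy word vectors (not from any trained model).
VECTORS = {
    "đẹp": [0.9, 0.1, 0.2],     # beautiful
    "xinh": [0.8, 0.2, 0.1],    # pretty
    "xấu": [0.1, 0.9, 0.3],     # ugly
    "nhanh": [0.2, 0.3, 0.9],   # fast
    "mau": [0.25, 0.35, 0.85],  # fast (synonym of nhanh)
    "chậm": [0.3, 0.8, 0.4],    # slow
}

# Graded pairs with hypothetical human ratings (ViSim-400 style).
rated_pairs = [("đẹp", "xinh", 5.5), ("nhanh", "mau", 5.8), ("đẹp", "xấu", 0.5),
               ("nhanh", "chậm", 0.8), ("xinh", "chậm", 0.3)]
model_scores = [cosine(VECTORS[a], VECTORS[b]) for a, b, _ in rated_pairs]
human_scores = [r for _, _, r in rated_pairs]
rho = spearman(model_scores, human_scores)

# Synonym (1) vs. antonym (0) pairs (ViCon style).
labelled_pairs = [("đẹp", "xinh", 1), ("nhanh", "mau", 1),
                  ("đẹp", "xấu", 0), ("nhanh", "chậm", 0)]
ap = average_precision([cosine(VECTORS[a], VECTORS[b]) for a, b, _ in labelled_pairs],
                       [lab for _, _, lab in labelled_pairs])

print(f"Spearman rho = {rho:.3f}, synonym AP = {ap:.3f}")
```

On these toy values the antonym pair nhanh/chậm still receives a fairly high cosine. This illustrates why a dataset that separates synonyms from antonyms is useful: purely distributional similarity tends to conflate the two relations.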
