Identifying Computer-Translated Paragraphs using Coherence Features

Hoang-Quoc Nguyen-Son; Huy H. Nguyen; Isao Echizen; Junichi Yamagishi; Ngoc-Dung T. Tieu

arxiv: 1812.10896 · v1 · pith:GPTVVYYBnew · submitted 2018-12-28 · 💻 cs.CL

keywords: coherence, features, paragraphs, accuracy, best, computer-translated, equal, error

We have developed a method for extracting coherence features from a paragraph by matching similar words across its sentences. We conducted an experiment on a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. Our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods designed for various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and on a low-resource language (Japanese) yielded similar performance. These results demonstrate the effectiveness of coherence features for distinguishing computer-translated from human-created paragraphs across diverse languages.
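The idea of matching similar words across the sentences of a paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the exact features, so the character-level similarity from Python's `difflib`, the regex tokenizer, and the function names `word_similarity`, `best_match_scores`, and `coherence_features` are all assumptions made here for illustration only.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def tokenize(sentence):
    # Lowercased word tokens; a simple stand-in for a real tokenizer.
    return re.findall(r"\w+", sentence.lower())


def word_similarity(a, b):
    # ASSUMPTION: character-level similarity as a placeholder.
    # The abstract does not say how word similarity is measured.
    return SequenceMatcher(None, a, b).ratio()


def best_match_scores(sent_a, sent_b):
    # For each word in sent_a, the similarity of its closest word in sent_b.
    return [max(word_similarity(w, v) for v in sent_b) for w in sent_a]


def coherence_features(paragraph):
    # Split on sentence-final punctuation, then tokenize each sentence.
    raw = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    sentences = [tokens for tokens in map(tokenize, raw) if tokens]

    # Collect best-match scores over every pair of sentences.
    scores = []
    for a, b in combinations(sentences, 2):
        scores.extend(best_match_scores(a, b))

    if not scores:  # fewer than two sentences: nothing to match
        return {"mean": 0.0, "max": 0.0, "min": 0.0}
    # ASSUMPTION: summary statistics chosen for illustration.
    return {"mean": mean(scores), "max": max(scores), "min": min(scores)}


features = coherence_features("The cat sat. The cat slept.")
print(features)
```

In a classification setting, a feature vector like this would be computed for each paragraph and passed to a standard classifier trained on human-created and machine-translated examples; the choice of classifier is likewise not stated in the abstract.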
