Chinese-Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information

Longtu Zhang; Mamoru Komachi

arxiv: 1903.00149 · v1 · pith:P2T5JQDKnew · submitted 2019-03-01 · 💻 cs.CL



keywords level · data · sub-character · character · translation · unmt · chinese-japanese · ideograph


Unsupervised neural machine translation (UNMT) requires only monolingual data of similar language pairs during training and can produce bidirectional translation models with relatively good performance on alphabetic languages (Lample et al., 2018). However, no research has been done on logographic language pairs. This study focuses on Chinese-Japanese UNMT trained on data containing sub-character (ideograph or stroke) level information, which is decomposed from character level data. BLEU scores of the character and sub-character level systems were compared, and the results showed that although UNMT is already effective on character level data, sub-character level data could further enhance performance, with the stroke level system outperforming the ideograph level system.
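To make the idea of decomposing character level data into sub-character units concrete, here is a minimal sketch. It is not the authors' pipeline: the lookup table `IDEOGRAPH_TABLE` and the function `decompose` are hypothetical names, the table holds only three hand-picked entries, and layout markers describing how components are arranged are omitted. The abstract does not state which decomposition resource or token format the study used.

```python
# Toy ideograph-level lookup table (hypothetical, for illustration only).
# A real system would load a full decomposition resource instead.
IDEOGRAPH_TABLE = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "林": ["木", "木"],
}


def decompose(sentence, table):
    """Replace each character found in `table` with its sub-character units.

    Characters without an entry (kana, punctuation, Latin letters, or
    characters missing from the table) are kept unchanged.
    """
    units = []
    for ch in sentence:
        units.extend(table.get(ch, [ch]))
    return units


if __name__ == "__main__":
    # Print the sub-character sequence for a short mixed kanji/kana string.
    print(" ".join(decompose("明日は好い", IDEOGRAPH_TABLE)))
```

A stroke level variant would use the same function with a table that maps each character to its stroke sequence, so the two granularities differ only in the lookup data.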
