arxiv: 1705.00440 · v1 · pith:7XOMYD3Mnew · submitted 2017-05-01 · 💻 cs.CL

Data Augmentation for Low-Resource Neural Machine Translation

classification 💻 cs.CL
keywords translation, low-resource, quality, augmentation, bleu, data, machine, neural

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
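The abstract describes the augmentation only at a high level: rare words are placed into new, synthetic contexts, and a matching target sentence is produced alongside each new source sentence. The sketch below illustrates that general idea. It is not the authors' implementation, and it rests on several assumptions of our own: a word alignment between each source and target sentence (one dict per pair, source index to target index), a bilingual dictionary `lexicon` (hypothetical), and a toy "both neighbouring bigrams were observed" check that stands in for whatever model a real system would use to decide whether a rare word fits a context. All function names are illustrative.

```python
from collections import Counter


def rare_words(corpus, threshold):
    """Return the words whose frequency in `corpus` is below `threshold`."""
    counts = Counter(w for sent in corpus for w in sent)
    return {w for w, c in counts.items() if c < threshold}


def seen_bigrams(corpus):
    """Collect adjacent word pairs (a toy stand-in for a language model)."""
    return {(s[k], s[k + 1]) for s in corpus for k in range(len(s) - 1)}


def fits_context(src, i, word, bigrams):
    """Accept `word` at position i only if both neighbouring bigrams were seen."""
    if i > 0 and (src[i - 1], word) not in bigrams:
        return False
    if i < len(src) - 1 and (word, src[i + 1]) not in bigrams:
        return False
    return True


def augment(pairs, alignments, lexicon, rare, bigrams):
    """Create new sentence pairs by placing rare words into existing contexts.

    pairs:      list of (source_tokens, target_tokens)
    alignments: one dict per pair mapping source index -> aligned target index
                (assumed to be given, e.g. by an external word aligner)
    lexicon:    hypothetical source-word -> target-word dictionary
    """
    new_pairs = []
    for (src, tgt), align in zip(pairs, alignments):
        for i, j in align.items():
            for word in sorted(rare):  # sorted for deterministic output
                if word == src[i] or word not in lexicon:
                    continue
                if not fits_context(src, i, word, bigrams):
                    continue
                # Swap the rare word into the source sentence and its
                # dictionary translation into the aligned target position.
                new_src = src[:i] + [word] + src[i + 1:]
                new_tgt = tgt[:j] + [lexicon[word]] + tgt[j + 1:]
                new_pairs.append((new_src, new_tgt))
    return new_pairs
```

For example, with a source corpus of three copies of `the cat sleeps` plus one `the fox sleeps`, `rare_words(corpus, 2)` returns `{"fox"}`. Augmenting the pair `the cat sleeps` / `le chat dort` (one-to-one alignment, `lexicon = {"fox": "renard"}`) yields the single new pair `the fox sleeps` / `le renard dort`: only the middle position passes both bigram checks. Substituting a word at an aligned position keeps the target edit local, which is why the sketch depends on an alignment; a context model stronger than the bigram check would be needed to get useful contexts in practice.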
