Synthetic and Natural Noise Both Break Neural Machine Translation

Yonatan Belinkov; Yonatan Bisk

arxiv: 1711.02173 · v2 · pith:NT2NHCCAnew · submitted 2017-11-06 · 💻 cs.CL · cs.LG


keywords: models, neural, noise, noisy, translation, find, learn, machine


Abstract

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
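To make the idea of synthetic noise concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names (`swap_inner_chars`, `add_noise`, `char_bag`), the swap-based noise type, and the `prob` and `seed` parameters are all assumptions chosen for this example. The abstract does not say how the paper's structure-invariant representations are built, so `char_bag` only shows one simple way a word representation can ignore character order.

```python
import random


def swap_inner_chars(word: str, rng: random.Random) -> str:
    """Swap one pair of adjacent inner characters; first and last stay fixed.

    Hypothetical noise function for illustration. Words shorter than four
    characters have no inner pair to swap and are returned unchanged.
    """
    if len(word) < 4:
        return word
    # i ranges over 1 .. len(word) - 3, so i + 1 never reaches the last char.
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_noise(sentence: str, prob: float = 0.5, seed: int = 0) -> str:
    """Apply swap noise to each whitespace-separated token with probability prob."""
    rng = random.Random(seed)
    return " ".join(
        swap_inner_chars(w, rng) if rng.random() < prob else w
        for w in sentence.split()
    )


def char_bag(word: str) -> tuple:
    """Order-insensitive view of a word: its characters, sorted.

    An assumed stand-in for a structure-invariant representation.
    """
    return tuple(sorted(word))


if __name__ == "__main__":
    clean = "neural machine translation"
    noisy = add_noise(clean, prob=1.0)
    print(noisy)
    # The bag-of-characters view is identical for clean and noisy tokens.
    print([char_bag(a) == char_bag(b) for a, b in zip(clean.split(), noisy.split())])
```

Because scrambling only reorders characters, any representation that discards order maps a clean word and its scrambled form to the same value. The trade-off is that distinct words with the same letters also collide.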


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report
cs.CL · 2019-06 · unverdicted · novelty 3.0

The Baidu-OSU WMT19 system achieves a gain of more than 10 BLEU on En-Fr and Fr-En social media translation via domain-sensitive training and pseudo noisy sources.