Transfer Learning for Low-Resource Neural Machine Translation

arxiv: 1604.02201 · v1 · pith:IMB6CTUG · submitted 2016-04-08 · 💻 cs.CL

Barret Zoph, Deniz Yuret, Jonathan May, Kevin Knight

classification 💻 cs.CL

keywords low-resource, machine, transfer, translation, bleu, learning, language, model

Abstract

The encoder-decoder framework for neural machine translation (NMT) has been shown to be effective in large-data scenarios, but it is much less effective for low-resource languages. We present a transfer learning method that significantly improves BLEU scores across a range of low-resource languages. Our key idea is to first train a high-resource language pair (the parent model), then transfer some of the learned parameters to the low-resource pair (the child model) to initialize and constrain training. Using our transfer learning method, we improve baseline NMT models by an average of 5.6 BLEU on four low-resource language pairs. Ensembling and unknown-word replacement add another 2 BLEU, which brings NMT performance on low-resource machine translation close to that of a strong syntax-based machine translation (SBMT) system and exceeds it on one language pair. Additionally, using the transfer learning model for re-scoring, we improve the SBMT system by an average of 1.3 BLEU, improving the state of the art in low-resource machine translation.
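The parent-to-child idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (`source_embedding`, `encoder`, `decoder`, `target_embedding`), the choice of which groups are transferred, and the choice of which group is frozen are all assumptions made for illustration, since the abstract only says that "some" parameters are transferred to "initialize and constrain" training. Parameters are modeled as plain Python lists rather than tensors.

```python
import copy

def init_child_from_parent(parent, child, transfer_keys):
    """Return a copy of `child` whose `transfer_keys` entries come from `parent`."""
    initialized = copy.deepcopy(child)
    for key in transfer_keys:
        initialized[key] = copy.deepcopy(parent[key])
    return initialized

def sgd_step(params, grads, lr, frozen_keys):
    """One SGD update; parameters listed in `frozen_keys` are left unchanged."""
    updated = {}
    for key, values in params.items():
        if key in frozen_keys:
            updated[key] = list(values)  # constrained: keep the parent's values
        else:
            updated[key] = [v - lr * g for v, g in zip(values, grads[key])]
    return updated

# Toy "trained" parent model (high-resource pair); values are made up.
parent = {
    "source_embedding": [0.9, -0.4],
    "encoder": [0.5, 0.1],
    "decoder": [0.3, -0.2],
    "target_embedding": [0.7, 0.2],
}

# Freshly initialized child model (low-resource pair); values are made up.
child = {
    "source_embedding": [0.01, -0.02],
    "encoder": [0.0, 0.0],
    "decoder": [0.0, 0.0],
    "target_embedding": [0.0, 0.0],
}

# Assumption: everything except the source embedding is transferred,
# and the target embedding is held fixed during child training.
transfer_keys = ["encoder", "decoder", "target_embedding"]
frozen_keys = {"target_embedding"}

child = init_child_from_parent(parent, child, transfer_keys)

# One illustrative update with a constant gradient of 1.0 everywhere.
grads = {key: [1.0, 1.0] for key in child}
child = sgd_step(child, grads, lr=0.1, frozen_keys=frozen_keys)
```

In this sketch, "initialize" corresponds to copying parent values into the child before training, and "constrain" corresponds to skipping updates for the frozen group. A real system would apply the same two steps to the weight tensors of an encoder-decoder network.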

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Role of Vocabularies in Learning Sparse Representations for Ranking
cs.IR · 2025-09 · unverdicted · novelty 5.0

Larger 100K vocabularies in SPLADE models, especially those initialized with ESPLADE pretraining, improve retrieval effectiveness after pruning compared to 32K baselines while keeping similar efficiency.