A Study of Reinforcement Learning for Neural Machine Translation

Fei Tian; Jianhuang Lai; Lijun Wu; Tao Qin; Tie-Yan Liu

arxiv: 1808.08866 · v1 · pith:PDRQZQYHnew · submitted 2018-08-27 · cs.LG · cs.AI · stat.ML

Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, Tie-Yan Liu

classification cs.LG cs.AI stat.ML

keywords translation learning performance reinforcement wmt17 chinese-english data especially

Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are leveraged. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, because it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, notably setting a new state of the art on the WMT17 Chinese-English translation task.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Enhancing Speech Large Language Models through Reinforced Behavior Alignment
cs.CL 2025-08 unverdicted novelty 5.0

Reinforced Behavior Alignment (RBA) uses self-synthesized data from a teacher LLM and reinforcement learning to close the instruction-following gap in SpeechLMs, outperforming distillation and reaching SOTA on spoken ...