Depth Growing for Neural Machine Translation

Fei Gao; Fei Tian; Jianhuang Lai; Lijun Wu; Tao Qin; Tie-Yan Liu; Yingce Xia; Yiren Wang

arxiv: 1907.01968 · v1 · pith:DK6UF2H6new · submitted 2019-07-03 · 💻 cs.CL

Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu

classification 💻 cs.CL

keywords translation · depth · neural · english · growing · machine · models · apeterswu

While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks to the NMT model results in no improvement and even reduces performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which result in significant improvements over the strong Transformer baselines on WMT14 English→German and English→French translation tasks. Our code is available at https://github.com/apeterswu/Depth_Growing_NMT.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
cs.LG 2024-01 conditional novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.