pith. machine review for the scientific record.

arxiv: 1606.02891 · v2 · submitted 2016-06-09 · 💻 cs.CL

Recognition: unknown

Edinburgh Neural Machine Translation Systems for WMT 16

Authors on Pith: no claims yet
classification: 💻 cs.CL
keywords: systems, translation, english, directions, improvements, neural, news, participated
0 comments
read the original abstract

We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English<->Czech, English<->German, English<->Romanian and English<->Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, pervasive dropout, and target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3--11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of 8 translation directions in which we participated.
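The abstract mentions BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. Below is a minimal sketch of how BPE merge operations are learned from word frequencies; the toy corpus, merge count, and function name are illustrative assumptions, not the authors' actual tooling or training setup.

```python
# Minimal BPE merge-learning sketch (toy example, not the paper's implementation).
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a dict of {word: frequency}."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new symbol
        merges.append(best)
        # Apply the merge to every word in the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Example: frequent character sequences become single symbols; rare words stay split.
print(learn_bpe({"lower": 5, "newest": 6, "widest": 3}, num_merges=10))
```

Applying the learned merges to unseen words segments them into known subword units, which is what allows open-vocabulary translation with a fixed vocabulary.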

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    cs.LG 2022-08 conditional novelty 7.0

    LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions (see the sketch after this list).

  2. Deep Learning Scaling is Predictable, Empirically

    cs.LG 2017-12 unverdicted novelty 7.0

    Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.
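The first cited paper above combines vector-wise int8 quantization for most feature dimensions with higher-precision handling of outlier dimensions. Below is a minimal NumPy sketch of that decomposition; the outlier threshold, shapes, and function name are illustrative assumptions, not the bitsandbytes implementation.

```python
# Illustrative mixed-precision int8 matmul in the spirit of LLM.int8() (toy sketch).
import numpy as np

def mixed_int8_matmul(X, W, outlier_threshold=6.0):
    """Compute X @ W, quantizing the non-outlier dimensions to int8."""
    # Outlier feature dimensions: columns of X whose max magnitude exceeds the threshold.
    outlier_cols = np.abs(X).max(axis=0) > outlier_threshold
    regular_cols = ~outlier_cols

    # Full-precision path for the outlier dimensions only.
    out_fp = X[:, outlier_cols] @ W[outlier_cols, :]

    # Vector-wise absmax quantization: one scale per row of X and per column of W.
    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0 + 1e-12
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0 + 1e-12
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)

    # Integer matmul, then dequantize with the outer product of the scales.
    out_int = (Xq.astype(np.int32) @ Wq.astype(np.int32)).astype(np.float32)
    return out_int * sx * sw + out_fp

# Toy check against a full-precision matmul.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)).astype(np.float32)
X[:, 2] *= 20.0  # inject an outlier dimension
W = rng.normal(size=(8, 3)).astype(np.float32)
print(np.max(np.abs(mixed_int8_matmul(X, W) - X @ W)))
```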