Vocabulary Selection Strategies for Neural Machine Translation

2016 · cs.CL · arXiv 1610.00072

1 Pith paper cites this work.

abstract

Classical translation models constrain the space of possible outputs by selecting a subset of translation rules based on the input sentence. Recent work on improving the efficiency of neural translation models adopted a similar strategy by restricting the output vocabulary to a subset of likely candidates given the source. In this paper we experiment with context- and embedding-based selection methods and extend previous work by examining speed and accuracy trade-offs in more detail. We show that decoding time on CPUs can be reduced by up to 90% and training time by 25% on the WMT15 English-German and WMT16 English-Romanian tasks with the same accuracy or only a negligible change in it. This brings the time to decode with a state-of-the-art neural translation system to just over 140 msec per sentence on a single CPU core for English-German.
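
The core idea in the abstract, restricting the output vocabulary to likely candidates given the source, can be illustrated with a small sketch. This is not the paper's implementation: the toy vocabulary, the per-word candidate table `CANDIDATES`, the always-included list `COMMON`, and the decoder scores are all hypothetical, and a simple lookup table stands in for whichever selection method (context- or embedding-based) a real system would use. The point is only that the softmax is computed over the selected subset rather than the full vocabulary, which is where the decoding-time savings would come from.

```python
import math

# Hypothetical toy data: none of these words, tables, or scores come from the paper.
TARGET_VOCAB = ["das", "haus", "ist", "klein", "gross", "katze", "hund", "."]

# Assumed per-source-word candidate lists (a stand-in for any selection method).
CANDIDATES = {
    "the": ["das"],
    "house": ["haus"],
    "is": ["ist"],
    "small": ["klein"],
}

# Assumed list of frequent target words that are always allowed.
COMMON = ["."]


def select_vocabulary(source_tokens):
    """Return sorted target-word indices allowed for this source sentence."""
    allowed = set(COMMON)
    for tok in source_tokens:
        allowed.update(CANDIDATES.get(tok, []))
    index = {w: i for i, w in enumerate(TARGET_VOCAB)}
    return sorted(index[w] for w in allowed if w in index)


def restricted_softmax(logits, allowed_ids):
    """Softmax over the allowed subset only; returns {index: probability}."""
    m = max(logits[i] for i in allowed_ids)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in allowed_ids}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


source = ["the", "house", "is", "small"]
allowed_ids = select_vocabulary(source)
logits = [2.0, 1.0, 0.5, 0.1, 3.0, -1.0, 0.0, 0.2]  # made-up decoder scores
probs = restricted_softmax(logits, allowed_ids)
```

In this toy case the word with the highest raw score ("gross", index 4) is never scored because it is not in the selected subset, which shows both the saving and the risk: any accuracy loss comes from correct words being excluded by the selection step.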

representative citing papers

Sharing Attention Weights for Fast Transformer

cs.CL · 2019-06-26 · unverdicted · novelty 4.0

Sharing attention weights in adjacent Transformer layers yields 1.3X inference speedup with negligible BLEU loss on ten WMT and NIST tasks.
