Attention Is All You Need

Vaswani et al. · 2017

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Sharing Attention Weights for Fast Transformer
cs.CL · 2019-06-26 · unverdicted · novelty 4.0 · ref 18

Sharing attention weights in adjacent Transformer layers yields a 1.3× inference speedup with negligible BLEU loss on ten WMT and NIST tasks.