Effective Approaches to Attention-based Neural Machine Translation

22 Pith papers cite this work. Polarity classification is still indexing.

abstract

An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. Our ensemble model using different attention architectures has established a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
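The distinction the abstract draws between global and local attention can be illustrated with a minimal NumPy sketch. The abstract does not specify a scoring function or how the local subset is chosen, so the dot-product score, the fixed window around a given `center`, and all function and parameter names below are assumptions made for illustration, not the paper's stated implementation.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(h_t, source_states):
    """Attend to every source position (assumed dot-product scoring)."""
    scores = source_states @ h_t           # one score per source word
    weights = softmax(scores)              # weights over all source words
    context = weights @ source_states      # weighted average of source states
    return context, weights

def local_attention(h_t, source_states, center, half_width):
    """Attend only to a window of source positions around `center`.

    The fixed window is a hypothetical way of picking the subset.
    """
    num_source = source_states.shape[0]
    lo = max(0, center - half_width)
    hi = min(num_source, center + half_width + 1)
    weights = np.zeros(num_source)
    # Positions outside the window keep weight zero.
    weights[lo:hi] = softmax(source_states[lo:hi] @ h_t)
    context = weights @ source_states
    return context, weights

# Toy data: 6 source words, hidden size 4 (random values, illustration only).
rng = np.random.default_rng(0)
source_states = rng.normal(size=(6, 4))
h_t = rng.normal(size=4)

ctx_g, w_g = global_attention(h_t, source_states)
ctx_l, w_l = local_attention(h_t, source_states, center=2, half_width=1)
```

In this sketch both variants return a context vector of the same size; the only difference is that the local variant leaves every weight outside the window at zero, so its cost depends on the window width rather than the source length.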

hub tools

citation-role summary

background 1 · method 1 · other 1

representative citing papers

Graph-based Knowledge Distillation by Multi-head Attention Network

cs.LG · 2019-07-04 · unverdicted · novelty 6.0

Multi-head attention constructs a graph of dataset relations from the teacher embedding procedure and transfers it to the student via multi-task learning, yielding 7.05% higher CIFAR-100 accuracy than the student alone and 2.46% above prior SOTA.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

Pith review generated a malformed one-line summary.

Skeleton-based Coherence Modeling in Narratives

cs.CL · 2026-04-02 · unverdicted · novelty 4.0

Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.
