Effective Approaches to Attention-based Neural Machine Translation

22 Pith papers cite this work. Polarity classification is still indexing.

abstract

An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. Our ensemble model using different attention architectures has established a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.
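The contrast between the two classes of attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the dot-product scoring function, the fixed window given by the hypothetical `center` and `half_width` parameters, and all function names are assumptions made for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def global_attention(h_t, h_s):
    """Attend over ALL source states.

    h_t: target hidden state, shape (d,)
    h_s: source hidden states, shape (S, d)
    """
    scores = h_s @ h_t           # dot-product score (assumed scoring function)
    weights = softmax(scores)    # one weight per source position
    context = weights @ h_s      # weighted average of source states, shape (d,)
    return context, weights


def local_attention(h_t, h_s, center, half_width):
    """Attend only over a window of source states around `center`.

    The window is [center - half_width, center + half_width], clipped to
    the sentence boundaries. How `center` is chosen is left to the caller
    (an assumption of this sketch).
    """
    S = h_s.shape[0]
    lo = max(0, center - half_width)
    hi = min(S, center + half_width + 1)
    weights = np.zeros(S)
    weights[lo:hi] = softmax(h_s[lo:hi] @ h_t)  # zero weight outside the window
    context = weights @ h_s
    return context, weights


# Toy usage with random states: 6 source positions, hidden size 4.
rng = np.random.default_rng(0)
h_s = rng.normal(size=(6, 4))
h_t = rng.normal(size=4)
ctx_g, w_g = global_attention(h_t, h_s)
ctx_l, w_l = local_attention(h_t, h_s, center=2, half_width=1)
```

In this sketch the global variant spreads weight over all six source positions, while the local variant assigns nonzero weight only to positions 1 through 3, so its cost depends on the window size rather than on the sentence length.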

citation-role summary

background: 1 · method: 1 · other: 1

representative citing papers

Graph-based Knowledge Distillation by Multi-head Attention Network

cs.LG · 2019-07-04 · unverdicted · novelty 6.0

Multi-head attention constructs a graph of dataset relations from the teacher embedding procedure and transfers it to the student via multi-task learning, yielding 7.05% higher CIFAR-100 accuracy than the student alone and 2.46% above prior SOTA.

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0


Skeleton-based Coherence Modeling in Narratives

cs.CL · 2026-04-02 · unverdicted · novelty 4.0

Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.

The General Theory of Localization Methods

cs.LG · 2026-05-20 · unverdicted · novelty 3.0 · 2 refs

The localization method is presented as a unifying framework connecting kernel methods, MeanShift, Hopfield networks, LLE, fuzzy inference, denoising autoencoders, and Transformers via local models and the localization trick.

citing papers explorer


  • Revisiting Neural Processes via Fourier Transform and Volterra Series · cs.LG · 2026-05-31 · unverdicted · none · ref 115 · internal anchor

    Introduces SFConvCNPs and SFVConvCNPs using set Fourier convolutions and Volterra expansions for translation-equivariant neural processes on irregular data with global receptive fields and linear scaling.

  • Graph-based Knowledge Distillation by Multi-head Attention Network · cs.LG · 2019-07-04 · unverdicted · none · ref 18 · internal anchor

    Multi-head attention constructs a graph of dataset relations from the teacher embedding procedure and transfers it to the student via multi-task learning, yielding 7.05% higher CIFAR-100 accuracy than the student alone and 2.46% above prior SOTA.

  • Creating A Neural Pedagogical Agent by Jointly Learning to Review and Assess · cs.LG · 2019-06-26 · unverdicted · none · ref 28 · internal anchor

    Bidirectional RNN with attention models real-time user knowledge from question-response sequences to predict correctness, outperforming baselines especially for new users on a large TOEIC mobile app dataset.

  • The General Theory of Localization Methods · cs.LG · 2026-05-20 · unverdicted · none · ref 8 · 2 links · internal anchor

    The localization method is presented as a unifying framework connecting kernel methods, MeanShift, Hopfield networks, LLE, fuzzy inference, denoising autoencoders, and Transformers via local models and the localization trick.

  • Positional Encoding in Transformer-Based Time Series Models: A Survey · cs.LG · 2025-02-17 · unverdicted · none · ref 33 · internal anchor

    A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.