Saliency-driven Word Alignment Interpretation for Neural Machine Translation

Hainan Xu; Philipp Koehn; Shuoyang Ding

arxiv: 1906.10282 · v2 · pith:C6DF4TEGnew · submitted 2019-06-25 · 💻 cs.CL

Shuoyang Ding, Hainan Xu, Philipp Koehn

Pith reviewed 2026-05-25 17:13 UTC · model grok-4.3

classification 💻 cs.CL

keywords neural machine translation · word alignment · saliency · interpretability · Transformer · force decoding · alignment quality

The pith

NMT models learn interpretable word alignments that saliency methods can extract.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that neural machine translation models, including Transformers, do learn word alignments even though they are often seen as not doing so. These alignments become visible only when using saliency-based interpretation techniques that measure how much each source word influences target word predictions. The methods work without changing the model and apply in both forced and free decoding. If correct, this means alignment information is already present in standard NMT training and can be recovered post-hoc for analysis or use.

Core claim

NMT models learn interpretable word alignments, revealed by saliency-driven interpretation methods. Under force decoding, these alignments exceed fast-align quality for some systems, and in free decoding they align well with automatic tools. The methods are model-agnostic and require no parameter updates.

What carries the argument

Saliency scores that quantify the contribution of each source word to the model's output predictions for target words.
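As a rough illustration of this idea, the sketch below scores each source word by the norm of the gradient of a target-word score with respect to that word's embedding, then aligns the target word to the highest-scoring source word. It is a generic gradient-saliency sketch, not the paper's exact formulation, which this page does not spell out: `gradient_saliency`, `align_target_word`, `toy_score`, the finite-difference gradient, and the L2-norm aggregation are all assumptions made for illustration.

```python
import numpy as np


def gradient_saliency(score_fn, src_emb, eps=1e-5):
    """Return one saliency value per source word.

    score_fn maps a (num_words, dim) embedding matrix to a scalar score for
    one target word. The gradient is estimated by central finite differences,
    so score_fn is treated as a black box (an illustrative assumption).
    """
    src_emb = np.asarray(src_emb, dtype=float)
    grad = np.zeros_like(src_emb)
    for idx in np.ndindex(src_emb.shape):
        up = src_emb.copy()
        down = src_emb.copy()
        up[idx] += eps
        down[idx] -= eps
        grad[idx] = (score_fn(up) - score_fn(down)) / (2 * eps)
    # Collapse the embedding dimension: one gradient norm per source word.
    return np.linalg.norm(grad, axis=1)


def align_target_word(score_fn, src_emb):
    """Align the target word to the most salient source position."""
    return int(np.argmax(gradient_saliency(score_fn, src_emb)))


# Hypothetical toy "model": the score depends strongly on source word 2,
# weakly on source word 0, and not at all on words 1 and 3.
W_TOY = np.array([1.0, -2.0, 0.5])
SRC_TOY = np.random.default_rng(0).normal(size=(4, 3))


def toy_score(emb):
    return 3.0 * (emb[2] @ W_TOY) + 0.1 * (emb[0] @ W_TOY)


aligned_source = align_target_word(toy_score, SRC_TOY)
```

Because the gradient is estimated only through calls to `score_fn`, nothing in the model is modified, which mirrors the no-retraining property described above. A real system would use automatic differentiation rather than finite differences.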

Load-bearing premise

Saliency scores accurately reflect the word alignment information learned by the model rather than unrelated computational effects.

What would settle it

The claim would fail if saliency-based alignments agreed with human or gold alignments no better than random baselines, judged on the same data used for the fast-align comparison.
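One way to make this test concrete is to score both the saliency-derived links and a random baseline against gold links. The sketch below uses alignment error rate (AER), a metric commonly used for word alignment; this page does not say which metric the paper reports, so the choice of AER, the function names `aer` and `random_alignment`, and the toy link sets are assumptions.

```python
import random


def aer(predicted, sure, possible=None):
    """Alignment error rate: 1 - (|A∩S| + |A∩P|) / (|A| + |S|), with S ⊆ P."""
    predicted, sure = set(predicted), set(sure)
    # Sure links always count as possible links as well.
    possible = sure | set(possible or ())
    denom = len(predicted) + len(sure)
    if denom == 0:
        return 0.0
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / denom


def random_alignment(src_len, tgt_len, rng):
    """Baseline: link every target position to a uniformly random source position."""
    return {(rng.randrange(src_len), j) for j in range(tgt_len)}


# Hypothetical toy data: links are (source_index, target_index) pairs.
GOLD = {(0, 0), (1, 1), (2, 2)}
SALIENCY_LINKS = {(0, 0), (1, 1), (1, 2)}

saliency_aer = aer(SALIENCY_LINKS, GOLD)

# Average the baseline over many random draws to smooth out sampling noise.
rng = random.Random(0)
baseline_aer = sum(aer(random_alignment(3, 3, rng), GOLD) for _ in range(1000)) / 1000
```

Lower AER is better, so the premise survives this particular check only if `saliency_aer` sits clearly below `baseline_aer` on real data.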

Figures

Figures reproduced from arXiv: 1906.10282 by Hainan Xu, Philipp Koehn, Shuoyang Ding.

**Figure 1.** Comparison of our saliency-based word alignment interpretation of convolutional NMT model with reference and attention interpretation. in computer-aided translation. When aiming for the most accurate alignments, the state-of-the-art tools include GIZA++ (Brown et al., 1993; Och and Ney, 2003) and fast-align (Dyer et al., 2013), which are all external models invented in SMT era and need to be run as a s…

**Figure 2.** Saliency interpretation of FConv de-en model

**Figure 3.** Saliency interpretation of Transformer de-en model

Original abstract

Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments. In this paper, we show that NMT models do learn interpretable word alignments, which could only be revealed with proper interpretation methods. We propose a series of such methods that are model-agnostic, are able to be applied either offline or online, and do not require parameter update or architectural change. We show that under the force decoding setup, the alignments induced by our interpretation method are of better quality than fast-align for some systems, and when performing free decoding, they agree well with the alignments induced by automatic alignment tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Saliency methods extract alignments from NMT models that beat fast-align in some forced-decoding cases, but it is not clear they isolate what the model actually learned.

The main thing to know is that this paper applies saliency techniques to pull word alignments out of pre-trained NMT systems, including Transformers, without any retraining or architecture changes. The methods are model-agnostic and can run either offline or online. Under forced decoding the induced alignments sometimes outperform fast-align, and in free decoding they line up reasonably with standard automatic aligners. That is the concrete empirical result on offer.

Referee Report

2 major / 2 minor

Summary. The paper claims that NMT models (including Transformers) learn interpretable word alignments that can be recovered using a series of model-agnostic saliency-based interpretation methods applicable offline or online without parameter updates or architectural changes. Under forced decoding the induced alignments are reported to exceed fast-align quality for some systems; under free decoding they agree well with automatic alignment tools.

Significance. If the saliency methods are shown to isolate alignments actually learned by the model rather than gradient or perturbation artifacts, the result would be significant for NMT interpretability research by providing a practical way to inspect alignments in pre-trained models. The model-agnostic and no-retraining design is a clear strength that enables direct application to existing systems.

major comments (2)

[Abstract / Experimental Results] Abstract and experimental sections: the central claim that saliency scores recover alignments the model has learned (rather than gradient saturation, decoder-state dependencies, or input-normalization artifacts) is load-bearing, yet the manuscript provides no ablations, random baselines, or controls that would distinguish these possibilities. This directly affects the force-decoding and free-decoding comparisons.
[Experimental Results] The reported superiority over fast-align under forced decoding and agreement with automatic tools under free decoding lacks details on statistical significance testing, variance across multiple runs, or dataset-specific breakdowns, making it impossible to assess whether the differences are robust.

minor comments (2)

[Methods] Notation for the different saliency variants (model-agnostic offline vs. online) should be introduced with explicit equations or pseudocode early in the methods section to improve readability.
[Related Work] The paper should include a short related-work subsection contrasting the proposed saliency approach with prior gradient- or attention-based alignment extraction methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, agreeing where the manuscript is lacking and outlining planned revisions.

Referee: [Abstract / Experimental Results] Abstract and experimental sections: the central claim that saliency scores recover alignments the model has learned (rather than gradient saturation, decoder-state dependencies, or input-normalization artifacts) is load-bearing, yet the manuscript provides no ablations, random baselines, or controls that would distinguish these possibilities. This directly affects the force-decoding and free-decoding comparisons.

Authors: We acknowledge that the manuscript does not include explicit ablations, random baselines, or controls to isolate learned alignments from potential artifacts such as gradient saturation or decoder-state dependencies. While the reported comparisons to fast-align and automatic tools provide supporting evidence, they do not fully rule out these alternatives. In the revised version we will add random saliency baselines and targeted controls for decoder dependencies and input normalization. revision: yes
Referee: [Experimental Results] The reported superiority over fast-align under forced decoding and agreement with automatic tools under free decoding lacks details on statistical significance testing, variance across multiple runs, or dataset-specific breakdowns, making it impossible to assess whether the differences are robust.

Authors: The current results are presented as averages without statistical tests, variance, or per-dataset breakdowns. We will incorporate bootstrap significance testing, report standard deviations from multiple runs where feasible, and add dataset-specific result tables in the revision to allow assessment of robustness. revision: yes
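The bootstrap testing mentioned in this response could take the form of paired bootstrap resampling over sentences, sketched below. This is an assumption about what such a test might look like, not a procedure taken from the paper: `paired_bootstrap`, the per-sentence error inputs, and the resample count are all hypothetical.

```python
import numpy as np


def paired_bootstrap(errors_a, errors_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A has lower mean error than B.

    errors_a and errors_b hold per-sentence error values for the same
    sentences, in the same order. Sentences are resampled with replacement,
    and both systems are always scored on the same resampled set.
    """
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("inputs must be 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    n = len(a)
    # Each row is one bootstrap sample of sentence indices.
    idx = rng.integers(0, n, size=(n_resamples, n))
    a_wins = a[idx].mean(axis=1) < b[idx].mean(axis=1)
    return float(a_wins.mean())
```

A value close to 1.0 suggests that system A's advantage is stable under resampling. Note that averaging per-sentence errors is a simplification: a corpus-level metric would instead be recomputed from the pooled links of each resample.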

Circularity Check

0 steps flagged

No circularity: methods applied to pre-trained models with external comparisons

The paper applies saliency-based interpretation techniques to existing pre-trained NMT models and evaluates the resulting alignments against independent external tools (fast-align and automatic aligners). No equations, parameters, or central claims are defined in terms of the paper's own outputs or fitted values. The derivation chain consists of standard gradient/perturbation computations followed by post-hoc comparison, with no self-definitional steps, fitted-input predictions, or load-bearing self-citations that reduce the result to the input by construction. This is the expected non-circular outcome for an interpretation study on fixed models.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that saliency-based scores faithfully reflect learned alignments; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption: Saliency methods applied to NMT models extract meaningful word alignment information
The paper's claim that alignments can be revealed with proper interpretation methods depends on this assumption about the validity of saliency for alignment extraction.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 5 internal anchors

[1] Tamer Alkhouli, Gabriel Bretschner, and Hermann Ney. 2018. https://aclanthology.info/papers/W18-6318/w18-6318 On the alignment problem in multi-head attention-based neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, WMT 2018, Belgium, Brussels, October 31 - November 1, 2018, pages 177–185.

[2] Tamer Alkhouli, Gabriel Bretschner, Jan-Thorsten Peter, Mohammed Hethnawi, Andreas Guta, and Hermann Ney. 2016. http://aclweb.org/anthology/W/W16/W16-2206.pdf Alignment-based neural machine translation. In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany, pages 54–65.

[3] Mihael Arcan, Marco Turchi, Sara Tonelli, and Paul Buitelaar. 2014. Enhancing statistical machine translation with bilingual terminology in a CAT environment. In Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), pages 54–68.

[4] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140.

[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. http://arxiv.org/abs/1409.0473 Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

[6] Gosse Bouma and Yannick Parmentier, editors. 2014. http://aclweb.org/anthology/E/E14/ Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26-30, 2014, Gothenburg, Sweden. The Association for Computer Linguistics.

[7] Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.

[8] William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. 2016. https://doi.org/10.1109/ICASSP.2016.7472621 Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pages 4960–4964.

[9] Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen. 2017. https://doi.org/10.18653/v1/P17-1177 Improved neural machine translation with a syntax-aware encoder and decoder. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1936–1945.

[10] Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. http://papers.nips.cc/paper/5847-attention-based-models-for-speech-recognition Attention-based models for speech recognition. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 201...

[11] Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. https://doi.org/10.18653/v1/P17-1106 Visualizing and understanding neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1150–1159.

[12] Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. http://aclweb.org/anthology/N/N13/N13-1073.pdf A simple, fast, and effective reparameterization of IBM model 2. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta...

[13] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. http://proceedings.mlr.press/v70/gehring17a.html Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1243–1252.

[14] Hamidreza Ghader and Christof Monz. 2017. https://aclanthology.info/papers/I17-1004/i17-1004 What does attention in neural machine translation pay attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27 - December 1, 2017 - Volume 1: Long Papers, pages 30–39.

[15] Reza Ghaeini, Xiaoli Z. Fern, and Prasad Tadepalli. 2018. https://aclanthology.info/papers/D18-1537/d18-1537 Interpreting recurrent and attention-based neural models: a case study on natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, page...

[16] Eva Hasler, Adrià de Gispert, Gonzalo Iglesias, and Bill Byrne. 2018. https://aclanthology.info/papers/N18-2081/n18-2081 Neural machine translation decoding with terminology constraints. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orl...

[17] Philipp Koehn and Rebecca Knowles. 2017. https://aclanthology.info/papers/W17-3204/w17-3204 Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, NMT@ACL 2017, Vancouver, Canada, August 4, 2017, pages 28–39.

[18] Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. http://aclweb.org/anthology/N/N03/N03-1017.pdf Statistical phrase-based translation. In Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003, Edmonton, Canada, May 27 - June 1, 2003.

[19] Jaesong Lee, Joong-Hwi Shin, and Jun-Seok Kim. 2017. https://aclanthology.info/papers/D17-2021/d17-2021 Interactive visualization and manipulation of attention-based neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 - System Demo...

[20] Joël Legrand, Michael Auli, and Ronan Collobert. 2016. http://aclweb.org/anthology/W/W16/W16-2207.pdf Neural network-based word alignment through score aggregation. In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany, pages 66–73.

[21] Jiwei Li, Xinlei Chen, Eduard H. Hovy, and Dan Jurafsky. 2016. http://aclweb.org/anthology/N/N16/N16-1082.pdf Visualizing and understanding neural models in NLP. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, ...

[22] Lemao Liu, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. 2016. http://aclweb.org/anthology/C/C16/C16-1291.pdf Neural machine translation with supervised attention. In COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka, Japan, pages 3093–3102.

[23] Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. http://aclweb.org/anthology/D/D15/D15-1166.pdf Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1412–1421.

[24] V Menon. 2015. Salience network

[25] Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. 2016. http://aclweb.org/anthology/D/D16/D16-1249.pdf Supervised attentions for neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2283–2288.

[26] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. https://doi.org/10.1016/j.dsp.2017.10.011 Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15.

[27] Toan Q. Nguyen and David Chiang. 2018. https://aclanthology.info/papers/N18-1031/n18-1031 Improving lexical choice in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Vo...

[28] Franz Josef Och and Hermann Ney. 2003. https://doi.org/10.1162/089120103321337421 A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

[29] Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. http://aclweb.org/anthology/D/D16/D16-1244.pdf A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2249–2255.

[30] Alessandro Raganato and Jörg Tiedemann. 2018. https://aclanthology.info/papers/W18-5431/w18-5431 An analysis of encoder representations in transformer-based machine translation. In Proceedings of the Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2018, Brussels, Belgium, November 1, 2018, pages 287–297.

[31] Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. http://aclweb.org/anthology/D/D15/D15-1044.pdf A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 379–389.

[32] Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. https://doi.org/10.18653/v1/P17-1099 Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1073–1083.

[33] Richard M Shiffrin and Walter Schneider. 1977a. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological review, 84(2):127.

[34] Richard M Shiffrin and Walter Schneider. 1977b. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological review, 84(2):127.

[35] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. http://arxiv.org/abs/1312.6034 Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034.

[36] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda B. Viégas, and Martin Wattenberg. 2017. http://arxiv.org/abs/1706.03825 Smoothgrad: removing noise by adding noise. CoRR, abs/1706.03825.

[37] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. 2014. http://arxiv.org/abs/1412.6806 Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806.

[38] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. http://proceedings.mlr.press/v70/sundararajan17a.html Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 3319–3328.

[39] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada,...

[40] Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018a. https://aclanthology.info/papers/D18-1458/d18-1458 Why self-attention? A targeted evaluation of neural machine translation architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, p...

[41] Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018b. https://aclanthology.info/papers/W18-6304/w18-6304 An analysis of attention mechanisms: The case of word sense disambiguation in neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, WMT 2018, Belgium, Brussels, October 31 - November 1, 2018, pag...

[42] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. http://papers.nips.cc/paper/7181-attention-is-all-you-need Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 201...

[43] Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. 2018. https://aclanthology.info/papers/P18-2060/p18-2060 Neural hidden markov model for machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 377–382.

[44] Thomas Zenkel, Joern Wuebker, and John DeNero. 2019. http://arxiv.org/abs/1901.11359 Adding interpretable attention to neural translation models improves word alignment. CoRR, abs/1901.11359.
[45]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...

work page
[46]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[1] [1]

Tamer Alkhouli, Gabriel Bretschner, and Hermann Ney. 2018. https://aclanthology.info/papers/W18-6318/w18-6318 On the alignment problem in multi-head attention-based neural machine translation . In Proceedings of the Third Conference on Machine Translation: Research Papers, WMT 2018, Belgium, Brussels, October 31 - November 1, 2018 , pages 177--185

work page 2018

[2] [2]

Tamer Alkhouli, Gabriel Bretschner, Jan - Thorsten Peter, Mohammed Hethnawi, Andreas Guta, and Hermann Ney. 2016. http://aclweb.org/anthology/W/W16/W16-2206.pdf Alignment-based neural machine translation . In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany , pages 54--65

work page 2016

[3] [3]

Mihael Arcan, Marco Turchi, Sara Tonelli, and Paul Buitelaar. 2014. Enhancing statistical machine translation with bilingual terminology in a cat environment. In Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), pages 54--68

work page 2014

[4] [4]

Sebastian Bach, Alexander Binder, Gr \'e goire Montavon, Frederick Klauschen, Klaus-Robert M \"u ller, and Wojciech Samek. 2015. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS one, 10(7):e0130140

work page 2015

[5] [5]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. http://arxiv.org/abs/1409.0473 Neural machine translation by jointly learning to align and translate . CoRR, abs/1409.0473

work page internal anchor Pith review Pith/arXiv arXiv 2014

[6] [6]

Gosse Bouma and Yannick Parmentier, editors. 2014. http://aclweb.org/anthology/E/E14/ Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26-30, 2014, Gothenburg, Sweden . The Association for Computer Linguistics

work page 2014

[7] [7]

Brown, Stephen Della Pietra, Vincent J

Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263--311

work page 1993

[8] [8]

Le, and Oriol Vinyals

William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. 2016. https://doi.org/10.1109/ICASSP.2016.7472621 Listen, attend and spell: A neural network for large vocabulary conversational speech recognition . In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016 , pages 4960--4964

work page doi:10.1109/icassp.2016.7472621 2016

[9] [9]

Huadong Chen, Shujian Huang, David Chiang, and Jiajun Chen. 2017. https://doi.org/10.18653/v1/P17-1177 Improved neural machine translation with a syntax-aware encoder and decoder . In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers , pages 1936--1945

work page doi:10.18653/v1/p17-1177 2017

[10] [10]

Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. http://papers.nips.cc/paper/5847-attention-based-models-for-speech-recognition Attention-based models for speech recognition . In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 201...

work page 2015

[11] [11]

Yanzhuo Ding, Yang Liu, Huanbo Luan, and Maosong Sun. 2017. https://doi.org/10.18653/v1/P17-1106 Visualizing and understanding neural machine translation . In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers , pages 1150--1159

work page doi:10.18653/v1/p17-1106 2017

[12] [12]

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. http://aclweb.org/anthology/N/N13/N13-1073.pdf A simple, fast, and effective reparameterization of IBM model 2 . In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 9-14, 2013, Westin Peachtree Plaza Hotel, Atlanta...

work page 2013

[13] [13]

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. http://proceedings.mlr.press/v70/gehring17a.html Convolutional sequence to sequence learning . In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 , pages 1243--1252

work page 2017

[14] [14]

Hamidreza Ghader and Christof Monz. 2017. https://aclanthology.info/papers/I17-1004/i17-1004 What does attention in neural machine translation pay attention to? In Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27 - December 1, 2017 - Volume 1: Long Papers , pages 30--39

work page 2017

[15] [15]

Fern, and Prasad Tadepalli

Reza Ghaeini, Xiaoli Z. Fern, and Prasad Tadepalli. 2018. https://aclanthology.info/papers/D18-1537/d18-1537 Interpreting recurrent and attention-based neural models: a case study on natural language inference . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, page...

work page 2018

[16] [16]

Eva Hasler, Adri \` a de Gispert, Gonzalo Iglesias, and Bill Byrne. 2018. https://aclanthology.info/papers/N18-2081/n18-2081 Neural machine translation decoding with terminology constraints . In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orl...

work page 2018

[17] [17]

Philipp Koehn and Rebecca Knowles. 2017. https://aclanthology.info/papers/W17-3204/w17-3204 Six challenges for neural machine translation . In Proceedings of the First Workshop on Neural Machine Translation, NMT@ACL 2017, Vancouver, Canada, August 4, 2017, pages 28--39

[18]

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. http://aclweb.org/anthology/N/N03/N03-1017.pdf Statistical phrase-based translation. In Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003, Edmonton, Canada, May 27 - June 1, 2003.

[19]

Jaesong Lee, Joong-Hwi Shin, and Jun-Seok Kim. 2017. https://aclanthology.info/papers/D17-2021/d17-2021 Interactive visualization and manipulation of attention-based neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017 - System Demo...

[20]

Joël Legrand, Michael Auli, and Ronan Collobert. 2016. http://aclweb.org/anthology/W/W16/W16-2207.pdf Neural network-based word alignment through score aggregation. In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany, pages 66--73.

[21]

Jiwei Li, Xinlei Chen, Eduard H. Hovy, and Dan Jurafsky. 2016. http://aclweb.org/anthology/N/N16/N16-1082.pdf Visualizing and understanding neural models in NLP. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, ...

[22]

Lemao Liu, Masao Utiyama, Andrew M. Finch, and Eiichiro Sumita. 2016. http://aclweb.org/anthology/C/C16/C16-1291.pdf Neural machine translation with supervised attention. In COLING 2016, 26th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, December 11-16, 2016, Osaka, Japan, pages 3093--3102.

[23]

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. http://aclweb.org/anthology/D/D15/D15-1166.pdf Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1412--1421.

[24]

V Menon. 2015. Salience network

[25]

Haitao Mi, Zhiguo Wang, and Abe Ittycheriah. 2016. http://aclweb.org/anthology/D/D16/D16-1249.pdf Supervised attentions for neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2283--2288.

[26]

Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. https://doi.org/10.1016/j.dsp.2017.10.011 Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1--15.

[27]

Toan Q. Nguyen and David Chiang. 2018. https://aclanthology.info/papers/N18-1031/n18-1031 Improving lexical choice in neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Vo...

[28]

Franz Josef Och and Hermann Ney. 2003. https://doi.org/10.1162/089120103321337421 A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19--51.

[29]

Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. http://aclweb.org/anthology/D/D16/D16-1244.pdf A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2249--2255.

[30]

Alessandro Raganato and Jörg Tiedemann. 2018. https://aclanthology.info/papers/W18-5431/w18-5431 An analysis of encoder representations in transformer-based machine translation. In Proceedings of the Workshop: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2018, Brussels, Belgium, November 1, 2018, pages 287--297.

[31]

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. http://aclweb.org/anthology/D/D15/D15-1044.pdf A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 379--389.

[32]

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. https://doi.org/10.18653/v1/P17-1099 Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 1073--1083.

[33]

Richard M Shiffrin and Walter Schneider. 1977a. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2):127.

[34]

Richard M Shiffrin and Walter Schneider. 1977b. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2):127.

[35]

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. http://arxiv.org/abs/1312.6034 Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034.

[36]

Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda B. Viégas, and Martin Wattenberg. 2017. http://arxiv.org/abs/1706.03825 SmoothGrad: removing noise by adding noise. CoRR, abs/1706.03825.

[37]

Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. 2014. http://arxiv.org/abs/1412.6806 Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806.

[38]

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. http://proceedings.mlr.press/v70/sundararajan17a.html Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 3319--3328.

[39]

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada,...

[40]

Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018a. https://aclanthology.info/papers/D18-1458/d18-1458 Why self-attention? A targeted evaluation of neural machine translation architectures. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, p...

[41]

Gongbo Tang, Rico Sennrich, and Joakim Nivre. 2018b. https://aclanthology.info/papers/W18-6304/w18-6304 An analysis of attention mechanisms: The case of word sense disambiguation in neural machine translation. In Proceedings of the Third Conference on Machine Translation: Research Papers, WMT 2018, Belgium, Brussels, October 31 - November 1, 2018, pag...

[42]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. http://papers.nips.cc/paper/7181-attention-is-all-you-need Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 201...

[43]

Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. 2018. https://aclanthology.info/papers/P18-2060/p18-2060 Neural hidden markov model for machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 377--382.

[44]

Thomas Zenkel, Joern Wuebker, and John DeNero. 2019. http://arxiv.org/abs/1901.11359 Adding interpretable attention to neural translation models improves word alignment. CoRR, abs/1901.11359.
