Saliency-driven Word Alignment Interpretation for Neural Machine Translation
Pith reviewed 2026-05-25 17:13 UTC · model grok-4.3
The pith
NMT models learn interpretable word alignments that saliency methods can extract.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NMT models learn interpretable word alignments, revealed by saliency-driven interpretation methods. Under force decoding, these alignments exceed fast-align quality for some systems, and in free decoding they align well with automatic tools. The methods are model-agnostic and require no parameter updates.
What carries the argument
Saliency scores that quantify the contribution of each source word to the model's output predictions for target words.
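To make the idea concrete, here is a minimal sketch of one common gradient-based form of saliency: the norm of the gradient of a target word's score with respect to each source word's embedding. It is an illustration under stated assumptions, not the paper's exact formulation. The function `toy_score`, the embeddings, and the weights are hypothetical stand-ins for a real NMT model, and the gradient is approximated by central differences rather than backpropagation.

```python
import math

def toy_score(embeddings, weights):
    """Hypothetical stand-in for the model's score of one target word."""
    total = sum(w * x
                for emb, ws in zip(embeddings, weights)
                for x, w in zip(emb, ws))
    return math.tanh(total)

def gradient_saliency(embeddings, score_fn, eps=1e-5):
    """L2 norm of the central-difference gradient w.r.t. each source embedding."""
    saliencies = []
    for i, emb in enumerate(embeddings):
        squared = 0.0
        for d in range(len(emb)):
            plus = [list(e) for e in embeddings]
            minus = [list(e) for e in embeddings]
            plus[i][d] += eps
            minus[i][d] -= eps
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            squared += grad * grad
        saliencies.append(math.sqrt(squared))
    return saliencies

# Toy data (assumed): three source words with 2-dimensional embeddings.
source = ["das", "Haus", "ist"]
embeddings = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.1]]
weights = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]

sal = gradient_saliency(embeddings, lambda e: toy_score(e, weights))
# Align the target word to the source word with the highest saliency.
aligned = source[max(range(len(sal)), key=sal.__getitem__)]
print(aligned, sal)
```

Taking the argmax over source positions for every target word yields a hard alignment that can then be scored against a reference.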
Load-bearing premise
Saliency scores accurately reflect the word alignment information learned by the model rather than unrelated computational effects.
What would settle it
A finding that saliency-based alignments agree with human or gold alignments no better than random baselines do, when both are evaluated alongside fast-align, would refute the claim.
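A test of this kind could be sketched as follows, assuming alignments are represented as sets of `(source_index, target_index)` links and scored with the standard alignment error rate (AER) of Och and Ney. The gold and predicted links below are invented toy data, and `random_alignment` is a hypothetical baseline that links each target position to a uniformly random source position.

```python
import random

def aer(predicted, sure, possible=None):
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    possible = sure if possible is None else (possible | sure)
    if not predicted and not sure:
        return 0.0
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / (len(predicted) + len(sure))

def random_alignment(src_len, tgt_len, rng):
    """Baseline: each target position is linked to a random source position."""
    return {(rng.randrange(src_len), t) for t in range(tgt_len)}

# Toy data (assumed): a 3-word sentence pair with a monotone gold alignment.
gold = {(0, 0), (1, 1), (2, 2)}
predicted = {(0, 0), (1, 1), (1, 2)}

rng = random.Random(0)
baseline_scores = [aer(random_alignment(3, 3, rng), gold) for _ in range(1000)]
baseline_mean = sum(baseline_scores) / len(baseline_scores)

# For the predicted links: 1 - (2 + 2) / (3 + 3) = 1/3.
print(aer(predicted, gold), baseline_mean)
```

If the saliency-derived AER were not clearly below the random-baseline mean, the premise above would be in doubt.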
Original abstract
Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments. In this paper, we show that NMT models do learn interpretable word alignments, which could only be revealed with proper interpretation methods. We propose a series of such methods that are model-agnostic, are able to be applied either offline or online, and do not require parameter update or architectural change. We show that under the force decoding setup, the alignments induced by our interpretation method are of better quality than fast-align for some systems, and when performing free decoding, they agree well with the alignments induced by automatic alignment tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that NMT models (including Transformers) learn interpretable word alignments that can be recovered using a series of model-agnostic saliency-based interpretation methods applicable offline or online without parameter updates or architectural changes. Under forced decoding the induced alignments are reported to exceed fast-align quality for some systems; under free decoding they agree well with automatic alignment tools.
Significance. If the saliency methods are shown to isolate alignments actually learned by the model rather than gradient or perturbation artifacts, the result would be significant for NMT interpretability research by providing a practical way to inspect alignments in pre-trained models. The model-agnostic and no-retraining design is a clear strength that enables direct application to existing systems.
major comments (2)
- [Abstract / Experimental Results] Abstract and experimental sections: the central claim that saliency scores recover alignments the model has learned (rather than gradient saturation, decoder-state dependencies, or input-normalization artifacts) is load-bearing, yet the manuscript provides no ablations, random baselines, or controls that would distinguish these possibilities. This directly affects the force-decoding and free-decoding comparisons.
- [Experimental Results] The reported superiority over fast-align under forced decoding and the agreement with automatic tools under free decoding lack details on statistical significance testing, variance across multiple runs, and dataset-specific breakdowns, making it impossible to assess whether the differences are robust.
minor comments (2)
- [Methods] Notation for the different saliency variants (model-agnostic offline vs. online) should be introduced with explicit equations or pseudocode early in the methods section to improve readability.
- [Related Work] The paper should include a short related-work subsection contrasting the proposed saliency approach with prior gradient- or attention-based alignment extraction methods.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, agreeing where the manuscript is lacking and outlining planned revisions.
Point-by-point responses
- Referee: [Abstract / Experimental Results] Abstract and experimental sections: the central claim that saliency scores recover alignments the model has learned (rather than gradient saturation, decoder-state dependencies, or input-normalization artifacts) is load-bearing, yet the manuscript provides no ablations, random baselines, or controls that would distinguish these possibilities. This directly affects the force-decoding and free-decoding comparisons.
  Authors: We acknowledge that the manuscript does not include explicit ablations, random baselines, or controls to isolate learned alignments from potential artifacts such as gradient saturation or decoder-state dependencies. While the reported comparisons to fast-align and automatic tools provide supporting evidence, they do not fully rule out these alternatives. In the revised version we will add random saliency baselines and targeted controls for decoder dependencies and input normalization. revision: yes
- Referee: [Experimental Results] The reported superiority over fast-align under forced decoding and the agreement with automatic tools under free decoding lack details on statistical significance testing, variance across multiple runs, and dataset-specific breakdowns, making it impossible to assess whether the differences are robust.
  Authors: The current results are presented as averages without statistical tests, variance, or per-dataset breakdowns. We will incorporate bootstrap significance testing, report standard deviations from multiple runs where feasible, and add dataset-specific result tables in the revision to allow assessment of robustness. revision: yes
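The bootstrap testing mentioned in the response could take a form like the sketch below: a paired bootstrap over sentences. This is an illustration under assumptions, not the authors' planned procedure. It assumes per-sentence error scores are available for both systems (corpus-level AER is not a simple mean of per-sentence values, so a real test would recompute the metric on each resample), and the function name and the toy scores are hypothetical.

```python
import random

def paired_bootstrap(errors_a, errors_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A has lower total error than B."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(errors_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample sentence indices with replacement, using the same
        # indices for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(errors_a[i] for i in idx) < sum(errors_b[i] for i in idx):
            wins += 1
    return wins / n_resamples

# Toy per-sentence error scores (assumed data).
saliency_errors = [0.10, 0.20, 0.15]
fast_align_errors = [0.30, 0.40, 0.35]
win_rate = paired_bootstrap(saliency_errors, fast_align_errors)
print(win_rate)
```

A win rate near 1.0 suggests the difference is robust to the choice of test sentences; a rate near 0.5 suggests it is not.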
Circularity Check
No circularity: methods applied to pre-trained models with external comparisons
The paper applies saliency-based interpretation techniques to existing pre-trained NMT models and evaluates the resulting alignments against independent external tools (fast-align and automatic aligners). No equations, parameters, or central claims are defined in terms of the paper's own outputs or fitted values. The derivation chain consists of standard gradient/perturbation computations followed by post-hoc comparison, with no self-definitional steps, fitted-input predictions, or load-bearing self-citations that reduce the result to the input by construction. This is the expected non-circular outcome for an interpretation study on fixed models.
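The perturbation side of that chain can be illustrated with an erasure-style sketch: zero out one source embedding at a time and record how much the target word's score changes. This is a generic technique shown under assumptions, not a reproduction of the paper's method. `toy_score`, the embeddings, and the weights are hypothetical stand-ins for a fixed pre-trained model.

```python
import math

def toy_score(embeddings, weights):
    """Hypothetical stand-in for the model's score of one target word."""
    total = sum(w * x
                for emb, ws in zip(embeddings, weights)
                for x, w in zip(emb, ws))
    return math.tanh(total)

def erasure_saliency(embeddings, score_fn):
    """Absolute change in score when each source embedding is zeroed out."""
    base = score_fn(embeddings)
    changes = []
    for i in range(len(embeddings)):
        erased = [[0.0] * len(e) if j == i else list(e)
                  for j, e in enumerate(embeddings)]
        changes.append(abs(base - score_fn(erased)))
    return changes

# Toy data (assumed): three source words with 2-dimensional embeddings.
embeddings = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.1]]
weights = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]

changes = erasure_saliency(embeddings, lambda e: toy_score(e, weights))
most_salient = max(range(len(changes)), key=changes.__getitem__)
print(most_salient, changes)
```

Because the model parameters are never updated, the resulting scores depend only on the fixed model and the input, which is what keeps the later comparison against external aligners non-circular.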
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Saliency methods applied to NMT models extract meaningful word alignment information.