The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

Amjad Almahairi; Mai Oudah; Nizar Habash

arxiv: 1906.11751 · v1 · pith:U32ILPHTnew · submitted 2019-06-27 · 💻 cs.CL

The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

Mai Oudah , Amjad Almahairi , Nizar Habash This is my paper

Pith reviewed 2026-05-25 14:51 UTC · model grok-4.3

classification 💻 cs.CL

keywords machine translationArabic-Englishtokenizationpreprocessingstatistical MTneural MTsystem combination

0 comments

The pith

The best tokenization scheme for Arabic-English machine translation depends on whether the system is statistical or neural and on the amount of training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests multiple tokenization schemes on both statistical and neural machine translation models for Arabic to English. It varies the amount of training data and vocabulary size to measure effects on each approach. Results show that no single scheme is best in all cases; the winner shifts with model type and data volume. The work also finds that picking the stronger output from a statistical system and a neural system together produces clear gains over either alone.

Core claim

Our empirical results show that the best choice of tokenization scheme is largely based on the type of model and the size of data. We also show that we can gain significant improvements using a system selection that combines the output from neural and statistical MT.

What carries the argument

Head-to-head comparison of linguistically motivated tokenization schemes applied to statistical MT and neural MT models while varying training data size and vocabulary size.

If this is right

Statistical MT and neural MT benefit from different tokenization choices under the same conditions.
Increasing training data can change which tokenization scheme performs best for a given model.
Selecting the higher-quality translation from a statistical system and a neural system improves final output quality.
Vocabulary size interacts with tokenization choice in both model families.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Preprocessing decisions may need to be re-tuned when switching from statistical to neural architectures or when data volume changes.
The observed gains from system combination suggest a practical route to better Arabic-English output without redesigning either model.
Similar experiments on other language pairs could test whether model-dependent tokenization effects appear beyond Arabic.

Load-bearing premise

The tokenization schemes and datasets tested are representative enough to support general statements about preprocessing effects on Arabic-English translation quality.

What would settle it

A single tokenization scheme that produces the highest scores for both statistical and neural models at every data size tested would undermine the claim that the best scheme depends on model type and data size.

Figures

Figures reproduced from arXiv: 1906.11751 by Amjad Almahairi, Mai Oudah, Nizar Habash.

**Figure 1.** Figure 1: Tokenization schemes applied to an example. tokens. Thus, the same sentences will be selected across different tokenization schemes. 3.3 Target Language Resources We design the training so that both systems will have access to the same additional target language resources besides the target side of the training parallel corpus. In SMT, target language resources are used to build language models for fluenc… view at source ↗

**Figure 2.** Figure 2: The performance on in-domain test (MT05) under different settings with different training data sizes. #Vocab SMTtgt++ CI NMTscr/tgt++ CI P-value Raw 331K 52.78 ± 0.98 52.76 ± 1.24 0.412 ATB 208K 55.42 ± 1.07 53.54 ± 1.20 0.002 D3 190K 54.66 ± 1.02 53.51 ± 1.20 0.027 Raw+BPE 20K 53.78 ± 1.10 52.41 ± 1.17 0.003 ATB+BPE 20K 55.64 ± 1.11 53.18 ± 1.15 0.001 D3+BPE 20K 54.59 ± 1.07 53.38 ± 1.16 0.018 [PITH_FULL… view at source ↗

**Figure 3.** Figure 3: The input size vs. output size in SMT and NMT, respectively, on MT05 with ATB tokenization. We notice that in NMT parts of the input sentences are dropped and not translated at all, which motivates the length-based selection. SMTtgt++ NMTscr/tgt++ System Selection Oracle Setting BLEU Scheme BLEU BLEU BLEU ATB+BPE 55.64 ATB 53.54 56.18 61.26 [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Examples from MT05, with SMT and NMT outputs when ATB is used as a scheme. The * designation next to the system name indicates the decision of the system selection [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

read the original abstract

Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages. While linguistically-motivated tokenization techniques were shown to have significant effects on the performance of statistical MT, it remains unclear if those techniques are well suited for neural MT. In this paper, we systematically compare neural and statistical MT models for Arabic-English translation on data preprecossed by various prominent tokenization schemes. Furthermore, we consider a range of data and vocabulary sizes and compare their effect on both approaches. Our empirical results show that the best choice of tokenization scheme is largely based on the type of model and the size of data. We also show that we can gain significant improvements using a system selection that combines the output from neural and statistical MT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's core result is that optimal tokenization for Arabic-English MT depends on model type and data size, with gains from NMT+SMT output selection; it's a useful but incremental empirical comparison.

read the letter

The main thing to know is that tokenization choices for Arabic-English translation perform differently under statistical versus neural MT, and the best option shifts with training data size; they also report gains from a simple system combination that picks the better output from each paradigm. The paper runs a controlled comparison of several established tokenization schemes across both model types and a range of data and vocabulary sizes. That addresses a gap left by earlier work that focused mostly on statistical MT, and the empirical setup avoids any circularity or fitting issues. The results line up with the abstract claim without visible internal contradictions. One soft spot is that the abstract supplies no concrete numbers, significance tests, or dataset details, which makes it hard to judge effect sizes or robustness from the summary alone; the full paper would need to show those to carry weight. The work stays within one language pair, so it does not claim broad generality. This is the sort of practical study that MT practitioners working on Arabic or other morphologically rich languages would find directly useful for system tuning. It does not introduce new techniques or resolve open theoretical questions, but the comparison is executed in a straightforward way that adds to the record. I would send it for peer review so the experimental details and any statistical backing can be checked by referees familiar with the area.

Referee Report

2 major / 2 minor

Summary. The paper empirically compares prominent tokenization schemes for Arabic-English data in both statistical MT (SMT) and neural MT (NMT) across varying data and vocabulary sizes. It claims that the best tokenization choice depends primarily on model type and data scale, and that a system-selection approach combining NMT and SMT outputs yields significant improvements over either alone.

Significance. If the results hold under rigorous controls, the work supplies actionable guidance on preprocessing for a morphologically complex language pair and highlights a practical hybrid strategy. The controlled variation over data sizes is a positive feature that supports the dependence claim.

major comments (2)

[§4] §4 (Experimental Setup): the manuscript provides no information on the concrete parallel corpora used (source, domain, sentence counts per size bucket), making it impossible to judge whether the reported dependence on data size generalizes or is an artifact of the chosen collection.
[Table 3 / §5.2] Table 3 / §5.2: the system-selection gains are asserted to be 'significant' yet no statistical significance test, bootstrap interval, or multiple-comparison correction is reported; the numerical deltas alone do not establish that the hybrid result is reliably superior to the best single system.

minor comments (2)

[Abstract] Abstract: 'preprecossed' is a typographical error.
[§2] §2: the description of the tokenizers (Farasa, MADAMIRA, etc.) would benefit from explicit pseudocode or a small example showing how each scheme segments a sample Arabic sentence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below and will revise the manuscript to incorporate the requested details and analyses.

read point-by-point responses

Referee: [§4] §4 (Experimental Setup): the manuscript provides no information on the concrete parallel corpora used (source, domain, sentence counts per size bucket), making it impossible to judge whether the reported dependence on data size generalizes or is an artifact of the chosen collection.

Authors: We agree that §4 lacks explicit details on the corpora. The experiments drew from standard LDC Arabic-English parallel resources (primarily news and web domains), with data buckets constructed by subsampling to approximate small (~100k), medium (~500k), and large (~1M+) sentence counts. In the revised manuscript we will add a dedicated subsection and table in §4 listing the exact corpus identifiers, domains, and precise sentence counts per bucket to allow readers to evaluate generalizability. revision: yes
Referee: [Table 3 / §5.2] Table 3 / §5.2: the system-selection gains are asserted to be 'significant' yet no statistical significance test, bootstrap interval, or multiple-comparison correction is reported; the numerical deltas alone do not establish that the hybrid result is reliably superior to the best single system.

Authors: The referee is correct that no statistical tests appear in the current version. We will recompute the system-selection results with bootstrap resampling (following standard MT practice) and report 95% confidence intervals plus paired significance tests against the best single system in the revised Table 3 and §5.2. If any gains fall short of significance after correction, we will qualify the claims accordingly. revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical comparison

full rationale

The paper reports controlled experiments comparing tokenization schemes on Arabic-English SMT and NMT across data/vocabulary sizes, plus a system-selection combination. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing claims. The central results (tokenization optimality depends on model type and data size; gains from NMT+SMT selection) are direct empirical observations against external benchmarks, with no reduction to inputs by construction. This matches the default non-circular case for empirical studies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical machine translation study relying on standard experimental practices rather than new axioms or parameters.

pith-pipeline@v0.9.0 · 5656 in / 1102 out tokens · 30019 ms · 2026-05-25T14:51:03.959958+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We conduct learning curve experiments to study the interaction between data size and the choice of tokenization scheme.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 5 internal anchors

[1]

Almahairi, Amjad, Kyunghyun Cho, Nizar Habash, and Aaron Courville. 2016. First result on A rabic neural machine translation. arXiv preprint arXiv:1606.02680

work page internal anchor Pith review Pith/arXiv arXiv 2016
[2]

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv e-prints , abs/1409.0473

work page internal anchor Pith review Pith/arXiv arXiv 2014
[3]

Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics , 5:135--146

work page 2017
[4]

Cho, Kyunghyun, Bart Van, Dzmitry Bahdanau, and Yoshua Bengio. 2014a. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of SSST-8 Eighth Workshop on Syntax Semantics and Structure in Statistical Translation , pages 103--111. Association for Computational Linguistics

work page
[5]

Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014b. Learning phrase representations using rnn encoder--decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages 1724--1734. Association ...

work page 2014
[6]

Dahlmann, Leonard, Evgeny Matusov, Pavel Petrushkov, and Shahram Khadivi. 2017. Neural machine translation leveraging phrase-based models in a hybrid search. CoRR

work page 2017
[7]

Devlin, Jacob and Spyros Matsoukas. 2012. Trait-based hypothesis selection for machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , NAACL HLT 12, pages 528--532. Association for Computational Linguistics

work page 2012
[8]

Durrani, Nadir, Fahim Dalvi, Hassan Sajjad, and Stephan Vogel. 2017. Qcri machine translation systems for iwslt 16. CoRR

work page 2017
[9]

El Kholy, Ahmed and Nizar Habash. 2012. Orthographic and morphological processing for English--Arabic statistical machine translation . Machine Translation , 26(1-2):25--45

work page 2012
[10]

Erdmann, Alexander, Nasser Zalmout, and Nizar Habash. 2018. Addressing noise in multidialectal word embeddings. In Proceedings of Conference of the Association for Computational Linguistics , Melbourne, Australia

work page 2018
[11]

Escolano, Carlos, Marta Costa-jussa, and Jose Fonollosa. 2017. The talp-upc neural machine translation system for german/finnish-english using the inverse direction model in rescoring. In Proceedings of the Second Conference on Machine Translation , pages 283--287. Association for Computational Linguistics

work page 2017
[12]

Graff, David and Christopher Cieri. 2003. English gigaword, ldc catalog no ldc2003t05. Linguistic Data Consortium, University of Pennsylvania

work page 2003
[13]

Habash, Nizar and Fatiha Sadat. 2006. Arabic preprocessing schemes for statistical machine translation. In HLT-NAACL

work page 2006
[14]

Hochreiter, Sepp and J\" u rgen Schmidhuber. 1997. Long short-term memory. Neural Comput. , 9(8):1735--1780, November

work page 1997
[15]

Klein, Guillaume, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. Opennmt: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017 System Demonstrations , pages 67--72. Association for Computational Linguistics

work page 2017
[16]

Koehn, Philipp and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation , pages 28--39. Association for Computational Linguistics

work page 2017
[17]

Koehn, Philipp, Hieu Hoang, Alexandra Birch, Christopher Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Christopher Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the As...

work page 2007
[18]

Koehn, Philipp. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP , Barcelona, Spain

work page 2004
[19]

Luong, Thang, Hieu Pham, and Christopher Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing , pages 1412--1421. Association for Computational Linguistics"

work page 2015
[20]

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR , abs/1301.3781

work page internal anchor Pith review Pith/arXiv arXiv 2013
[21]

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation . In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics , pages 311--318, Philadelphia, PA

work page 2002
[22]

Parker, Robert, David Graff, Ke Chen, Junbo Kong, and Kazuaki Maeda. 2011. Arabic Gigaword Fifth Edition . LDC catalog number No. LDC2011T11, ISBN 1-58563-595-2

work page 2011
[23]

Pasha, Arfath, Mohamed Al-Badrashiny, Ahmed El Kholy, Ramy Eskander, Mona Diab, Nizar Habash, Manoj Pooleery, Owen Rambow, and Ryan Roth. 2014. Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of arabic. In In Proceedings of LREC

work page 2014
[24]

Qi, Ye, Devendra Singh, Matthieu Felix, Sarguna Janani, and Graham Neubig. 2018. When and why are pre-trained word embeddings useful for neural machine translation? CoRR

work page 2018
[25]

Rehurek, Radim and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks , pages 45--50, Valletta, Malta. ELRA

work page 2010
[26]

Salloum, Wael, Heba Elfardy, Linda Alamir-Salloum, Nizar Habash, and Mona Diab. 2014. Sentence level dialect identification for machine translation system selection. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics , pages 772--778

work page 2014
[27]

Sennrich, Rico, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 1715--1725. Association for Computational Linguistics

work page 2016
[28]

Unanue, Inigo, Lierni Arratibel, Ehsan Borzeshi, and Massimo Piccardi. 2018. English-basque statistical and neural machine translation. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) , Paris, France. European Language Resources Association (ELRA)

work page 2018
[29]

Attention Is All You Need

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR , abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, and Wolfgang Macherey. 2016. Googles neural machine translation system: Bridging the gap between human and machine translation. CoRR , abs/1609.08144

work page internal anchor Pith review Pith/arXiv arXiv 2016
[31]

Zalmout, Nasser and Nizar Habash. 2017. Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages . The Prague Bulletin of Mathematical Linguistics , 108:257--270, June

work page 2017
[32]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page

[1] [1]

Almahairi, Amjad, Kyunghyun Cho, Nizar Habash, and Aaron Courville. 2016. First result on A rabic neural machine translation. arXiv preprint arXiv:1606.02680

work page internal anchor Pith review Pith/arXiv arXiv 2016

[2] [2]

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv e-prints , abs/1409.0473

work page internal anchor Pith review Pith/arXiv arXiv 2014

[3] [3]

Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics , 5:135--146

work page 2017

[4] [4]

Cho, Kyunghyun, Bart Van, Dzmitry Bahdanau, and Yoshua Bengio. 2014a. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of SSST-8 Eighth Workshop on Syntax Semantics and Structure in Statistical Translation , pages 103--111. Association for Computational Linguistics

work page

[5] [5]

Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014b. Learning phrase representations using rnn encoder--decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pages 1724--1734. Association ...

work page 2014

[6] [6]

Dahlmann, Leonard, Evgeny Matusov, Pavel Petrushkov, and Shahram Khadivi. 2017. Neural machine translation leveraging phrase-based models in a hybrid search. CoRR

work page 2017

[7] [7]

Devlin, Jacob and Spyros Matsoukas. 2012. Trait-based hypothesis selection for machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , NAACL HLT 12, pages 528--532. Association for Computational Linguistics

work page 2012

[8] [8]

Durrani, Nadir, Fahim Dalvi, Hassan Sajjad, and Stephan Vogel. 2017. Qcri machine translation systems for iwslt 16. CoRR

work page 2017

[9] [9]

El Kholy, Ahmed and Nizar Habash. 2012. Orthographic and morphological processing for English--Arabic statistical machine translation . Machine Translation , 26(1-2):25--45

work page 2012

[10] [10]

Erdmann, Alexander, Nasser Zalmout, and Nizar Habash. 2018. Addressing noise in multidialectal word embeddings. In Proceedings of Conference of the Association for Computational Linguistics , Melbourne, Australia

work page 2018

[11] [11]

Escolano, Carlos, Marta Costa-jussa, and Jose Fonollosa. 2017. The talp-upc neural machine translation system for german/finnish-english using the inverse direction model in rescoring. In Proceedings of the Second Conference on Machine Translation , pages 283--287. Association for Computational Linguistics

work page 2017

[12] [12]

Graff, David and Christopher Cieri. 2003. English gigaword, ldc catalog no ldc2003t05. Linguistic Data Consortium, University of Pennsylvania

work page 2003

[13] [13]

Habash, Nizar and Fatiha Sadat. 2006. Arabic preprocessing schemes for statistical machine translation. In HLT-NAACL

work page 2006

[14] [14]

Hochreiter, Sepp and J\" u rgen Schmidhuber. 1997. Long short-term memory. Neural Comput. , 9(8):1735--1780, November

work page 1997

[15] [15]

Klein, Guillaume, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander Rush. 2017. Opennmt: Open-source toolkit for neural machine translation. In Proceedings of ACL 2017 System Demonstrations , pages 67--72. Association for Computational Linguistics

work page 2017

[16] [16]

Koehn, Philipp and Rebecca Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation , pages 28--39. Association for Computational Linguistics

work page 2017

[17] [17]

Koehn, Philipp, Hieu Hoang, Alexandra Birch, Christopher Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Christopher Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the As...

work page 2007

[18] [18]

Koehn, Philipp. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP , Barcelona, Spain

work page 2004

[19] [19]

Luong, Thang, Hieu Pham, and Christopher Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing , pages 1412--1421. Association for Computational Linguistics"

work page 2015

[20] [20]

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR , abs/1301.3781

work page internal anchor Pith review Pith/arXiv arXiv 2013

[21] [21]

Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation . In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics , pages 311--318, Philadelphia, PA

work page 2002

[22] [22]

Parker, Robert, David Graff, Ke Chen, Junbo Kong, and Kazuaki Maeda. 2011. Arabic Gigaword Fifth Edition . LDC catalog number No. LDC2011T11, ISBN 1-58563-595-2

work page 2011

[23] [23]

Pasha, Arfath, Mohamed Al-Badrashiny, Ahmed El Kholy, Ramy Eskander, Mona Diab, Nizar Habash, Manoj Pooleery, Owen Rambow, and Ryan Roth. 2014. Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of arabic. In In Proceedings of LREC

work page 2014

[24] [24]

Qi, Ye, Devendra Singh, Matthieu Felix, Sarguna Janani, and Graham Neubig. 2018. When and why are pre-trained word embeddings useful for neural machine translation? CoRR

work page 2018

[25] [25]

Rehurek, Radim and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks , pages 45--50, Valletta, Malta. ELRA

work page 2010

[26] [26]

Salloum, Wael, Heba Elfardy, Linda Alamir-Salloum, Nizar Habash, and Mona Diab. 2014. Sentence level dialect identification for machine translation system selection. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics , pages 772--778

work page 2014

[27] [27]

Sennrich, Rico, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 1715--1725. Association for Computational Linguistics

work page 2016

[28] [28]

Unanue, Inigo, Lierni Arratibel, Ehsan Borzeshi, and Massimo Piccardi. 2018. English-basque statistical and neural machine translation. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) , Paris, France. European Language Resources Association (ELRA)

work page 2018

[29] [29]

Attention Is All You Need

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR , abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017

[30] [30]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, and Wolfgang Macherey. 2016. Googles neural machine translation system: Bridging the gap between human and machine translation. CoRR , abs/1609.08144

work page internal anchor Pith review Pith/arXiv arXiv 2016

[31] [31]

Zalmout, Nasser and Nizar Habash. 2017. Optimizing Tokenization Choice for Machine Translation across Multiple Target Languages . The Prague Bulletin of Mathematical Linguistics , 108:257--270, June

work page 2017

[32] [32]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page