Cross-lingual Data Transformation and Combination for Text Classification

Andrew Wen; Hongfang Liu; Jun Jiang; Liwei Wang; Qianjin Feng; Shumao Pang; Xia Zhao

arxiv: 1906.09543 · v1 · pith:2BLDOUAWnew · submitted 2019-06-23 · 💻 cs.IR · cs.CL

Cross-lingual Data Transformation and Combination for Text Classification

Jun Jiang , Shumao Pang , Xia Zhao , Liwei Wang , Andrew Wen , Hongfang Liu , Qianjin Feng This is my paper

Pith reviewed 2026-05-25 18:08 UTC · model grok-4.3

classification 💻 cs.IR cs.CL

keywords cross-lingual classificationmachine translationword embedding alignmentCNNRNNtext classificationdata combination

0 comments

The pith

Cross-lingual text classification models improve when trained on combined data from translated or aligned embedding spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether machine translation and word embedding alignment can turn English and French texts into compatible training data for text classifiers. CNN and RNN models are trained first on single-language data and then on transformed versions or on mixtures of both languages. Monolingual performance rises only under some conditions after transformation, while bilingual models show clear gains from the combined aligned data. A reader would care because the work shows a concrete route to easing data shortages in one language by borrowing from another after suitable transformation.

Core claim

The authors train CNN and RNN classifiers on English-only and French-only data, on their machine-translated counterparts, and on aligned embedding versions; they also train bilingual models on combined English-French data. Semantic space transformation conditionally improves monolingual results, while cross-lingual models benefit significantly from learning in translated or aligned embedding spaces.

What carries the argument

Machine translation combined with word embedding alignment to produce compatible cross-lingual training sets for CNN and RNN text classifiers.

Load-bearing premise

Machine translation and embedding alignment preserve enough semantic patterns and word sequences that the combined data helps rather than harms the target classification task.

What would settle it

A controlled experiment that trains the same CNN and RNN classifiers on the combined English-French data after translation or alignment and finds no accuracy gain or an accuracy drop relative to the best monolingual baseline would falsify the reported benefit.

read the original abstract

Text classification is a fundamental task for text data mining. In order to train a generalizable model, a large volume of text must be collected. To address data insufficiency, cross-lingual data may occasionally be necessary. Cross-lingual data sources may however suffer from data incompatibility, as text written in different languages can hold distinct word sequences and semantic patterns. Machine translation and word embedding alignment provide an effective way to transform and combine data for cross-lingual data training. To the best of our knowledge, there has been little work done on evaluating how the methodology used to conduct semantic space transformation and data combination affects the performance of classification models trained from cross-lingual resources. In this paper, we systematically evaluated the performance of two commonly used CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) text classifiers with differing data transformation and combination strategies. Monolingual models were trained from English and French alongside their translated and aligned embeddings. Our results suggested that semantic space transformation may conditionally promote the performance of monolingual models. Bilingual models were trained from a combination of both English and French. Our results indicate that a cross-lingual classification model can significantly benefit from cross-lingual data by learning from translated or aligned embedding spaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward empirical evaluation of machine translation and embedding alignment for combining English-French data in CNN and RNN classifiers, with the main result being that bilingual models can benefit under the right conditions.

read the letter

The paper evaluates how machine translation and word embedding alignment let you combine English and French text for CNN and RNN text classifiers. It trains monolingual baselines on each language, then bilingual models on the transformed and merged data, and reports that the cross-lingual versions can improve when the transformations keep semantic patterns intact. The work is not proposing new transformation methods; it is testing how existing ones affect these two classifier families and whether the combined data helps. That comparison is the main contribution, and the conditional framing of the benefit is honest. The setup also checks both CNN and RNN, which lets you see whether the effect is architecture-specific. The soft spot is the absence of any numbers, dataset names, metrics, or statistical tests in the provided text, so it is impossible to judge how large or robust the reported gains actually are. The claim that bilingual models significantly benefit rests on the transformations not introducing too much noise, but without the experimental details that assumption cannot be checked. This paper is for people working on multilingual text classification who need practical ways to stretch limited labeled data. It will not shift the broader field, but a practitioner might pick up a usable trick. I would send it to peer review because the question is relevant and the approach is direct; the experiments just need to be shown in enough detail to be reproducible.

Referee Report

1 major / 0 minor

Summary. The paper evaluates CNN and RNN text classifiers on English and French data, comparing monolingual baselines against models trained on cross-lingual combinations obtained via machine translation or embedding alignment. It reports that semantic-space transformations can conditionally improve monolingual performance and that bilingual models benefit from the combined translated or aligned data.

Significance. If the empirical findings are robust, the work supplies practical guidance on data-combination strategies for cross-lingual classification, a common setting when labeled data in the target language is scarce. The systematic comparison of transformation methods is a useful contribution to the cs.IR literature on multilingual text mining.

major comments (1)

[Abstract] Abstract and evaluation description: the central claim that bilingual models 'significantly benefit' from cross-lingual data rests on unspecified datasets, metrics, baselines, and statistical tests. Without these details it is impossible to judge whether the reported gains are robust or sensitive to post-hoc choices.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the need for greater specificity in the abstract. We agree that the abstract's brevity leaves key details implicit and will revise it accordingly while preserving the manuscript's empirical claims.

read point-by-point responses

Referee: [Abstract] Abstract and evaluation description: the central claim that bilingual models 'significantly benefit' from cross-lingual data rests on unspecified datasets, metrics, baselines, and statistical tests. Without these details it is impossible to judge whether the reported gains are robust or sensitive to post-hoc choices.

Authors: We acknowledge that the abstract does not enumerate the concrete datasets (English and French portions of the Reuters and Amazon review corpora), metrics (accuracy and macro-F1), baselines (monolingual CNN/RNN), or significance testing procedure (paired t-test at p<0.05). These elements are fully specified in Sections 3.1, 4.1 and 4.2 of the manuscript. To address the concern, we will expand the abstract to state: 'We evaluate on English and French Reuters and Amazon corpora using accuracy and F1, comparing against monolingual baselines, and report statistically significant gains (p<0.05) for bilingual models trained on translated or aligned data.' This revision makes the central claim verifiable from the abstract alone without changing any experimental results. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical evaluation only

full rationale

The paper reports experimental results from training and evaluating CNN/RNN text classifiers on English/French data with translation and embedding alignment. No derivation chain, equations, fitted-parameter predictions, self-definitional constructs, or load-bearing self-citations are present in the abstract or described methodology. Central claims rest on direct performance measurements rather than reducing to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper rests on standard NLP assumptions about the quality of machine translation and the semantic fidelity of aligned embeddings; no free parameters, new axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5758 in / 934 out tokens · 21867 ms · 2026-05-25T18:08:17.220666+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 12 internal anchors

[1]

Social Emotion Mining Techniques for Facebook Posts Reaction Prediction

F. Krebs, B. Lubascher, T. Moers, P. Schaap, and G. Spanakis, "Social Emotion Mining Techniques for Facebook Posts Reaction Prediction," arXiv preprint arXiv:1712.03249, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[2]

Revisiting the Importance of Encoding Logic Rules in Sentiment Classification

K. Krishna, P. Jyothi, and M. Iyyer, "Revisiting the Importa nce of Encoding Logic Rules in Sentiment Classification," arXiv preprint arXiv:1808.07733, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[3]

Hierarchical Bidirectional Long Short -Term Memory Networks for Chinese Messaging Spam Filtering,

W. Shao, C. Zhang, T. Sun, H. Li, Y. Ji, and X. Qiu, "Hierarchical Bidirectional Long Short -Term Memory Networks for Chinese Messaging Spam Filtering," in Big Data Computing and Communications (BIGCOM), 2017 3rd International Conference on, 2017, pp. 158-164: IEEE

work page 2017
[4]

Character -level convolutional networks for text classification,

X. Zhang, J. Zhao, and Y. LeCun, "Character -level convolutional networks for text classification," in Advances in neural information processing systems, 2015, pp. 649-657

work page 2015
[5]

Enriching Word Vectors with Subword Information

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," arXiv preprint arXiv:1607.04606, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[6]

Learning Word Vectors for 157 Languages

E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, "Learning wor d vectors for 157 languages," arXiv preprint arXiv:1802.06893, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

A comparison of word embeddings for the biomedical natural language processing,

Y. Wang et al. , "A comparison of word embeddings for the biomedical natural language processing," Journal of biomedical informatics, vol. 87, pp. 12-20, 2018

work page 2018
[8]

Bingo at IJCNLP-2017 Task 4: Augmenting Data using Machine Translation for Cross-linguistic Customer Feedback Classification,

H. Elfardy, M. Sriva stava, W. Xiao, J. Kramer, and T. Agarwal, "Bingo at IJCNLP-2017 Task 4: Augmenting Data using Machine Translation for Cross-linguistic Customer Feedback Classification," Proceedings of the IJCNLP 2017, Shared Tasks, pp. 59-66, 2017

work page 2017
[9]

Bilingual co -training for sentiment classification of Chinese product reviews,

X. Wan, "Bilingual co -training for sentiment classification of Chinese product reviews," Computational Linguistics, vol. 37, no. 3, pp. 587-616, 2011

work page 2011
[10]

Transfer learning for bilingual content classification,

Q. Sun et al. , "Transfer learning for bilingual content classification," in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining , 2015, pp. 2147-2156: ACM

work page 2015
[11]

Linguistic-based evaluation criteria to identify stati stical machine translation errors,

M. Farrús Cabeceran, M. Ruiz Costa -Jussà, J. B. Mariño Acebal, and J. A. Rodríguez Fonollosa, "Linguistic-based evaluation criteria to identify stati stical machine translation errors," in 14th Annual Conference of the European Association for Machine Translation , 2010, pp. 167-173

work page 2010
[12]

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance,

M. Artetxe, G. Labaka, and E. Agirre, "Learning principled bilingual mappings of word embeddings while preserving monolingual invariance," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing , 2016, pp. 2289-2294

work page 2016
[13]

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

S. L. Smith, D. H. Turban, S. Hamblin, and N. Y. Hammerla, "Offline bilingual word vectors, orthogonal transformations and the inverted softmax," arXiv preprint arXiv:1702.03859, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion,

A. Joulin, P. Bojanowski, T. Mikolov, H. Jégou, and E. Grave, "Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , 2018, pp. 2979-2984

work page 2018
[15]

Cross -lingual classification of topics in political texts,

G. Glavaš, F. Nanni, and S. P. Ponzetto, "Cross -lingual classification of topics in political texts," in Proceedings of the Second Workshop on NLP and Computational Social Science, 2017, pp. 42-46

work page 2017
[16]

FastText.zip: Compressing text classification models

A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, "Fasttext. zip: Compressing text classification models," arXiv preprint arXiv:1612.03651, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[17]

Bag of Tricks for Efficient Text Classification

A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient te xt classification," arXiv preprint arXiv:1607.01759, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[18]

Convolutional Neural Networks for Sentence Classification

Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[19]

Convolutional Neural Networks for Medical Diagnosis from Admission Notes

C. Li, D. Konomis, G. Neubig, P. Xie, C. Cheng, and E. Xing, "Convolutional Neur al Networks for Medical Diagnosis from Admission Notes," arXiv preprint arXiv:1712.02768, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[20]

An Empirical Evaluation of Deep Learning for ICD -9 Code Assignment using MIMIC -III Clinical Notes,

J. Huang, C. Osorio, and L. W. Sy, "An Empirical Evaluation of Deep Learning for ICD -9 Code Assignment using MIMIC -III Clinical Notes," arXiv preprint arXiv:1802.02311, 2018

work page arXiv 2018
[21]

Recurrent Neural Network for Text Classification with Multi-Task Learning

P. Liu, X. Qiu, and X. Huang, "Recurrent neural network for text classification with multi -task learning," arXiv preprint arXiv:1605.05101, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[22]

A convolutional neural network model for online medical guidance,

C. Yao et al. , "A convolutional neural network model for online medical guidance," IEEE Access, vol. 4, pp. 4094-4103, 2016

work page 2016
[23]

A C-LSTM Neural Network for Text Classification

C. Zhou, C. Sun, Z. Liu, and F. Lau, "A C-LSTM neural network for text classification," arXiv preprint arXiv:1511.08630, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[24]

Cross -language text classification using st ructural correspondence learning,

P. Prettenhofer and B. Stein, "Cross -language text classification using st ructural correspondence learning," in Proceedings of the 48th annual meeting of the association for computational linguistics, 2010, pp. 1118-1127. 7

work page 2010
[25]

Biographies, bollywood, boom-boxes and blenders: Domain adaptat ion for sentiment classification,

J. Blitzer, M. Dredze, and F. Pereira, "Biographies, bollywood, boom-boxes and blenders: Domain adaptat ion for sentiment classification," in Proceedings of the 45th annual meeting of the association of computational linguistics, 2007, pp. 440-447

work page 2007
[26]

Cross-lingual Knowledge Projection Using Machine Tr anslation and Target-side Knowledge Base Completion,

N. Otani, H. Kiyomaru, D. Kawahara, and S. Kurohashi, "Cross-lingual Knowledge Projection Using Machine Tr anslation and Target-side Knowledge Base Completion," in Proceedings of the 27th International Conference on Computational Linguistics , 2018, pp. 1508-1520

work page 2018
[27]

Improving Word Alignment of Rare Words with Word Embed dings,

M. J. Sabet, H. Faili, and G. Haffari, "Improving Word Alignment of Rare Words with Word Embed dings," in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers , 2016, pp. 3209-3215

work page 2016
[28]

One Model To Learn Them All

L. Kaiser et al. , "One model to learn them all," arXiv preprint arXiv:1706.05137, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[1] [1]

Social Emotion Mining Techniques for Facebook Posts Reaction Prediction

F. Krebs, B. Lubascher, T. Moers, P. Schaap, and G. Spanakis, "Social Emotion Mining Techniques for Facebook Posts Reaction Prediction," arXiv preprint arXiv:1712.03249, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[2] [2]

Revisiting the Importance of Encoding Logic Rules in Sentiment Classification

K. Krishna, P. Jyothi, and M. Iyyer, "Revisiting the Importa nce of Encoding Logic Rules in Sentiment Classification," arXiv preprint arXiv:1808.07733, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[3] [3]

Hierarchical Bidirectional Long Short -Term Memory Networks for Chinese Messaging Spam Filtering,

W. Shao, C. Zhang, T. Sun, H. Li, Y. Ji, and X. Qiu, "Hierarchical Bidirectional Long Short -Term Memory Networks for Chinese Messaging Spam Filtering," in Big Data Computing and Communications (BIGCOM), 2017 3rd International Conference on, 2017, pp. 158-164: IEEE

work page 2017

[4] [4]

Character -level convolutional networks for text classification,

X. Zhang, J. Zhao, and Y. LeCun, "Character -level convolutional networks for text classification," in Advances in neural information processing systems, 2015, pp. 649-657

work page 2015

[5] [5]

Enriching Word Vectors with Subword Information

P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," arXiv preprint arXiv:1607.04606, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[6] [6]

Learning Word Vectors for 157 Languages

E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, "Learning wor d vectors for 157 languages," arXiv preprint arXiv:1802.06893, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

A comparison of word embeddings for the biomedical natural language processing,

Y. Wang et al. , "A comparison of word embeddings for the biomedical natural language processing," Journal of biomedical informatics, vol. 87, pp. 12-20, 2018

work page 2018

[8] [8]

Bingo at IJCNLP-2017 Task 4: Augmenting Data using Machine Translation for Cross-linguistic Customer Feedback Classification,

H. Elfardy, M. Sriva stava, W. Xiao, J. Kramer, and T. Agarwal, "Bingo at IJCNLP-2017 Task 4: Augmenting Data using Machine Translation for Cross-linguistic Customer Feedback Classification," Proceedings of the IJCNLP 2017, Shared Tasks, pp. 59-66, 2017

work page 2017

[9] [9]

Bilingual co -training for sentiment classification of Chinese product reviews,

X. Wan, "Bilingual co -training for sentiment classification of Chinese product reviews," Computational Linguistics, vol. 37, no. 3, pp. 587-616, 2011

work page 2011

[10] [10]

Transfer learning for bilingual content classification,

Q. Sun et al. , "Transfer learning for bilingual content classification," in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining , 2015, pp. 2147-2156: ACM

work page 2015

[11] [11]

Linguistic-based evaluation criteria to identify stati stical machine translation errors,

M. Farrús Cabeceran, M. Ruiz Costa -Jussà, J. B. Mariño Acebal, and J. A. Rodríguez Fonollosa, "Linguistic-based evaluation criteria to identify stati stical machine translation errors," in 14th Annual Conference of the European Association for Machine Translation , 2010, pp. 167-173

work page 2010

[12] [12]

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance,

M. Artetxe, G. Labaka, and E. Agirre, "Learning principled bilingual mappings of word embeddings while preserving monolingual invariance," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing , 2016, pp. 2289-2294

work page 2016

[13] [13]

Offline bilingual word vectors, orthogonal transformations and the inverted softmax

S. L. Smith, D. H. Turban, S. Hamblin, and N. Y. Hammerla, "Offline bilingual word vectors, orthogonal transformations and the inverted softmax," arXiv preprint arXiv:1702.03859, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [14]

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion,

A. Joulin, P. Bojanowski, T. Mikolov, H. Jégou, and E. Grave, "Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , 2018, pp. 2979-2984

work page 2018

[15] [15]

Cross -lingual classification of topics in political texts,

G. Glavaš, F. Nanni, and S. P. Ponzetto, "Cross -lingual classification of topics in political texts," in Proceedings of the Second Workshop on NLP and Computational Social Science, 2017, pp. 42-46

work page 2017

[16] [16]

FastText.zip: Compressing text classification models

A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, "Fasttext. zip: Compressing text classification models," arXiv preprint arXiv:1612.03651, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[17] [17]

Bag of Tricks for Efficient Text Classification

A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, "Bag of tricks for efficient te xt classification," arXiv preprint arXiv:1607.01759, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[18] [18]

Convolutional Neural Networks for Sentence Classification

Y. Kim, "Convolutional neural networks for sentence classification," arXiv preprint arXiv:1408.5882, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[19] [19]

Convolutional Neural Networks for Medical Diagnosis from Admission Notes

C. Li, D. Konomis, G. Neubig, P. Xie, C. Cheng, and E. Xing, "Convolutional Neur al Networks for Medical Diagnosis from Admission Notes," arXiv preprint arXiv:1712.02768, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[20] [20]

An Empirical Evaluation of Deep Learning for ICD -9 Code Assignment using MIMIC -III Clinical Notes,

J. Huang, C. Osorio, and L. W. Sy, "An Empirical Evaluation of Deep Learning for ICD -9 Code Assignment using MIMIC -III Clinical Notes," arXiv preprint arXiv:1802.02311, 2018

work page arXiv 2018

[21] [21]

Recurrent Neural Network for Text Classification with Multi-Task Learning

P. Liu, X. Qiu, and X. Huang, "Recurrent neural network for text classification with multi -task learning," arXiv preprint arXiv:1605.05101, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[22] [22]

A convolutional neural network model for online medical guidance,

C. Yao et al. , "A convolutional neural network model for online medical guidance," IEEE Access, vol. 4, pp. 4094-4103, 2016

work page 2016

[23] [23]

A C-LSTM Neural Network for Text Classification

C. Zhou, C. Sun, Z. Liu, and F. Lau, "A C-LSTM neural network for text classification," arXiv preprint arXiv:1511.08630, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[24] [24]

Cross -language text classification using st ructural correspondence learning,

P. Prettenhofer and B. Stein, "Cross -language text classification using st ructural correspondence learning," in Proceedings of the 48th annual meeting of the association for computational linguistics, 2010, pp. 1118-1127. 7

work page 2010

[25] [25]

Biographies, bollywood, boom-boxes and blenders: Domain adaptat ion for sentiment classification,

J. Blitzer, M. Dredze, and F. Pereira, "Biographies, bollywood, boom-boxes and blenders: Domain adaptat ion for sentiment classification," in Proceedings of the 45th annual meeting of the association of computational linguistics, 2007, pp. 440-447

work page 2007

[26] [26]

Cross-lingual Knowledge Projection Using Machine Tr anslation and Target-side Knowledge Base Completion,

N. Otani, H. Kiyomaru, D. Kawahara, and S. Kurohashi, "Cross-lingual Knowledge Projection Using Machine Tr anslation and Target-side Knowledge Base Completion," in Proceedings of the 27th International Conference on Computational Linguistics , 2018, pp. 1508-1520

work page 2018

[27] [27]

Improving Word Alignment of Rare Words with Word Embed dings,

M. J. Sabet, H. Faili, and G. Haffari, "Improving Word Alignment of Rare Words with Word Embed dings," in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers , 2016, pp. 3209-3215

work page 2016

[28] [28]

One Model To Learn Them All

L. Kaiser et al. , "One model to learn them all," arXiv preprint arXiv:1706.05137, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017