arXiv: 1806.05210 · v2 · pith:L4724ROL · submitted 2018-06-13 · cs.CL

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization

Gongbo Tang, Fabienne Cap, Eva Pettersson, Joakim Nivre

classification: cs.CL

keywords: models, different, historical, normalization, spelling, better, languages, neural


abstract

In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models operate at different levels and use different attention mechanisms and neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. Vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models for low-resource languages. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.
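The abstract compares systems by character error rate. As a rough illustration only, the sketch below computes a commonly used corpus-level variant: the total Levenshtein distance between predicted and reference spellings, divided by the total number of reference characters. This definition, the function names, and the sample word pairs are assumptions made for illustration; the paper's exact evaluation script is not shown here and may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Corpus-level CER (assumed definition): total edits / total reference characters."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


# Hypothetical normalizer outputs paired with hypothetical gold modern spellings.
predicted = ["vppe", "have"]
gold = ["uppe", "have"]
cer = character_error_rate(predicted, gold)
```

Here one substitution across eight reference characters gives a rate of 0.125. Lower values mean the normalized output is closer to the reference spelling.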
