arxiv: 1610.07844 · v1 · pith:MFHXH36Hnew · submitted 2016-10-25 · 💻 cs.CL

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

classification 💻 cs.CL
keywords: historical, normalization, data, deep, learning, model, multi-task, network

Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitability of a deep neural network architecture for this task, particularly a deep bi-LSTM network applied on a character level. Our model compares well to previously established normalization algorithms when evaluated on a diverse set of texts from Early New High German. We show that multi-task learning with additional normalization data can improve our model's performance further.
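To make the phrase "applied on a character level" concrete, the following is a minimal sketch of one way a historical/modern word pair could be turned into per-character training targets for a sequence labeller such as a bi-LSTM. The abstract does not describe the encoding used in the paper, so everything here is an assumption for illustration: the function name `char_labels`, the empty-string label for deleted characters, the use of Python's `difflib` for alignment, and the example word pairs.

```python
from difflib import SequenceMatcher

EPS = ""  # hypothetical "empty" label: the historical character is dropped


def char_labels(historical: str, modern: str) -> list[tuple[str, str]]:
    """Pair each character of a non-empty historical word with a target label.

    Each label is the (possibly empty, possibly multi-character) piece of
    the modern word that the historical character is aligned to, so that
    concatenating all labels reproduces the modern form.
    """
    labels = [EPS] * len(historical)
    pending = ""  # modern characters inserted before the first historical one
    matcher = SequenceMatcher(None, historical, modern, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        target = modern[j1:j2]
        if tag == "equal":
            for k in range(i2 - i1):
                labels[i1 + k] = target[k]
        elif tag == "replace":
            span = i2 - i1
            # Align one-to-one, then give any leftover modern characters
            # to the last historical character of the span.
            for k in range(span - 1):
                labels[i1 + k] = target[k] if k < len(target) else EPS
            labels[i2 - 1] = target[span - 1:]
        elif tag == "insert":
            if i1 > 0:
                labels[i1 - 1] += target  # attach to the preceding character
            else:
                pending = target          # attach to the first character later
        # tag == "delete": labels stay EPS
    if pending:
        labels[0] = pending + labels[0]
    return list(zip(historical, labels))


# Illustrative word pairs (not taken from the paper's data).
for hist, mod in [("vnd", "und"), ("jn", "in"), ("thun", "tun")]:
    pairs = char_labels(hist, mod)
    print(hist, "->", mod, pairs)
```

Under this framing, a character-level model would read the historical characters and predict one label per character; joining the predicted labels yields the normalized word. A multi-task setup would share the character encoder across several such labelling tasks, but that part is not sketched here.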
