Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages

Alfonso Medina-Urrea; Elisabeth Mager; Ivan Meza; Katharina Kann; Manuel Mager

arXiv: 1807.00286 · v1 · pith:YPBGOKGKnew · submitted 2018-07-01 · cs.CL

keywords: translation, polysynthetic, languages, amount, analysis, further, fusional, information

Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available. Thus, translation performance is far from the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena which hamper automatic translation to and from polysynthetic languages, we study translations from three low-resource, polysynthetic languages (Nahuatl, Wixarika and Yorem Nokki) into Spanish and vice versa. Doing so, we find that in a morpheme-to-morpheme alignment an important amount of information contained in polysynthetic morphemes has no Spanish counterpart, and its translation is often omitted. We further conduct a qualitative analysis and, thus, identify morpheme types that are commonly hard to align or ignored in the translation process.
