arxiv: 1712.06273 · v1 · pith:WDPEK5TN · submitted 2017-12-18 · 💻 cs.CL

Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic

Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor

classification 💻 cs.CL

keywords arabic · data · modeling · parallel · translation · dialect-to-dialect · dialectal · machine

abstract

We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus. The subject has not previously received serious attention due to lack of naturally occurring parallel data; yet its importance is evidenced by dialectal Arabic's wide usage and breadth of inter-dialect variation, comparable to that of Romance languages. Our results suggest that modeling morphology and syntax significantly improves dialect-to-dialect translation, though optimizing such data-sparse models requires consideration of the linguistic differences between dialects and the nature of available data and resources. On a single-reference blind test set where untranslated input scores 6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot techniques and morphosyntactic modeling significantly improve performance to 17.5.
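The size of the reported improvements can be read off the three BLEU figures above. The following is a minimal sketch of that arithmetic only; the dictionary keys and the `gains` helper are illustrative names chosen here, not identifiers from the paper, and the numbers are copied from the abstract rather than recomputed.

```python
from typing import Tuple

# BLEU scores as stated in the abstract (single-reference blind test set).
# The key names are hypothetical labels introduced for this sketch.
REPORTED_BLEU = {
    "untranslated_input": 6.5,
    "parallel_data_only": 14.6,
    "pivot_plus_morphosyntax": 17.5,
}


def gains(baseline: float, system: float) -> Tuple[float, float]:
    """Return (absolute gain in BLEU points, relative gain in percent)."""
    absolute = system - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    best = REPORTED_BLEU["pivot_plus_morphosyntax"]
    for name in ("untranslated_input", "parallel_data_only"):
        absolute, relative = gains(REPORTED_BLEU[name], best)
        print(f"vs {name}: +{absolute:.1f} BLEU ({relative:.1f}% relative)")
```

Under these figures, the best system is 2.9 BLEU points above the parallel-data-only model and 11.0 points above leaving the input untranslated.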
