pith. the verified trust layer for science. sign in

arxiv: 1907.06210 · v1 · pith:7EXELTDWnew · submitted 2019-07-14 · 💻 cs.CL

Simple Automatic Post-editing for Arabic-Japanese Machine Translation

classification 💻 cs.CL
keywords translationlanguagemachineparallelarabic-japaneseautomaticcorpusdomains
0
0 comments X p. Extension
Add this Pith Number to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{7EXELTDW}

Prints a linked pith:7EXELTDW badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

A common bottleneck for developing machine translation (MT) systems for some language pairs is the lack of direct parallel translation data sets, in general and in certain domains. Alternative solutions such as zero-shot models or pivoting techniques are successful in getting a strong baseline, but are often below the more supported language-pair systems. In this paper, we focus on Arabic-Japanese machine translation, a less studied language pair; and we work with a unique parallel corpus of Arabic news articles that were manually translated to Japanese. We use this parallel corpus to adapt a state-of-the-art domain/genre agnostic neural MT system via a simple automatic post-editing technique. Our results and detailed analysis suggest that this approach is quite viable for less supported language pairs in specific domains.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.