Empirical Methods for Compound Splitting

arXiv: cs/0302032 · v1 · submitted 2003-02-22 · cs.CL

Philipp Koehn, Kevin Knight

keywords: methods, performance, splitting, translation, accuracy, applications, BLEU, challenge

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of 99.1% and performance gains for MT of 0.039 BLEU on a German-English noun phrase translation task.
