Bridging Neural Machine Translation and Bilingual Dictionaries

Chengqing Zong; Jiajun Zhang

arxiv: 1610.07272 · v1 · pith:IIG5BNKJnew · submitted 2016-10-24 · 💻 cs.CL


classification 💻 cs.CL

keywords bilingual, translation, dictionaries, dictionary, machine, methods, neural, pairs


Neural Machine Translation (NMT) has become the new state of the art for several language pairs. However, integrating NMT with a bilingual dictionary, which mainly contains words rarely or never seen in the bilingual training data, remains a challenging problem. In this paper, we propose two methods to bridge NMT and bilingual dictionaries. The core idea is to design novel models that transform the bilingual dictionaries into adequate sentence pairs, so that NMT can distil latent bilingual mappings from the ample and repetitive phenomena. One method leverages a mixed word/character model, and the other synthesizes parallel sentences that guarantee massive occurrence of the translation lexicon. Extensive experiments demonstrate that the proposed methods can remarkably improve translation quality, and that most rare words in the test sentences obtain correct translations if they are covered by the dictionary.
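The abstract names a mixed word/character model but does not say how text is segmented. The following is a minimal sketch of one common way such a scheme can work, not the authors' implementation: frequent words stay whole, and rare words are split into characters carrying position markers. The markers `<B>`, `<M>`, `<E>`, the `min_count` threshold, and the function names `build_vocab` and `to_mixed` are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(sentences, min_count=2):
    """Collect words that occur at least `min_count` times (assumed threshold)."""
    counts = Counter(word for sent in sentences for word in sent.split())
    return {word for word, count in counts.items() if count >= min_count}


def to_mixed(sentence, vocab):
    """Keep in-vocabulary words whole; split other words into tagged characters.

    The <B>/<M>/<E> tags (begin/middle/end) are hypothetical markers that let
    a decoder's character output be joined back into words.
    """
    tokens = []
    for word in sentence.split():
        if word in vocab or len(word) == 1:
            # Frequent words and single-character words are left untouched.
            tokens.append(word)
        else:
            chars = list(word)
            tokens.append("<B>" + chars[0])
            tokens.extend("<M>" + ch for ch in chars[1:-1])
            tokens.append("<E>" + chars[-1])
    return tokens


corpus = ["the cat sat", "the cat ran"]
vocab = build_vocab(corpus)  # only "the" and "cat" occur twice
mixed_sentence = to_mixed("the cat sat", vocab)
mixed_entry = to_mixed("ox", vocab)  # a dictionary-style entry unseen in the corpus
```

Under this assumption, a dictionary entry whose words never appear in the training data is still expressed with symbols the model has seen, since it falls back to characters.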



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models
cs.CL 2025-07 conditional novelty 6.0

SLoW selects low-frequency word dictionaries to boost LLM translation quality and efficiency across 100 languages from FLORES.

Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models
cs.CL 2024-11 unverdicted novelty 6.0

DIP interleaves English word translations into non-English prompts to boost multilingual reasoning on synthetic benchmarks spanning 10-200 languages.