
arxiv: 1705.04350 · v2 · pith:WDX5CI5Fnew · submitted 2017-05-11 · 💻 cs.CL · cs.CV

Imagination improves Multimodal Translation

classification: cs.CL, cs.CV
keywords: translation, learning, dataset, external, grounded, image, improves, learned

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
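The multitask setup described in the abstract can be illustrated with a minimal sketch. The abstract does not give the objective, so everything here is an assumption made for illustration: the interpolation weight `w`, the margin-based (hinge) image prediction loss, the use of cosine similarity, and all function names are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def image_prediction_loss(predicted, target, contrastive, margin=0.1):
    """Hypothetical hinge loss for image representation prediction.

    The predicted vector should be more similar to its own image vector
    (target) than to each contrastive image vector, by at least `margin`.
    """
    positive = cosine(predicted, target)
    return sum(
        max(0.0, margin - positive + cosine(predicted, other))
        for other in contrastive
    )


def multitask_loss(translation_loss, image_loss, w=0.5):
    """Assumed combination: a weighted sum of the two task losses."""
    return w * translation_loss + (1.0 - w) * image_loss


# Toy usage with made-up numbers: a sentence whose predicted image vector
# matches its own image and is orthogonal to the contrastive image.
image_loss = image_prediction_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
total = multitask_loss(translation_loss=2.0, image_loss=image_loss, w=0.5)
```

Under these assumptions the two tasks share only the sentence encoder, which is why the image prediction task could in principle be trained on a different dataset from the translation task, as the abstract reports for MS COCO and News Commentary.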



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation

    cs.CL 2019-07 unverdicted novelty 7.0

    The paper releases the first multimodal English-Hindi machine translation dataset of 31,525 segments with images and a challenge test set of 1,400 segments selected via embedding similarity for image-resolvable ambiguities.