Image Pivoting for Learning Multilingual Multimodal Representations

Frank Keller; Mirella Lapata; Rico Sennrich; Spandana Gella

arxiv: 1707.07601 · v1 · pith:MHNYWRU4new · submitted 2017-07-24 · 💻 cs.CL · cs.CV

keywords: image, languages, multilingual, descriptions, different, english, images, model

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
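
The abstract gives no formula for the ranking loss, so the following is only a minimal sketch of how a pairwise ranking objective with an image pivot might look. It assumes NumPy, a max-margin form, cosine similarity as the symmetric score, and an order-violation penalty as the asymmetric score. All function names, the margin value, and both similarity definitions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def cosine_sim(a, b):
    """Symmetric score (assumed): cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def order_sim(a, b):
    """Asymmetric score (assumed): negative order-violation penalty.

    score[i, j] = -|| max(0, b_j - a_i) ||^2, so swapping the arguments
    generally changes the result.
    """
    diff = np.maximum(0.0, b[None, :, :] - a[:, None, :])
    return -np.sum(diff ** 2, axis=2)

def pairwise_ranking_loss(scores, margin=0.1):
    """Max-margin ranking loss; matching pairs are assumed to lie on the diagonal."""
    pos = np.diag(scores)
    # Each row item should score its own match above every other column item.
    cost_rows = np.maximum(0.0, margin - pos[:, None] + scores)
    # Each column item should score its own match above every other row item.
    cost_cols = np.maximum(0.0, margin - pos[None, :] + scores)
    np.fill_diagonal(cost_rows, 0.0)
    np.fill_diagonal(cost_cols, 0.0)
    return cost_rows.sum() + cost_cols.sum()

def pivot_loss(img, txt_a, txt_b, sim=cosine_sim, margin=0.1):
    """Sum of image-to-description losses for two languages.

    The image is the only link between txt_a and txt_b, so the two sets of
    descriptions do not have to be translations of each other.
    """
    loss_a = pairwise_ranking_loss(sim(img, txt_a), margin)
    loss_b = pairwise_ranking_loss(sim(img, txt_b), margin)
    return loss_a + loss_b
```

In this sketch, row i of `img`, `txt_a`, and `txt_b` is assumed to hold the embedding of the i-th image and of one description of it in each language. Passing `sim=order_sim` switches the same loss from the symmetric to the asymmetric score.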
