
arxiv: 1605.00459 · v1 · submitted 2016-05-02 · 💻 cs.CL · cs.CV

Recognition: unknown

Multi30K: Multilingual English-German Image Descriptions

keywords descriptions · image · dataset · description · english · multilingual · data · multi30k

We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Multimodal Dataset for Visually Grounded Ambiguity in Machine Translation

    cs.CL 2026-05 unverdicted novelty 6.0

    VIDA provides 2,500 visually-dependent ambiguous MT instances and LLM-judge metrics; chain-of-thought SFT improves disambiguation accuracy over standard SFT, especially out-of-distribution.

  2. Video-guided Machine Translation with Global Video Context

    cs.CV 2026-04 unverdicted novelty 4.0

    A globally video-guided multimodal translation framework retrieves semantically related video segments with a vector database and applies attention mechanisms to improve subtitle translation accuracy in long videos.