Generating Diverse and Meaningful Captions

Abhijit Mahalunkar; Annika Lindh; Giancarlo Salton; John D. Kelleher; Robert J. Ross

arxiv: 1812.08126 · v1 · pith:V6E7LS56new · submitted 2018-12-19 · cs.CV · cs.CL · cs.LG


Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

classification: cs.CV, cs.CL, cs.LG

keywords: captions, diverse, image, model, models, state-of-the-art, task, understanding


Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.
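The abstract does not say how caption diversity and novelty are measured. The sketch below is only an illustration based on definitions commonly used in the captioning literature: the share of generated captions that are distinct, the share that never appear in the training set, and the size of the generated vocabulary. The function name `caption_diversity_stats`, the lowercase and whitespace normalization, and the toy captions are all assumptions for illustration, not the authors' evaluation code.

```python
def caption_diversity_stats(generated, training):
    """Compute simple diversity and novelty statistics for generated captions.

    Assumed definitions (not taken from the paper):
      distinct   = unique generated captions / all generated captions
      novel      = generated captions absent from the training set / all generated captions
      vocab_size = number of unique whitespace-separated tokens in the generated captions
    """
    def normalize(caption):
        # Lowercase and collapse whitespace so trivially different strings match.
        return " ".join(caption.lower().split())

    gen = [normalize(c) for c in generated]
    if not gen:
        raise ValueError("generated must contain at least one caption")
    train = {normalize(c) for c in training}

    distinct = len(set(gen)) / len(gen)
    novel = sum(1 for c in gen if c not in train) / len(gen)
    vocab_size = len({word for c in gen for word in c.split()})
    return {"distinct": distinct, "novel": novel, "vocab_size": vocab_size}


# Toy data, invented for illustration only.
generated = [
    "A man riding a wave",
    "a man riding a wave",
    "A dog on a red couch",
    "a cat sitting on a table",
]
training = ["a cat sitting on a table"]
stats = caption_diversity_stats(generated, training)
```

Under these assumed definitions, a model that repeats the same generic caption for similar images would score low on `distinct`, while one that copies training captions verbatim would score low on `novel`.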
