Punny Captions: Witty Wordplay in Image Descriptions

Arjun Chandrasekaran, Devi Parikh, Mohit Bansal

classification 💻 cs.CL cs.AI cs.CV

keywords descriptions image witty model approaches given human style

Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.
