Unsupervised Textual Grounding: Linking Words to Image Concepts

Alexander G. Schwing; Minh N. Do; Raymond A. Yeh

arxiv: 1803.11185 · v1 · pith:HJPGIB52new · submitted 2018-03-29 · 💻 cs.CV

classification 💻 cs.CV

keywords grounding · textual · words · concepts · dataset · deep · image · learning

Textual grounding, i.e., linking words to objects in images, is a challenging but important task for robotics and human-computer interaction. Existing techniques benefit from recent progress in deep learning and generally formulate the task as a supervised learning problem, selecting a bounding box from a set of possible options. Training these deep-net-based approaches requires access to large-scale datasets; however, constructing such a dataset is time-consuming and expensive. Therefore, we develop a completely unsupervised mechanism for textual grounding that uses hypothesis testing to link words to detected image concepts. We demonstrate our approach on the ReferIt Game dataset and the Flickr30k data, outperforming baselines by 7.98% and 6.96%, respectively.
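
To make the idea of linking words to detected concepts through hypothesis testing more concrete, here is a minimal sketch. The abstract does not say which statistical test is used, so this sketch assumes a one-sided Fisher's exact test on word/concept co-occurrence counts; that choice, the function names `fisher_right_tail` and `link_words`, the threshold `alpha`, and the toy data are all illustrative assumptions and not the authors' method.

```python
from math import comb


def fisher_right_tail(both, word_only, concept_only, neither):
    """One-sided Fisher's exact p-value for a 2x2 co-occurrence table.

    Null hypothesis: the word and the concept occur independently.
    A small p-value suggests they co-occur more often than chance.
    (Assumed test; the abstract does not name a specific one.)
    """
    n_word = both + word_only
    n_concept = both + concept_only
    total = n_word + concept_only + neither
    max_both = min(n_word, n_concept)
    # Hypergeometric tail: P(co-occurrences >= observed count).
    tail = sum(
        comb(n_concept, k) * comb(total - n_concept, n_word - k)
        for k in range(both, max_both + 1)
    )
    return tail / comb(total, n_word)


def link_words(samples, alpha=0.05):
    """Link each word to its most significantly associated concept.

    `samples` is a list of (caption_words, detected_concepts) set pairs.
    Words with no concept below the `alpha` threshold stay unlinked.
    """
    words = set().union(*(w for w, _ in samples))
    concepts = set().union(*(c for _, c in samples))
    links = {}
    for word in sorted(words):
        best = None
        for concept in sorted(concepts):
            both = sum(1 for w, c in samples if word in w and concept in c)
            word_only = sum(1 for w, c in samples if word in w and concept not in c)
            concept_only = sum(1 for w, c in samples if word not in w and concept in c)
            neither = len(samples) - both - word_only - concept_only
            p = fisher_right_tail(both, word_only, concept_only, neither)
            if p < alpha and (best is None or p < best[1]):
                best = (concept, p)
        if best is not None:
            links[word] = best[0]
    return links


# Toy data (hypothetical): captions paired with concept-detector outputs.
samples = (
    [({"the", "dog"}, {"dog_detector"})] * 4
    + [({"the", "car"}, {"car_detector"})] * 4
)
links = link_words(samples)
```

In this toy setup, a word that appears in every caption (such as "the") gives no evidence against independence and is left unlinked, while words that co-occur consistently with one detector are linked to it. No supervision in the form of word-to-box annotations is involved, only co-occurrence statistics.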
