Grounding Visual Explanations (Extended Abstract)

Lisa Anne Hendricks; Ronghang Hu; Trevor Darrell; Zeynep Akata

arXiv: 1711.06465 · v1 · pith:N46CGTKY · submitted 2017-11-17 · cs.CV



keywords: explanations, image, candidate, model, explanation, generated, grounding, loss


Existing models which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image relevance. Specifically, we introduce a phrase-critic model to refine (re-score/re-rank) generated candidate explanations and employ a relative-attribute inspired ranking loss using "flipped" phrases as negative examples for training. At test time, our phrase-critic model takes an image and a candidate explanation as input and outputs a score indicating how well the candidate explanation is grounded in the image.
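
As a rough illustration of the ranking idea in the abstract, the sketch below uses a generic margin-based (hinge) ranking loss: the score of a grounded explanation should exceed the score of its "flipped" counterpart by some margin, and at test time candidates are re-ranked by score. The abstract does not state the exact loss, so the formulation, the function names `margin_ranking_loss` and `rerank`, the margin value, and the example scores are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def margin_ranking_loss(pos_scores: Sequence[float],
                        neg_scores: Sequence[float],
                        margin: float = 1.0) -> float:
    """Mean hinge loss over (positive, negative) score pairs.

    pos_scores: hypothetical critic scores for original explanations.
    neg_scores: hypothetical critic scores for the paired "flipped" explanations.
    A pair contributes zero loss once the positive beats the negative by `margin`.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("need one negative score per positive score")
    if not pos_scores:
        return 0.0
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


def rerank(candidates: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return candidate explanations ordered by score, highest first."""
    if len(candidates) != len(scores):
        raise ValueError("need one score per candidate")
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


if __name__ == "__main__":
    # Made-up scores: the first pair is already separated by the margin,
    # the second pair is not and therefore incurs a loss.
    print(margin_ranking_loss([2.0, 0.5], [0.0, 0.4], margin=1.0))
    print(rerank(["candidate A", "candidate B", "candidate C"], [0.1, 0.9, 0.5]))
```

In this sketch the loss only penalizes pairs where the flipped explanation scores too close to (or above) the original, which is the usual behavior of a pairwise ranking objective; how the scores themselves are computed from the image and phrases is left out.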
