pith. machine review for the scientific record.

arxiv: 1904.05521 · v2 · submitted 2019-04-11 · 💻 cs.CV · cs.CL · cs.LG


UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations

authors on Pith: no claims yet
classification 💻 cs.CV · cs.CL · cs.LG
keywords semantic · visual · approach · concepts · embeddings · learning · robustness · space
abstract

We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts. The space unifies concepts at different levels, including objects, attributes, relations, and full scenes. A contrastive learning approach is proposed for fine-grained alignment from only image-caption pairs. Moreover, we present an effective approach for enforcing coverage of the semantic components that appear in the sentence. We demonstrate the robustness of Unified VSE in defending against text-domain adversarial attacks on cross-modal retrieval tasks. Such robustness also empowers the use of visual cues to resolve word dependencies in novel sentences.
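The contrastive alignment the abstract describes is, at its core, a loss that pulls matched image-caption embeddings together while pushing mismatched pairs apart. A minimal sketch of one common formulation (a margin-based hinge loss over in-batch negatives, as used in standard VSE models; the function name and margin value are illustrative, not taken from the paper):

```python
import numpy as np

def contrastive_alignment_loss(img_emb, txt_emb, margin=0.2):
    """Margin-based contrastive loss over a batch of image-caption pairs.

    img_emb, txt_emb: (batch, dim) L2-normalized embeddings; row i of each
    matrix forms a matched pair, and all other rows serve as negatives.
    """
    # Cosine similarity matrix: scores[i, j] = sim(image i, caption j).
    scores = img_emb @ txt_emb.T
    pos = np.diag(scores)  # similarities of the matched pairs

    # Hinge terms: each mismatched pair should score at least `margin`
    # below the matched pair, in both retrieval directions.
    cost_txt = np.maximum(0.0, margin + scores - pos[:, None])  # wrong captions per image
    cost_img = np.maximum(0.0, margin + scores - pos[None, :])  # wrong images per caption

    # A pair is not its own negative: zero the diagonal.
    n = scores.shape[0]
    idx = np.arange(n)
    cost_txt[idx, idx] = 0.0
    cost_img[idx, idx] = 0.0
    return cost_txt.sum() + cost_img.sum()
```

When matched pairs align perfectly and negatives are orthogonal, the loss is zero; misaligned pairs incur a positive penalty, which is the signal used to train the joint space from image-caption supervision alone.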

This paper has not been read by Pith yet.

discussion (0)
