Graph-RISE: Graph-Regularized Image Semantic Embedding

arxiv: 1902.10814 · v1 · pith:ILJXUJ4T · submitted 2019-02-14 · cs.CV · cs.LG · stat.ML

Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi

keywords: image, graph-rise, embedding, semantic, graph-regularized, learning, semantics, state-of-the-art

Abstract

Learning image representations that capture fine-grained semantics is a challenging and important task that enables many applications, such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented number, O(40M), of ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state of the art, differentiates nuances at levels that are closer to human perception.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PaLI: A Jointly-Scaled Multilingual Language-Image Model
cs.CV · 2022-09 · conditional · novelty 7.0

PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.