arXiv: 1603.01359 · v1 · pith:NCN2JJUY · submitted 2016-03-04 · stat.ML · cs.CV · cs.LG

Learning deep representation of multityped objects and tasks

classification stat.ML · cs.CV · cs.LG
keywords deep · model · image · multiple · representation · architecture · described · multimodal

We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization but more machine-friendly, and can therefore be modeled more precisely. For example, an image can be described by multiple visual views, which can take the form of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. The model also supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The deep architecture produces a more compact representation, naturally integrates multiple views and modalities, better exploits side information, and, most importantly, performs competitively against baselines.
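
The abstract does not specify layer types, losses, or training, so the following is only a minimal sketch of the general idea it describes: several differently typed inputs (counts, real values, binary tags) mapped into one shared real-valued representation that feeds differently typed output heads. Everything here is an assumption for illustration, not the authors' model: the class name `MultitypeSketch`, the plain feedforward encoders, the `log1p` damping of counts, the summation used for fusion, and the two example heads (a multi-label concept head and a categorical head) are all hypothetical.

```python
import math
import random


def init_layer(rng, n_in, n_out, scale=0.1):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one output per weight row."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultitypeSketch:
    """Hypothetical illustration only; not the architecture from the paper."""

    def __init__(self, input_dims, hidden_dim, n_concepts, n_classes, seed=0):
        rng = random.Random(seed)
        # One encoder per input type, all projecting into the same hidden space.
        self.encoders = {name: init_layer(rng, dim, hidden_dim)
                         for name, dim in input_dims.items()}
        self.concept_head = init_layer(rng, hidden_dim, n_concepts)  # binary outputs
        self.class_head = init_layer(rng, hidden_dim, n_classes)     # categorical output
        self.hidden_dim = hidden_dim

    def represent(self, inputs):
        """Fuse type-specific inputs into one homogeneous real-valued vector."""
        fused = [0.0] * self.hidden_dim
        for name, x in inputs.items():
            if name == "counts":
                # Assumed preprocessing: dampen large word counts.
                x = [math.log1p(v) for v in x]
            projected = dense(x, self.encoders[name])
            fused = [f + p for f, p in zip(fused, projected)]
        return [math.tanh(f) for f in fused]

    def predict(self, inputs):
        """Return the shared representation and one output per task type."""
        h = self.represent(inputs)
        concept_probs = [1.0 / (1.0 + math.exp(-z)) for z in dense(h, self.concept_head)]
        class_probs = softmax(dense(h, self.class_head))
        return h, concept_probs, class_probs


model = MultitypeSketch({"counts": 4, "histogram": 3, "tags": 5},
                        hidden_dim=6, n_concepts=5, n_classes=3)
example = {
    "counts": [2, 0, 1, 7],         # bag-of-words counts
    "histogram": [0.2, 0.5, 0.3],   # real-valued color histogram
    "tags": [1, 0, 0, 1, 0],        # sparse binary social tags
}
h, concept_probs, class_probs = model.predict(example)
print(len(h), len(concept_probs), len(class_probs))
```

The weights here are random and untrained, so the outputs carry no meaning; the point is only the data flow. The shared vector `h` plays the role of the compact representation that could be compared across images for retrieval, while the sigmoid head stands in for multiple concept prediction.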
