Cross-modal Variational Auto-encoder with Distributed Latent Spaces and Associators

ByeongJu Lee; Dae Ung Jo; Haanju Yoo; Jin Young Choi; Jongwon Choi

arXiv: 1905.12867 · v1 · pith:TXKFB7KVnew · submitted 2019-05-30 · cs.LG · stat.ML



keywords: structure, cross-modal, data, proposed, variational, associators, auto-encoders, association


In this paper, we propose a novel structure for cross-modal data association, inspired by recent research on the associative learning structure of the brain. We formulate cross-modal association in a Bayesian inference framework, realized by a deep neural network with multiple variational auto-encoders and variational associators. The variational associators transfer the latent spaces between auto-encoders that represent different modalities. The proposed structure successfully associates even heterogeneous modal data and easily incorporates an additional modality into the entire network via the proposed cross-modal associator. Furthermore, the proposed structure can be trained with only a small amount of paired data, since the auto-encoders can be trained in an unsupervised manner. Through experiments, the effectiveness of the proposed structure is validated on various datasets, including visual and auditory data.
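The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ToyAssociator` class, the affine maps, the latent dimensions, the stand-in encoder outputs, and the direction of the KL term are all assumptions made for illustration. It assumes each modality has its own auto-encoder with a diagonal-Gaussian latent posterior, and that an associator maps a latent code of modality A to a Gaussian over the latent space of modality B.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class ToyAssociator:
    """Hypothetical associator (assumption): maps a latent code of modality A
    to the mean and log-variance of a Gaussian over modality B's latent space."""

    def __init__(self, dim_a, dim_b, rng):
        self.w_mu = [[rng.uniform(-0.1, 0.1) for _ in range(dim_a)] for _ in range(dim_b)]
        self.b_mu = [0.0] * dim_b
        self.w_lv = [[rng.uniform(-0.1, 0.1) for _ in range(dim_a)] for _ in range(dim_b)]
        self.b_lv = [0.0] * dim_b

    def __call__(self, z_a):
        return affine(self.w_mu, self.b_mu, z_a), affine(self.w_lv, self.b_lv, z_a)


rng = random.Random(0)

# Stand-ins (assumed values) for the posteriors that two separately trained
# encoders would produce for one paired sample.
mu_a, logvar_a = [0.5, -1.0, 0.2], [-1.0, -1.0, -1.0]
mu_b, logvar_b = [1.0, 0.0], [-0.5, -0.5]

associator = ToyAssociator(dim_a=3, dim_b=2, rng=rng)
z_a = reparameterize(mu_a, logvar_a, rng)          # latent code of modality A
mu_ab, logvar_ab = associator(z_a)                 # predicted Gaussian over B's latent space
z_b_from_a = reparameterize(mu_ab, logvar_ab, rng) # would be fed to decoder B

# Assumed training signal: pull the associated distribution toward encoder B's posterior.
association_loss = kl_diag_gaussians(mu_b, logvar_b, mu_ab, logvar_ab)
```

In this sketch only the associator's parameters would be fitted on paired samples, which mirrors the abstract's point that the auto-encoders themselves can be trained without pairs.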


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Template Collapse and Information-Theoretic Limits in Camera rPPG Pulse Morphology Restoration
cs.CV 2026-06 unverdicted novelty 6.0

Empirical tests of 16 architectures on 153 subjects show camera rPPG signals contain no recoverable subject-specific pulse morphology, with all models exhibiting template collapse.