Autoencoding beyond pixels using a learned similarity metric

Anders Boesen Lindbo Larsen; Søren Kaae Sønderby; Hugo Larochelle; Ole Winther

arXiv: 1512.09300 · v2 · pith:IUZRCJH2 · submitted 2015-12-31 · cs.LG · cs.CV · stat.ML

keywords: learned, autoencoder, better, data, element-wise, errors, method, representations

Abstract

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN), we can use learned feature representations in the GAN discriminator as the basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards, e.g., translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g., wearing glasses) can be modified using simple arithmetic.
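To make the contrast between element-wise and feature-wise errors concrete, here is a toy sketch, assuming NumPy. The function `toy_dis_features` is a hypothetical stand-in for a discriminator layer: it uses fixed random kernels with circular convolution and global average pooling, so it is translation-invariant by construction, whereas the features in the paper are learned adversarially and can only be expected to be approximately invariant. Both errors are written here as half squared distances, which is an assumption of this sketch (a unit-variance Gaussian likelihood, up to an additive constant).

```python
import numpy as np

rng = np.random.default_rng(0)

def elementwise_error(x, x_rec):
    # Element-wise (pixel) squared error: the usual VAE reconstruction term
    # under an assumed unit-variance Gaussian, up to an additive constant.
    return 0.5 * float(np.sum((x - x_rec) ** 2))

def toy_dis_features(x, kernels):
    # HYPOTHETICAL stand-in for a learned discriminator layer:
    # circular 3x3 convolutions, a ReLU, then global average pooling.
    # These kernels are fixed and random, not learned.
    feats = []
    for k in kernels:
        resp = np.zeros_like(x)
        for a in range(k.shape[0]):
            for b in range(k.shape[1]):
                # rolled[i, j] == x[(i + a) % h, (j + b) % w]
                resp += k[a, b] * np.roll(x, shift=(-a, -b), axis=(0, 1))
        feats.append(np.maximum(resp, 0.0).mean())
    return np.array(feats)

def featurewise_error(x, x_rec, kernels):
    # The same squared error, measured between feature representations.
    d = toy_dis_features(x, kernels) - toy_dis_features(x_rec, kernels)
    return 0.5 * float(np.sum(d ** 2))

image = rng.random((16, 16))
shifted = np.roll(image, shift=1, axis=1)  # translate one pixel to the right
kernels = rng.standard_normal((4, 3, 3))

print(elementwise_error(image, shifted))
print(featurewise_error(image, shifted, kernels))
```

A one-pixel translation leaves the image content unchanged but produces a large pixel-wise error, while the pooled-feature error is essentially zero. A learned feature space trades this exact invariance for features that also reflect what the discriminator finds realistic.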
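The attribute arithmetic mentioned in the abstract can be sketched as follows, assuming NumPy and synthetic latent codes. One simple construction, assumed here, takes an attribute vector as the mean latent code of examples that have the attribute minus the mean code of examples that do not. The toy data, the function `add_attribute`, and its `strength` parameter are hypothetical; with a trained model the codes would come from the encoder and the edited code would be passed to the decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim = 8

# HYPOTHETICAL toy latent codes: the attribute (e.g. "wearing glasses")
# is modelled as a fixed offset along an unknown direction.
true_direction = rng.standard_normal(latent_dim)
z_without = rng.standard_normal((500, latent_dim))
z_with = rng.standard_normal((500, latent_dim)) + true_direction

# Attribute vector: mean code with the attribute minus mean code without it.
attribute_vector = z_with.mean(axis=0) - z_without.mean(axis=0)

def add_attribute(z, vec, strength=1.0):
    # Move a latent code along the attribute direction; a trained decoder
    # would then render the modified image.
    return z + strength * vec

z_query = rng.standard_normal(latent_dim)
z_edited = add_attribute(z_query, attribute_vector)
```

With enough labelled examples, the estimated attribute vector points close to the direction that actually separates the two groups, which is what makes the edit meaningful.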

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Density estimation using Real NVP
cs.LG · 2016-05 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Hierarchical Sequence to Sequence Voice Conversion with Limited Data
eess.AS · 2019-07 · unverdicted · novelty 4.0

A hierarchical seq2seq model for parallel voice conversion, pretrained as an autoencoder on single-speaker data and then adapted to limited multi-speaker data; the output mel spectrograms are converted to waveforms with a WaveNet vocoder.
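The affine coupling layer summarized in the Real NVP entry can be sketched in a few lines, assuming NumPy. Half of the input passes through unchanged and conditions a log-scale `s` and a translation `t` applied to the other half, so the inverse is available in closed form and the log-determinant of the Jacobian is simply the sum of `s`. In Real NVP, `s` and `t` are neural networks; the fixed random linear maps `scale` and `translate` below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 6          # input dimensionality
d = D // 2     # size of the pass-through half

# HYPOTHETICAL placeholders for the learned scale and translation networks.
W_s = 0.1 * rng.standard_normal((d, D - d))
W_t = rng.standard_normal((d, D - d))

def scale(x1):
    # Bounding the log-scale with tanh is a choice made for this sketch;
    # it keeps exp(s) well-behaved.
    return np.tanh(x1 @ W_s)

def translate(x1):
    return x1 @ W_t

def coupling_forward(x):
    # y1 = x1;  y2 = x2 * exp(s(x1)) + t(x1)
    x1, x2 = x[..., :d], x[..., d:]
    s = scale(x1)
    y2 = x2 * np.exp(s) + translate(x1)
    # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
    log_det = s.sum(axis=-1)
    return np.concatenate([x1, y2], axis=-1), log_det

def coupling_inverse(y):
    # x1 = y1;  x2 = (y2 - t(y1)) * exp(-s(y1))
    y1, y2 = y[..., :d], y[..., d:]
    x2 = (y2 - translate(y1)) * np.exp(-scale(y1))
    return np.concatenate([y1, x2], axis=-1)

x = rng.standard_normal((4, D))
y, log_det = coupling_forward(x)
x_back = coupling_inverse(y)
print(np.allclose(x, x_back))
```

Because neither `s` nor `t` has to be inverted, they can be arbitrarily complex functions; stacking several such layers with alternating halves lets every dimension be transformed.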