pith. machine review for the scientific record.

arxiv: 1603.08155 · v1 · pith:VT6YGX43new · submitted 2016-03-27 · 💻 cs.CV · cs.LG

Perceptual Losses for Real-Time Style Transfer and Super-Resolution

keywords: image, loss, perceptual, feed-forward, networks, results, functions

We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
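The difference between the two kinds of loss named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract refers to features from a pretrained network, whereas the `toy_features` function here is a hypothetical stand-in (a fixed bank of 3x3 filters followed by ReLU), and all function names and the single-channel image shape are assumptions made for illustration. The sketch only shows the structural point that a perceptual loss compares images in a feature space instead of pixel space.

```python
import numpy as np


def per_pixel_loss(output, target):
    """Mean squared error computed directly on pixel values."""
    return float(np.mean((output - target) ** 2))


def toy_features(image, filters):
    """Hypothetical stand-in for a pretrained network's feature extractor.

    Applies a bank of 3x3 filters (valid cross-correlation) and a ReLU.
    image:   array of shape (H, W)
    filters: array of shape (C, 3, 3)
    returns: array of shape (C, H - 2, W - 2)
    """
    h, w = image.shape
    out = np.zeros((filters.shape[0], h - 2, w - 2))
    for di in range(3):
        for dj in range(3):
            # Shifted view of the image aligned with filter tap (di, dj).
            patch = image[di:di + h - 2, dj:dj + w - 2]
            out += filters[:, di, dj][:, None, None] * patch
    return np.maximum(out, 0.0)


def perceptual_loss(output, target, filters):
    """Mean squared error between feature maps instead of raw pixels."""
    f_out = toy_features(output, filters)
    f_tgt = toy_features(target, filters)
    return float(np.mean((f_out - f_tgt) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    filters = rng.standard_normal((4, 3, 3))  # assumed, not pretrained
    target = rng.random((16, 16))
    output = np.roll(target, shift=1, axis=1)  # same content, shifted 1 px
    print("per-pixel loss: ", per_pixel_loss(output, target))
    print("perceptual loss:", perceptual_loss(output, target, filters))
```

In a training setup of the kind the abstract describes, `output` would come from the feed-forward transformation network and the feature extractor would be a fixed pretrained network; both losses are zero when the two images are identical.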

This paper has not been read by Pith yet.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Voxify3D: Pixel Art Meets Volumetric Rendering

    cs.CV 2025-12 unverdicted novelty 7.0

    Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

  2. Phenaki: Variable Length Video Generation From Open Domain Textual Description

    cs.CV 2022-10 unverdicted novelty 7.0

    Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images ...

  3. Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations

    cs.LG 2026-05 unverdicted novelty 5.0

    Di-BiLPS combines a variational autoencoder, latent diffusion, and contrastive learning to achieve state-of-the-art accuracy on PDE problems with as little as 3% observations while supporting zero-shot super-resolutio...

  4. MSDS: Deep Structural Similarity with Multiscale Representation

    cs.CV 2026-04 unverdicted novelty 4.0

    MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.

  5. SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

    cs.CV 2026-04 unverdicted novelty 4.0

    SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.

  6. Low Light Image Enhancement Challenge at NTIRE 2026

    cs.CV 2026-04 unverdicted novelty 2.0

    NTIRE 2026 challenge report shows progress in low-light image enhancement via 22 submitted networks evaluated on a new dataset.