Perceptual Losses for Real-Time Style Transfer and Super-Resolution

Alexandre Alahi; Justin Johnson; Li Fei-Fei

arxiv 1603.08155 v1 pith:VT6YGX43 submitted 2016-03-27 cs.CV cs.LG

Justin Johnson, Alexandre Alahi, Li Fei-Fei

classification cs.CV cs.LG

keywords image, loss, perceptual, feed-forward, networks, results, functions

Abstract

We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
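
The contrast the abstract draws between a per-pixel loss and a feature-based loss can be illustrated with a toy sketch. It is not the paper's implementation: `toy_features` is a hypothetical stand-in (2x2 average pooling) for the pretrained network the abstract refers to, and the 4x4 checkerboard images are made up for illustration. The only point is that two images can disagree at every pixel yet agree in feature space.

```python
# Toy sketch: per-pixel loss vs. a loss computed on extracted features.
# ASSUMPTION: toy_features is a hypothetical stand-in for a pretrained
# network; it is not the loss network used in the paper.

def per_pixel_loss(a, b):
    """Mean squared error between two equally sized 2-D grids."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b)) / n

def toy_features(img, pool=2):
    """Average-pool the image over non-overlapping pool x pool blocks."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[i + di][j + dj]
             for di in range(pool) for dj in range(pool)) / pool ** 2
         for j in range(0, w - pool + 1, pool)]
        for i in range(0, h - pool + 1, pool)
    ]

def feature_loss(a, b):
    """Mean squared error between the toy features of two images."""
    return per_pixel_loss(toy_features(a), toy_features(b))

# A 4x4 checkerboard and the same pattern shifted by one pixel.
target = [[(i + j) % 2 for j in range(4)] for i in range(4)]
shifted = [[(i + j + 1) % 2 for j in range(4)] for i in range(4)]

print(per_pixel_loss(target, shifted))  # every pixel differs
print(feature_loss(target, shifted))    # pooled features are identical
```

In this toy case the per-pixel loss is at its maximum while the feature loss is zero. A real perceptual loss would replace `toy_features` with activations from a pretrained convolutional network.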

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DeLux: Cross-Modal Local Artifact Restoration in Video Using Neuromorphic Data
cs.CV 2026-06 unverdicted novelty 7.0

DeLux restores local lighting artifacts in RGB video by leveraging neuromorphic event data, outperforming RGB-only and event-guided HDR baselines with MS-SSIM over 0.99 and up to 88% artifact reduction.
How Neural Losses Shape VAE Latents
cs.LG 2026-05 unverdicted novelty 7.0

Neural reconstruction losses in VAEs reduce latent information content and produce more isotropic latent geometries with even uncertainty distribution.
Voxify3D: Pixel Art Meets Volumetric Rendering
cs.CV 2025-12 unverdicted novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.
Phenaki: Variable Length Video Generation From Open Domain Textual Description
cs.CV 2022-10 unverdicted novelty 7.0

Phenaki generates arbitrary-length videos from sequences of text prompts by tokenizing videos with causal temporal attention and generating tokens with a text-conditioned masked transformer, trained jointly on images ...
Unsupervised Susceptibility Distortion Correction of EPI without Calibration Scans via Image Translation-Based Registration
eess.IV 2026-06 unverdicted novelty 5.0

SACRED performs unsupervised susceptibility distortion correction of EPI fMRI via image translation-based registration between T1w and unidirectional BOLD images, with test-time adaptation for robustness.
Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations
cs.LG 2026-05 unverdicted novelty 5.0

Di-BiLPS combines a variational autoencoder, latent diffusion, and contrastive learning to achieve state-of-the-art accuracy on PDE problems with as little as 3% observations while supporting zero-shot super-resolutio...
Multimodal Image Colorization: Quantifying the Impact of Text-Conditioned Guidance on Grayscale-to-Color Translation
cs.GR 2026-06 unverdicted novelty 4.0

Text conditioning improves PSNR by ~5.7%, SSIM by ~1.4%, colorfulness by up to 36.6%, and reduces LPIPS by ~9.5% across U-Net and Stable Diffusion colorization models.
Layer Selection in Feature-Based Losses Affects Image Quality and Microstructural Consistency in Deep Learning Super-Resolution of Brain Diffusion MRI
eess.IV 2026-05 unverdicted novelty 4.0

Deeper VGG16 layers in feature losses for diffusion MRI super-resolution introduce persistent grid artifacts in images and anisotropy maps, whereas the shallowest layer preserves consistency with ground truth at high ...
MSDS: Deep Structural Similarity with Multiscale Representation
cs.CV 2026-04 unverdicted novelty 4.0

MSDS computes DeepSSIM at multiple pyramid scales and fuses the scores with learned weights, producing consistent improvements over single-scale DeepSSIM on IQA benchmarks with negligible extra cost.
SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs
cs.CV 2026-04 unverdicted novelty 4.0

SAGE-GAN integrates a self-attention U-Net into a CycleGAN framework to generate realistic synthetic electron microscopy image-mask pairs that augment training data for nanoparticle segmentation without human labeling.
Machine Learning Techniques for Astrophysics and Cosmology: Lyman-$\alpha$ forest
astro-ph.CO 2026-05 unverdicted novelty 2.0

Review of machine learning applications for analyzing Lyman-alpha forest observations to probe cosmology, reionization, and dark matter.
Low Light Image Enhancement Challenge at NTIRE 2026
cs.CV 2026-04 unverdicted novelty 2.0

NTIRE 2026 challenge report shows progress in low-light image enhancement via 22 submitted networks evaluated on a new dataset.
Low Light Image Enhancement Challenge at NTIRE 2026
cs.CV 2026-04 unverdicted novelty 2.0

Report on the NTIRE 2026 Low Light Image Enhancement Challenge that evaluates 22 team submissions for joint denoising and enhancement on a new dataset.