Instance Normalization: The Missing Ingredient for Fast Stylization
Abstract
In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to replacing batch normalization with instance normalization and applying the latter at both training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code is available on GitHub at https://github.com/DmitryUlyanov/texture_nets. The full paper is available at arXiv:1701.02096.
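The swap described in the abstract comes down to which axes the normalization statistics are computed over. The following is a minimal NumPy sketch, assuming feature maps laid out as (N, C, H, W). The function names `instance_norm` and `batch_norm_train` are our own illustrations, not taken from the paper's code. Batch normalization pools statistics over the whole batch during training and normally switches to running averages at test time. Instance normalization computes a mean and variance for each image and channel separately, and it can use exactly the same computation at test time.

```python
import numpy as np


def instance_norm(x, gamma=None, beta=None, eps=1e-5):
    """Instance normalization for feature maps of shape (N, C, H, W).

    Each (sample, channel) slice is normalized by its own spatial mean and
    variance, so one image's output never depends on the rest of the batch.
    The same computation is used at training and test time (no running stats).
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    # Optional per-channel affine parameters (learned in a real network).
    if gamma is not None:
        y = y * np.asarray(gamma).reshape(1, -1, 1, 1)
    if beta is not None:
        y = y + np.asarray(beta).reshape(1, -1, 1, 1)
    return y


def batch_norm_train(x, eps=1e-5):
    """Batch normalization in training mode, without the affine step.

    Statistics are pooled over the batch and spatial axes for each channel,
    so every image's output depends on the other images in the batch.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 3, 8, 8))  # a batch of two 3-channel feature maps

    # Normalizing image 0 alone or together with image 1 gives the same result.
    print(np.allclose(instance_norm(x[:1]), instance_norm(x)[:1]))  # True
    # With batch normalization, image 0's output changes with its batch-mate.
    print(np.allclose(batch_norm_train(x[:1]), batch_norm_train(x)[:1]))  # False
```

The statistics are per image, so rescaling an image's contrast (x → a·x + b with a > 0) leaves the instance-normalized output essentially unchanged, up to the small `eps` term. In PyTorch, `torch.nn.InstanceNorm2d` computes the same per-instance statistics. With its default `track_running_stats=False`, it also uses them in evaluation mode, which corresponds to applying instance normalization at both training and testing times.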
Forward citations
Cited by 38 Pith papers
-
Riemannian Networks over Full-Rank Correlation Matrices
Riemannian networks are introduced for the full-rank correlation matrix manifold by extending MLR, FC, and convolutional layers to five geometries with backpropagation methods for two, showing effectiveness over SPD a...
-
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
-
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
-
QuadNorm: Resolution-Robust Normalization for Neural Operators
QuadNorm uses quadrature-based moments instead of uniform averaging in normalization layers, achieving O(h²) consistency across resolutions and better cross-resolution transfer in neural operators.
-
Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity
Every fixed finite feedforward neural network definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
A parameter-free input-output wrapper exactly parameterizes all normalization-equivariant functions on arbitrary backbones and improves blind denoising robustness to noise mismatch with zero GPU overhead.
-
Normalization Equivariance for Arbitrary Backbones, with Application to Image Denoising
A normalize-process-denormalize wrapper enforces normalization equivariance on arbitrary backbones, improving robustness to distribution shift in image denoising with no overhead.
-
StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
StyleID supplies human-perception-aligned benchmarks and fine-tuned encoders that improve facial identity recognition robustness across stylization types and strengths.
-
High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams
An exposure-decoupled modulo formulation and iteration-free diffusion-prior unwrapping enable 1000 FPS full-color HDR imaging on spike cameras while cutting bandwidth from 20 Gbps to 6 Gbps.
-
Deep Time Series Models: A Comprehensive Survey and Benchmark
This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
-
Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, ...
-
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
U-Mamba is a hybrid CNN-SSM architecture that outperforms prior CNN and Transformer networks on biomedical image segmentation tasks by efficiently modeling long-range dependencies.
-
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
PatchTST uses subseries patching and channel-independent Transformers to deliver significantly better long-term multivariate time series forecasting and strong self-supervised transfer performance.
-
Switchable Normalization for Learning-to-Normalize Deep Representation
Switchable Normalization learns per-layer weights to combine channel, layer, and minibatch normalizers, claiming robustness to batch size and better results than fixed normalizers on ImageNet, COCO, CityScapes, ADE20K...
-
Half a Percent of Labels is Enough: Efficient Animal Detection in UAV Imagery using Deep CNNs and Active Learning
Transfer Sampling with Optimal Transport and window cropping finds nearly 80% of animals in new UAV datasets using under 0.5% of labels.
-
Rethinking Constraint Awareness for Efficient State Embedding of Neural Routing Solver
The CARM module boosts neural routing solvers by adaptively modulating embeddings with constraint variables, enabling better use of global observations and improved performance on constrained VRPs.
-
Linearizing Vision Transformer with Test-Time Training
Using Test-Time Training's structural match to Softmax attention plus key normalization and locality modules allows inheriting pretrained weights and fine-tuning Stable Diffusion 3.5 in one hour to match quality while...
-
Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?
Natural-domain foundation models provide competitive and more robust priors than task-specific models for accelerated cardiac MRI reconstruction in cross-domain settings.
-
A Fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation
A hybrid Transformer-UNet model with energy-shifting inputs generates 6 MV LINAC dose maps from monoenergetic data, achieving over 98% gamma passing rate (3%/3mm) versus full Monte Carlo for prostate radiotherapy.
-
Time-Domain Voice Identity Morphing (TD-VIM): A Signal-Level Approach to Morphing Attacks on Speaker Verification Systems
TD-VIM creates signal-level morphed voice samples that achieve G-MAP attack success rates up to 99.74% against deep-learning and commercial speaker verification systems.
-
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
GCGNet uses a variational generator, graph structure aligner, and graph refiner to jointly capture temporal and channel correlations in time series forecasting with exogenous variables, outperforming baselines on 12 r...
-
Learning to accelerate distributed ADMM using graph neural networks
A GNN is trained to predict adaptive step sizes and weights for distributed ADMM by unrolling a fixed number of iterations and minimizing solution error on a problem class.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
Order Matters: Shuffling Sequence Generation for Video Prediction
SEE-Net improves video prediction by using frame shuffling to enforce learning of natural temporal order, reporting state-of-the-art results on three synthetic and real-world datasets.
-
Generative Modeling by Estimating Gradients of the Data Distribution
Score-based generative modeling via multi-noise-level score matching and annealed Langevin dynamics produces samples on par with GANs and sets a new inception score record on CIFAR-10.
-
A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization
A point cloud decoder using Adaptive Instance Normalization outperforms prior methods in auto-encoding, upsampling, and single-view reconstruction tasks.
-
SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumor Segmentation
SegGuidedNet achieves Dice scores of 0.905 on BraTS2021 and 0.897 on BraTS2023 with sub-region attention supervision that adds under 0.2% parameters and provides free spatial interpretability.
-
USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation
USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...
-
Style-Based Neural Architectures for Real-Time Weather Classification
Three style-based neural architectures are proposed for real-time weather classification from images, with two truncated ResNet variants claimed to outperform prior methods and generalize across public datasets.
-
Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift
Reversible Residual Normalization (RRN) introduces spatially-aware invertible residual blocks that combine center normalization with spectral-constrained graph convolutions to mitigate spatio-temporal distribution shi...
-
TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting
TimePre unifies MLP speed and MCL distributional power via Stabilized Instance Normalization to deliver SOTA probabilistic accuracy, orders-of-magnitude faster inference, and improved stability over prior MCL methods.
-
Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images
SC-GAN performs annotation-free coronary artery segmentation by transferring shape-consistent knowledge from retinal vessel annotations via a GAN trained on 1092 DSA images.
-
High-throughput Onboard Hyperspectral Image Compression with Ground-based CNN Reconstruction
Prequantization-based lossless predictive compression onboard hyperspectral images with CNN ground reconstruction recovers the entire SNR drop at 2 bpp.
-
Disentangled Makeup Transfer with Generative Adversarial Network
DMT uses identity and makeup encoders in a GAN to enable controllable makeup transfer from references and sampling of new styles from a prior distribution.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
Adapted Center and Scale Prediction: More Stable and More Accurate
Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
-
Mean Spectral Normalization of Deep Neural Networks for Embedded Automation
Proposes MSN reparameterization to address mean-drift in SN, claiming ~16% faster inference than BN with fewer parameters on CNNs and GANs.
-
Fast Universal Style Transfer for Artistic and Photorealistic Rendering
ArtNet and PhotoNet enable one-pass fast universal style transfer with fewer artifacts, better detail preservation, and 3-100x speedup over prior AE-based methods.