Self-Attention Generative Adversarial Networks

Han Zhang , Ian Goodfellow , Dimitris Metaxas , Augustus Odena

Authors on Pith no claims yet

classification 📊 stat.ML cs.LG

keywords generatorsaganadversarialdetailsfeaturegenerativeimageinception

read the original abstract

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Scale GAN Training for High Fidelity Natural Image Synthesis
cs.LG 2018-09 accept novelty 7.0

BigGANs achieve state-of-the-art class-conditional synthesis on ImageNet 128x128 with Inception Score 166.5 and FID 7.4 by scaling GANs and applying orthogonal regularization plus truncation.
From DES to KiDS: Domain adaptation for cross-survey detection of low-surface-brightness galaxies
astro-ph.GA 2026-05 unverdicted novelty 6.0

Domain adaptation with an ensemble of CNN and transformer models trained on DES detects 20,180 LSBGs and 434 UDGs in KiDS DR5, with structural parameters and environmental trends consistent with known samples.