Does Diffusion Beat GAN in Image Super Resolution?

Daniil Shlenskii; Denis Kuznedelev; Sergey Kastryulin; Valerii Startsev

arXiv: 2405.17261 · v2 · pith:ZVIERXISnew · submitted 2024-05-27 · eess.IV · cs.CV



keywords: diffusion-based, models, gan-based, model, resolution, super, computational, diffusion


It is widely believed that diffusion-based models outperform their GAN-based counterparts on the Image Super Resolution (ISR) problem. However, in most studies, diffusion-based ISR models employ larger networks and are trained longer than the GAN baselines. This raises the question of whether the high performance stems from the superiority of the diffusion paradigm or is a consequence of the increased scale and greater computational resources of contemporary studies. In our work, we thoroughly compare diffusion-based and GAN-based Super Resolution models under controlled settings, with both approaches matched in architecture, model and dataset sizes, and computational budget. We show that a GAN-based model can achieve results comparable or superior to those of a diffusion-based model. Additionally, we explore the impact of popular design choices, such as text conditioning and augmentation, on the performance of ISR models, showcasing their effect on several downstream tasks. We will release the inference code and weights of our scaled GAN.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
cs.CV · 2025-12 · unverdicted · novelty 5.0

FRAMER improves real-world super-resolution by decomposing features into low- and high-frequency bands via FFT, applying intra- and inter-contrastive losses with adaptive modulators, and using the final layer as teach...