SDXL-Lightning: Progressive Adversarial Diffusion Distillation
Pith reviewed 2026-05-17 05:04 UTC · model grok-4.3
The pith
A distillation method combines progressive and adversarial training to enable one-step high-quality 1024-pixel image generation from SDXL.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that through progressive adversarial diffusion distillation on the SDXL model, they achieve new state-of-the-art results in one-step and few-step 1024px text-to-image generation by balancing perceptual quality with mode coverage, supported by theoretical analysis and specific training techniques.
What carries the argument
The progressive adversarial distillation process, which integrates staged reduction of diffusion steps with adversarial losses from a discriminator to preserve both fidelity and variety in the generated images.
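To make the mechanism concrete, here is a minimal sketch of one such training step, assuming a PyTorch-style setup. The toy MLP stand-ins for the SDXL UNet and discriminator, the MSE distillation term, the non-saturating GAN loss, and the adv_weight constant are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of one progressive adversarial distillation step.
# Assumptions (not from the paper): toy MLPs stand in for the SDXL UNet
# and discriminator, MSE serves as the progressive distillation loss,
# and a non-saturating GAN loss serves as the adversarial term.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 16
teacher = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
student = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
disc = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def distill_step(x_noisy: torch.Tensor, adv_weight: float = 0.1):
    # The teacher takes two denoising steps; the student must match in one.
    with torch.no_grad():
        target = teacher(teacher(x_noisy))  # stand-in for 2 teacher steps
    pred = student(x_noisy)                 # 1 student step

    # Discriminator update: real = teacher output, fake = student output.
    loss_d = (F.softplus(-disc(target)).mean()
              + F.softplus(disc(pred.detach())).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Student update: progressive (matching) term + adversarial term.
    loss_prog = F.mse_loss(pred, target)
    loss_adv = F.softplus(-disc(pred)).mean()
    loss_s = loss_prog + adv_weight * loss_adv
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    return loss_s.item(), loss_d.item()

print(distill_step(torch.randn(8, dim)))
```

The essential structure is the two-term student objective: the progressive term pins the student's single step to the teacher's multi-step output, while the adversarial term lets the discriminator push perceptual quality. In standard progressive distillation, this step-halving is then repeated stage by stage (e.g., 128 to 64 down to 1), with each stage's distilled student serving as the next stage's teacher.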
If this is right
- The resulting SDXL-Lightning models generate images in one or few steps instead of many.
- They maintain better mode coverage than previous distillation methods.
- Both LoRA adapters and full model weights are made available for users.
- The method scales to 1024px resolution without major quality degradation.
Where Pith is reading between the lines
- Applying similar distillation to other diffusion-based models could accelerate generation in related domains like video or 3D synthesis.
- The open-sourced weights may enable community experiments on further optimization or fine-tuning for specific tasks.
- Testing the approach on even larger models might reveal if the balance between quality and coverage holds at greater scales.
Load-bearing premise
The progressive adversarial training maintains both high perceptual quality and broad mode coverage without causing artifacts or mode collapse at the scale of the SDXL model.
What would settle it
Running the model on a diverse set of prompts and measuring diversity metrics or collecting human evaluations would settle it: a significant drop in variety, or the introduction of artifacts relative to full SDXL, would indicate the claim does not hold.
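As an illustration of such a diversity measurement, the sketch below computes mean pairwise cosine distance among samples generated for a single prompt and compares it to a full-SDXL baseline. The feature extractor (random tensors stand in here; in practice CLIP or DINOv2 image embeddings) and the threshold for a "significant" drop are assumptions, not the paper's protocol.

```python
# Hypothetical diversity probe: mean pairwise cosine distance between
# samples generated for the same prompt. A sharp drop versus the full
# SDXL baseline would signal mode collapse. Random tensors stand in
# for real embeddings (e.g., CLIP image features).
import torch
import torch.nn.functional as F

def mean_pairwise_distance(features: torch.Tensor) -> float:
    # features: (n_samples, feat_dim) embeddings of generated images.
    f = F.normalize(features, dim=-1)
    sim = f @ f.T                                  # cosine similarities
    n = f.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return (1.0 - off_diag).mean().item()          # higher = more diverse

baseline = mean_pairwise_distance(torch.randn(32, 512))   # full SDXL
distilled = mean_pairwise_distance(torch.randn(32, 512))  # one-step model
print(f"diversity ratio (distilled / baseline): {distilled / baseline:.2f}")
```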
Original abstract
We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SDXL-Lightning, a diffusion distillation method that combines progressive and adversarial distillation applied to the SDXL model. It claims new state-of-the-art results for one-step and few-step 1024px text-to-image generation by balancing perceptual quality and mode coverage. The manuscript covers theoretical analysis, discriminator design, model formulation, training techniques, and experimental validation, with open-sourced LoRA and full UNet weights.
Significance. If the central claims hold, the work would advance efficient inference for high-capacity diffusion models at high resolution. The progressive adversarial combination and explicit discriminator scaling to SDXL represent a practical engineering contribution. Open-sourcing both LoRA and full weights is a clear strength that aids reproducibility and downstream use.
Major comments (2)
- §5.2 (Quantitative Results): The reported metrics focus on FID, CLIP score, and qualitative examples, but no recall, precision-recall curves, or intra-class diversity statistics are provided to verify mode coverage. This is load-bearing for the abstract claim that the method achieves a balance between quality and mode coverage without collapse when the discriminator is scaled to the full SDXL UNet; see the sketch after this list for the kind of analysis requested.
- §4.1 (Discriminator Design): The architecture description scales the discriminator to SDXL but does not include an explicit diversity regularization term or an ablation on its effect. Without this, the assumption that progressive adversarial training avoids the common artifact and collapse regimes at 1024px remains untested in the reported experiments.
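To spell out the kind of evidence the first comment requests, here is a sketch of k-NN precision/recall in the spirit of Kynkäänniemi et al. (2019): precision asks whether generated samples land inside the real data manifold, recall whether real samples are covered by the generated distribution. The embeddings and the choice k=3 are assumptions for illustration, not results from the paper.

```python
# Sketch of k-NN precision/recall for generative models (after
# Kynkaanniemi et al., 2019). Random tensors stand in for assumed
# feature embeddings (e.g., Inception or CLIP) of real and generated
# images; nothing here is computed from the paper.
import torch

def knn_radii(feats: torch.Tensor, k: int = 3) -> torch.Tensor:
    d = torch.cdist(feats, feats)                  # pairwise distances
    d.fill_diagonal_(float("inf"))                 # exclude self
    return d.topk(k, largest=False).values[:, -1]  # k-th NN distance

def precision_recall(real: torch.Tensor, fake: torch.Tensor, k: int = 3):
    r_rad, f_rad = knn_radii(real, k), knn_radii(fake, k)
    # Precision: fraction of fakes inside some real sample's k-NN ball.
    d_fr = torch.cdist(fake, real)
    precision = (d_fr <= r_rad.unsqueeze(0)).any(dim=1).float().mean()
    # Recall: fraction of reals inside some fake sample's k-NN ball.
    d_rf = torch.cdist(real, fake)
    recall = (d_rf <= f_rad.unsqueeze(0)).any(dim=1).float().mean()
    return precision.item(), recall.item()

real = torch.randn(256, 64)
fake = torch.randn(256, 64)
print(precision_recall(real, fake))
```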
Minor comments (2)
- Figure 4 caption: Add the exact prompt templates and random seeds used for the qualitative comparisons to improve reproducibility.
- §3.3: The weighting schedule between progressive and adversarial losses is described only qualitatively; a precise equation or pseudocode for the schedule would clarify implementation. A hypothetical sketch follows this list.
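For concreteness, a schedule of the kind the second minor comment asks for might ramp the adversarial weight up as the step count is halved. The linear form and the constants below are illustrative assumptions, not the paper's actual schedule.

```python
# Hypothetical loss-weighting schedule: ramp the adversarial weight
# linearly across distillation stages. The functional form and the
# constants are illustrative, not taken from the paper.
def adv_weight(stage: int, num_stages: int,
               w_min: float = 0.01, w_max: float = 0.5) -> float:
    # Early stages: mostly the progressive (matching) loss;
    # final stages: mostly the adversarial loss.
    t = stage / max(num_stages - 1, 1)
    return w_min + (w_max - w_min) * t

# Example: seven halvings, 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1 steps.
for s in range(7):
    print(s, round(adv_weight(s, 7), 3))
```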
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. The comments highlight important aspects of our quantitative evaluation and discriminator design that we address below. We have prepared revisions to incorporate additional analysis where feasible.
Point-by-point responses
Referee: §5.2 (Quantitative Results): The reported metrics focus on FID, CLIP score, and qualitative examples, but no recall, precision-recall curves, or intra-class diversity statistics are provided to verify mode coverage. This is load-bearing for the abstract claim that the method achieves a balance between quality and mode coverage without collapse when the discriminator is scaled to the full SDXL UNet.
Authors: We agree that recall and precision-recall analysis would provide more direct evidence for mode coverage claims. Standard metrics like FID and CLIP score are used in the field, and our qualitative examples at 1024px demonstrate diversity without visible collapse. To strengthen the manuscript, we will add precision-recall curves and recall statistics computed on a held-out set to Section 5.2 in the revision. revision: yes
Referee: §4.1 (Discriminator Design): The architecture description scales the discriminator to SDXL but does not include an explicit diversity regularization term or an ablation on its effect. Without this, the assumption that progressive adversarial training avoids the common artifact and collapse regimes at 1024px remains untested in the reported experiments.
Authors: Our theoretical analysis in the paper argues that the progressive schedule combined with adversarial distillation promotes coverage by gradually increasing the discriminator's capacity, which helps avoid collapse even at full SDXL scale. We did not add an explicit diversity regularization term to keep the objective focused. We acknowledge that an ablation isolating the progressive component's role in preventing artifacts would be valuable, and we will include such an ablation study in the revised Section 4.1. revision: yes
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper describes a combination of progressive and adversarial distillation applied to the SDXL model for one-step and few-step text-to-image generation. No equations, model formulations, or training procedures presented in the provided abstract or context reduce any claimed prediction or result to a fitted parameter defined by the target metric itself, nor do they rely on self-citations or imported uniqueness theorems in a load-bearing manner. The central claims rest on empirical application of existing distillation ideas to a new architecture and scale, with results evaluated against external benchmarks rather than defined tautologically.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 18 Pith papers
- Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
  CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.
- Asymmetric Flow Models
  Asymmetric Flow Modeling restricts noise prediction to a low-rank subspace for high-dimensional flow generation, reaching 1.57 FID on ImageNet 256x256 and new state-of-the-art pixel text-to-image performance via finet...
- Inverse Design for Conditional Distribution Matching
  Defines Conditional Distribution Matching (CDM) as finding inputs whose induced conditional distributions match a target distribution and proposes the MLGD-F inference-time algorithm using pretrained diffusion models ...
- GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models
  GeoEdit constructs local tangent frames from small perturbations to initial noise, enabling Jacobian-free on-manifold edits in diffusion models via alternating tangent steps and diffusion projections.
- Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
  GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
- 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation
  1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
- Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting
  Drift-AR achieves 3.8-5.5x speedup in AR-diffusion image models via entropy-informed speculative decoding and single-step (1-NFE) anti-symmetric drifting decoding.
- FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching
  FlashClear delivers up to 122x faster object removal than prior diffusion models via adversarial step distillation and asymmetric attention caching while preserving visual quality.
- FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching
  FlashClear achieves up to 8.26x speedup over its base diffusion model and 122x over OmniPaint for image object removal via region-aware adversarial distillation and foreground-prioritized caching while claiming to mai...
- Efficient Diffusion Distillation via Embedding Loss
  Embedding Loss aligns feature distributions via MMD in random network embeddings to boost one-step diffusion distillation, reaching SOTA FID of 1.475 on CIFAR-10 unconditional generation.
- Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
  By requiring and using highly discriminative LLM text features, the work enables the first effective one-step text-conditioned image generation with MeanFlow.
- BiasIG: Benchmarking Multi-dimensional Social Biases in Text-to-Image Models
  BiasIG is a multi-dimensional benchmark for social biases in T2I models that shows debiasing interventions frequently cause confounding discrimination effects.
- Continuous Adversarial Flow Models
  Continuous adversarial flow models replace MSE in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT and similar gains for JiT and text-to-im...
- ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop
  ExpressEdit delivers fast, artifact-free stylized facial expression editing inside Photoshop via a diffusion model plugin and an accompanying expression database.
- WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
  WorldPlay uses dual action representation, reconstituted context memory, and context forcing distillation to produce consistent 720p streaming video at 24 FPS for interactive world modeling.
- Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
  A simplified one-step diffusion distillation uses pretrained teacher features directly for a drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
- Reward-Aware Trajectory Shaping for Few-step Visual Generation
  RATS lets few-step visual generators surpass multi-step teachers by shaping trajectories with reward-based adaptive guidance instead of strict imitation.
- TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation
  TurboTalk uses progressive distillation from 4 steps to 1 step with distribution matching and adversarial training to achieve 120x faster single-step audio-driven talking avatar video generation.