Recognition: 2 theorem links · Lean Theorem
Progressive Distillation for Fast Sampling of Diffusion Models
Pith reviewed 2026-05-11 09:31 UTC · model grok-4.3
The pith
Progressive distillation reduces diffusion model sampling from thousands of steps to 4 while keeping high image quality on standard benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Starting from a deterministic diffusion sampler that uses up to 8192 steps, the authors repeatedly apply a distillation procedure in which each new model is trained to reproduce the output of the previous model's sampler using half as many steps. Together with parameterizations that increase stability at low step counts, this yields models that generate samples in as few as 4 steps on CIFAR-10, ImageNet, and LSUN while preserving most of the original perceptual quality.
What carries the argument
The progressive distillation procedure, which trains a student diffusion model to match a teacher sampler's multi-step trajectory using half the steps, combined with re-parameterizations that stabilize few-step sampling.
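The halving described above implies a fixed number of distillation rounds between the starting and final step counts. A minimal sketch of that bookkeeping follows; the function name `halving_schedule` is hypothetical and only the step counts 8192 and 4 come from the paper's abstract.

```python
def halving_schedule(start_steps: int, final_steps: int) -> list[int]:
    """List the sampler step counts visited when halving from start to final.

    Hypothetical helper for illustration; it only does the arithmetic and
    does not train anything.
    """
    if final_steps < 1 or start_steps < final_steps:
        raise ValueError("need start_steps >= final_steps >= 1")
    schedule = [start_steps]
    while schedule[-1] > final_steps:
        if schedule[-1] % 2 != 0:
            raise ValueError("step count must be even to be halved")
        schedule.append(schedule[-1] // 2)
    if schedule[-1] != final_steps:
        raise ValueError("final_steps is not reachable by repeated halving")
    return schedule


schedule = halving_schedule(8192, 4)
rounds = len(schedule) - 1  # one distillation round per halving
print(schedule, rounds)
```

Under these assumptions, going from 8192 to 4 steps takes 11 rounds, which is why error accumulation across rounds is the premise to watch.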
Load-bearing premise
That successive rounds of distillation do not accumulate enough error to degrade image quality and that the new parameterizations keep sampling stable when the step count is reduced across different image datasets.
What would settle it
A direct comparison on CIFAR-10 or ImageNet in which the 4-step distilled model produces visibly worse samples or a substantially higher FID than the original 8192-step sampler, or in which further distillation rounds cause a sudden quality collapse.
Original abstract
Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that new parameterizations of diffusion models increase stability for few-step sampling, and that a progressive distillation procedure can iteratively halve the number of sampling steps (from up to 8192 down to 4) while preserving perceptual quality on image generation tasks. It reports concrete results such as an FID of 3.0 on CIFAR-10 with 4 steps, along with results on ImageNet and LSUN, and states that the full distillation procedure takes no more time than training the original model.
Significance. If the empirical results hold, the work is significant for addressing the slow sampling drawback of diffusion models, enabling fast generation competitive with alternatives like GANs while retaining quality and density estimation advantages. The progressive distillation approach combined with the new parameterizations provides a practical, efficient solution, and the manuscript supplies falsifiable benchmark outcomes across multiple standard datasets.
major comments (2)
- [§5] §5 (Experimental results): The central claim that progressive distillation preserves perceptual quality down to 4 steps (e.g., CIFAR-10 FID of 3.0) is load-bearing, yet the reported benchmark numbers lack error bars, multiple random seed statistics, or ablations isolating the new parameterizations from the distillation procedure; this directly affects assessment of robustness against error accumulation.
- [§3.2] §3.2 (New parameterizations): The claim that the introduced parameterizations reliably stabilize few-step sampling is central to enabling the progressive procedure, but the section provides no analysis or equations demonstrating their effect on sampling dynamics or variance reduction, relying only on end-to-end empirical outcomes.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly state the exact sequence of distillation steps applied and the base model architectures used for each benchmark.
- [§4] Notation for the teacher-student alignment in the distillation loss could be clarified with an additional equation showing how the student is trained to match the teacher's multi-step trajectory.
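The teacher-student alignment requested in the last comment can be sketched numerically. The example below assumes the deterministic (DDIM-style) update quoted later on this page as equation (20), a cosine noise schedule, and a toy stand-in for the trained teacher; `alpha_sigma`, `toy_teacher`, and `student_target` are hypothetical names. The target is obtained by inverting the update so that one student step lands where two teacher steps land; any loss weighting used in the paper is not reproduced here.

```python
import numpy as np


def alpha_sigma(t):
    """Cosine schedule (an assumption): alpha_t^2 + sigma_t^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def ddim_step(z_t, x_hat, t, s):
    """z_s = (sigma_s / sigma_t) * (z_t - alpha_t * x_hat) + alpha_s * x_hat, for s < t."""
    a_t, s_t = alpha_sigma(t)
    a_s, s_s = alpha_sigma(s)
    return (s_s / s_t) * (z_t - a_t * x_hat) + a_s * x_hat


def toy_teacher(z, t):
    """Hypothetical stand-in for a trained denoiser returning an x-prediction."""
    a_t, _ = alpha_sigma(t)
    return a_t * np.tanh(z)


def student_target(z_t, t, num_student_steps, teacher):
    """Run two teacher half-steps, then solve the one-step update for x."""
    dt = 1.0 / num_student_steps
    t_mid, t_end = t - 0.5 * dt, t - dt
    z_mid = ddim_step(z_t, teacher(z_t, t), t, t_mid)
    z_end = ddim_step(z_mid, teacher(z_mid, t_mid), t_mid, t_end)
    a_t, s_t = alpha_sigma(t)
    a_e, s_e = alpha_sigma(t_end)
    ratio = s_e / s_t
    # Invert z_end = ratio * (z_t - a_t * x) + a_e * x for x.
    x_target = (z_end - ratio * z_t) / (a_e - ratio * a_t)
    return x_target, z_end


rng = np.random.default_rng(0)
z_t = rng.normal(size=4)
x_target, z_end = student_target(z_t, t=1.0, num_student_steps=4, teacher=toy_teacher)
# A single student step that predicts x_target reproduces the two teacher steps.
z_student = ddim_step(z_t, x_target, 1.0, 0.75)
```

In a training loop the student's prediction at `(z_t, t)` would be regressed onto `x_target`; that loop is omitted here.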
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance and for the constructive feedback. We address each major comment point by point below, providing clarifications from the manuscript and indicating revisions where we will strengthen the presentation of results and analysis.
Point-by-point responses
-
Referee: [§5] §5 (Experimental results): The central claim that progressive distillation preserves perceptual quality down to 4 steps (e.g., CIFAR-10 FID of 3.0) is load-bearing, yet the reported benchmark numbers lack error bars, multiple random seed statistics, or ablations isolating the new parameterizations from the distillation procedure; this directly affects assessment of robustness against error accumulation.
Authors: We acknowledge that error bars, multi-seed statistics, and explicit ablations would strengthen the assessment of robustness. The manuscript reports results from single runs with fixed seeds for reproducibility, but demonstrates consistency by applying the same progressive procedure across CIFAR-10, ImageNet, and LSUN while preserving quality from 8192 steps down to 4. The load-bearing claim is further supported by the fact that each halving step maintains perceptual quality without retraining from scratch. To address the concern directly, we will revise §5 to include error bars from additional runs (where feasible given compute), a note on seed consistency, and a targeted ablation isolating the new parameterizations' contribution from the distillation steps.
Revision: yes
-
Referee: [§3.2] §3.2 (New parameterizations): The claim that the introduced parameterizations reliably stabilize few-step sampling is central to enabling the progressive procedure, but the section provides no analysis or equations demonstrating their effect on sampling dynamics or variance reduction, relying only on end-to-end empirical outcomes.
Authors: Section 3.2 introduces the new parameterizations (including the velocity parameterization) as direct modifications to the standard diffusion model output that reduce sensitivity to accumulated errors in few-step regimes. The section provides the explicit functional forms and motivates them via their effect on the reverse-process update. While the primary validation is through the end-to-end progressive distillation results, we agree that additional equations would clarify the variance-reduction mechanism. We will revise §3.2 to include the sampling update equations under these parameterizations and a short derivation showing how they lower the effective variance of the predicted clean image relative to noise prediction, thereby enabling stable halving.
Revision: yes
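The stability argument in this response can be illustrated with a small numerical sketch. It assumes a variance-preserving process z = alpha * x + sigma * eps with alpha^2 + sigma^2 = 1 and the velocity definition v = alpha * eps - sigma * x; the function names and the perturbation size are hypothetical. It compares how a fixed error in the network output propagates to the implied clean-image estimate under noise prediction versus velocity prediction when alpha is close to zero.

```python
import numpy as np


def v_from_x_eps(x, eps, alpha, sigma):
    """Velocity target, assuming v = alpha * eps - sigma * x."""
    return alpha * eps - sigma * x


def x_from_v(z, v, alpha, sigma):
    """Recover x from (z, v); valid when alpha^2 + sigma^2 = 1."""
    return alpha * z - sigma * v


def eps_from_v(z, v, alpha, sigma):
    """Recover eps from (z, v); valid when alpha^2 + sigma^2 = 1."""
    return sigma * z + alpha * v


rng = np.random.default_rng(0)
x = rng.normal(size=5)
eps = rng.normal(size=5)
alpha = 1e-3                      # very high noise level (low signal)
sigma = np.sqrt(1.0 - alpha**2)
z = alpha * x + sigma * eps
v = v_from_x_eps(x, eps, alpha, sigma)

delta = 1e-2                      # same output error applied to both heads
# Noise prediction: x_hat = (z - sigma * eps_hat) / alpha, so errors scale by sigma / alpha.
x_err_eps = np.abs((z - sigma * (eps + delta)) / alpha - x).max()
# Velocity prediction: x_hat = alpha * z - sigma * v_hat, so errors scale by sigma.
x_err_v = np.abs(x_from_v(z, v + delta, alpha, sigma) - x).max()
print(x_err_eps, x_err_v)
```

The division by alpha in the noise-prediction path is what amplifies errors at high noise levels, which matter most when only a few sampling steps are taken.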
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper presents an empirical training procedure (progressive distillation) and new parameterizations for diffusion models, with all load-bearing claims consisting of experimental outcomes measured on held-out benchmarks such as CIFAR-10 FID scores. No equations, predictions, or first-principles derivations reduce outputs to inputs by construction, and no self-citations serve as the sole justification for the central method or results. The procedure is self-contained against external validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- distillation hyperparameters
axioms (1)
- domain assumption Diffusion models admit parameterizations that remain stable under few-step sampling
Lean theorems connected to this paper
-
IndisputableMonolith.Cost.FunctionalEquation: washburn_uniqueness_aczel [unclear]
Relation between the paper passage and the cited Recognition theorem: unclear.
we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps
-
IndisputableMonolith.Foundation.HierarchyEmergence: hierarchy_emergence_forces_phi [unclear]
Relation between the paper passage and the cited Recognition theorem: unclear.
we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 55 Pith papers
-
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.
-
Query Lower Bounds for Diffusion Sampling
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
-
Training-Free Generative Sampling via Moment-Matched Score Smoothing
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
-
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation
A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
-
One-Step Generative Modeling via Wasserstein Gradient Flows
W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
-
Muninn: Your Trajectory Diffusion Model But Faster
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
-
HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation
HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.
-
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
-
PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution
PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated unce...
-
Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion
ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE and improves perceptual metrics like KID and FID by using content-adaptive keyframe selection and budget-aware sparse trajectory selection to condition...
-
SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking
SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.
-
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies
CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
-
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
-
Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
-
Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement
A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.
-
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
-
Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse
Chorus accelerates video DiT serving up to 45% via inter-request caching reuse in a three-stage denoising strategy with token-guided attention amplification.
-
1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation
1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
-
Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting
Drift-AR achieves 3.8-5.5x speedup in AR-diffusion image models by using entropy to enable entropy-informed speculative decoding and single-step (1-NFE) anti-symmetric drifting decoding.
-
Training Agents Inside of Scalable World Models
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
-
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
-
One Step Diffusion via Shortcut Models
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
-
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
-
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
-
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
ROMER cuts perplexity by up to 59% in noisy analog CIM environments for MoE LLMs via expert replacement and router recalibration calibrated on real-chip measurements.
-
Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies
A multivariate diffusion generative downscaling method preserves inter-variable correlations in climate data under large resolution increases, enabling more accurate compound risk assessment.
-
FlashMol: High-Quality Molecule Generation in as Few as Four Steps
FlashMol produces chemically valid 3D molecules in 4 steps via distribution matching distillation with respaced timesteps and Jensen-Shannon regularization, matching or exceeding 1000-step teacher performance on QM9 a...
-
MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution
MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.
-
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
V-GRPO makes ELBO surrogates stable and efficient for online RL alignment of denoising models, delivering SOTA text-to-image performance with 2-3x speedups over MixGRPO and DiffusionNFT.
-
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.
-
WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis
WFM achieves near-diffusion quality for all four BraTS MRI modalities with one 82M model in 1-2 steps by flowing from the mean of conditioning modalities in wavelet space, running 250-1000x faster.
-
AlloSR$^2$: Rectifying One-Step Super-Resolution to Stay Real via Allomorphic Generative Flows
AlloSR$^2$ rectifies one-step super-resolution trajectories with allomorphic generative flows via SNR initialization, velocity supervision, and self-adversarial matching to deliver state-of-the-art fidelity and realism.
-
Fisher Decorator: Refining Flow Policy via a Local Transport Map
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
-
CoD-Lite: Real-Time Diffusion-Based Generative Image Compression
CoD-Lite delivers real-time generative image compression via a lightweight convolution-based diffusion codec with compression-oriented pre-training and distillation, achieving substantial bitrate savings.
-
Self-Adversarial One Step Generation via Condition Shifting
APEX derives self-adversarial gradients from condition-shifted velocity fields in flow models to achieve high-fidelity one-step generation, outperforming much larger models and multi-step teachers.
-
ELT: Elastic Looped Transformers for Visual Generation
Elastic Looped Transformers share weights across recurrent blocks and apply intra-loop self-distillation to deliver 4x parameter reduction while matching competitive FID and FVD scores on ImageNet and UCF-101.
-
Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning
JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.
-
Diffusion-Based Point-Cloud Generation of Heavy-Ion Events
A two-stage score-driven diffusion model with Point-Edge Transformer generates realistic high-multiplicity heavy-ion events as point clouds.
-
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.
-
MAGI-1: Autoregressive Video Generation at Scale
MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
CogVideoX generates coherent 10-second text-to-video outputs at high resolution using a 3D VAE, expert adaptive LayerNorm transformer, progressive training, and a custom data pipeline, claiming state-of-the-art results.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results...
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-t...
-
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
-
Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation
A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with c...
-
Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction
Training-inference input alignment outweighs framework choice for longitudinal retinal image prediction, with deterministic regression matching complex models when acquisition variability dominates disease progression.
-
ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression
ADP-DiT is a text-conditioned diffusion transformer for synthesizing longitudinal Alzheimer's MRI scans, reporting SSIM 0.8739 and PSNR 29.32 dB with improvements over a DiT baseline.
-
SubFlow: Sub-mode Conditioned Flow Matching for Diverse One-Step Generation
SubFlow restores full mode coverage in one-step flow matching by conditioning on sub-modes from semantic clustering, yielding higher diversity on ImageNet-256 while preserving FID.
-
Elucidating Representation Degradation Problem in Diffusion Model Training
Diffusion models suffer representation degradation at high noise due to recoverability mismatch; ERD mitigates this by dynamic optimization reallocation, accelerating convergence across backbones.
-
Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation
Seed3D 2.0 advances 3D content generation via a coarse-to-fine geometry pipeline, unified PBR material model, and simulation-ready scene tools, reporting 69-89.9% win rates over commercial systems in human studies.
-
From Redaction to Restoration: Deep Learning for Medical Image Anonymization and Reconstruction
An end-to-end framework redacts PHI from medical images via CRNN detection and restores them with Stable Diffusion inpainting to enable privacy-preserving data sharing without losing downstream utility.
-
Enhancing the accuracy of under-resolved numerical simulations of atmospheric flows with super resolution
A multi-scale CNN super-resolution model outperforms baseline CNN, attention CNN, and diffusion-based approaches in reconstructing fine-scale features from under-resolved atmospheric flow simulations on standard benchmarks.
-
Discrete Meanflow Training Curriculum
A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
-
Flow Matching Guide and Code
Flow Matching is a generative modeling framework with mathematical foundations, design choices, extensions, and open-source PyTorch code for applications like image and text generation.
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
The paper reviews the background, technology, applications, limitations, and future directions of OpenAI's Sora text-to-video generative model based on public information.
Reference graph
Works this paper leans on
- [1]
-
[2]
Learning gradient fields for shape generation
Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. arXiv preprint arXiv:2008.06520,
-
[3]
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. arXiv preprint arXiv:2105.05233,
-
[4]
FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367,
-
[5]
Cascaded diffusion models for high fidelity image generation
Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. arXiv preprint arXiv:2106.15282,
-
[6]
Argmax flows and multinomial diffusion: Learning categorical distributions, 2021
Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Towards non-autoregressive language models. arXiv preprint arXiv:2102.05379,
-
[7]
Alexia Jolicoeur-Martineau, Ke Li, Rémi Piché-Taillefer, Tal Kachman, and Ioannis Mitliagkas. Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080,
-
[8]
Diederik P Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. arXiv preprint arXiv:2107.00630,
-
[9]
On fast sampling of diffusion probabilistic models
10 Published as a conference paper at ICLR 2022 Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:2106.00132,
-
[10]
Bilateral denoising diffusion models
Max WY Lam, Jun Wang, Rongjie Huang, Dan Su, and Dong Yu. Bilateral denoising diffusion models. arXiv preprint arXiv:2108.11514,
-
[11]
Srdiff: Single image super-resolution with diffusion probabilistic models
Haoying Li, Yifan Yang, Meng Chang, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. Srdiff: Single image super-resolution with diffusion probabilistic models. arXiv preprint arXiv:2104.14951,
-
[12]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101,
-
[13]
Knowledge distillation in iterative generative models for improved sampling speed
Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388,
-
[14]
Non gaussian denoising diffusion models
Eliya Nachmani, Robin San Roman, and Lior Wolf. Non gaussian denoising diffusion models. arXiv preprint arXiv:2106.07582,
-
[15]
Fast generation for convolutional autoregressive models
Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A Hasegawa-Johnson, Roy H Campbell, and Thomas S Huang. Fast generation for convolutional autoregressive models. arXiv preprint arXiv:1704.06001,
-
[16]
Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. arXiv preprint arXiv:2104.07636,
-
[17]
Noise estimation for generative diffusion models
Robin San-Roman, Eliya Nachmani, and Lior Wolf. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600,
-
[18]
Maximum likelihood training of score-based diffusion models
Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. arXiv e-prints, pp. arXiv–2101, 2021b. Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference ...
-
[19]
Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883, 2019a. Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. In Conference ...
- [20]
-
[21]
A. Probability flow ODE in terms of log-SNR. Song et al. (2021c) formulate the forward diffusion process in terms of an SDE of the form $dz = f(z,t)\,dt + g(t)\,dW$ (10), and show that samples from this diffusion process can be generated by solving the associated probability flow ODE: $dz = [f(z,t) - \tfrac{1}{2} g^2(t)\nabla_z$ ...
-
[22]
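is given by $z_s = \frac{\sigma_s}{\sigma_t}[z_t - \alpha_t \hat{x}_\theta(z_t)] + \alpha_s \hat{x}_\theta(z_t)$ (20), for $s < t$. Taking the derivative of this expression with respect to $\lambda_s$, assuming again a variance preserving diffusion process, and using $\frac{d\alpha_\lambda}{d\lambda} = \frac{1}{2}\alpha_\lambda\sigma_\lambda^2$ and $\frac{d\sigma_\lambda}{d\lambda} = -\frac{1}{2}\sigma_\lambda\alpha_\lambda^2$, gives $\frac{dz_{\lambda_s}}{d\lambda_s} = \frac{d\sigma_{\lambda_s}}{d\lambda_s}\frac{1}{\sigma_t}[z_t - \alpha_t \hat{x}_\theta(z_t)] + \frac{d\alpha_{\lambda_s}}{d\lambda_s}\hat{x}_\theta(z_t)$ (21) $= -\frac{1}{2}\alpha_s^2\frac{\sigma_s}{\sigma_t}[z_t - \alpha_t \hat{x}_\theta(z_t)] + \frac{1}{2}\alpha_s\sigma_s^2$ ...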
-
[23]
Figure 5: Visualization of reparameterizing the diffusion process in terms of $\phi$ and $v_\phi$. E. Settings used in experiments. Our model architectures closely follow those described by Dhariwal & Nichol (2021). For 64 × 64 ImageNet we use their model exactly, with 192 channels at the highest resolution. All other models are slight variations with different hyperp...
-
[24]
We use single-headed attention, and only apply this at the 16 × 16 and 8 × 8 resolutions
At each resolution we apply 3 residual blocks, as described by Dhariwal & Nichol (2021). We use single-headed attention, and only apply this at the 16 × 16 and 8 × 8 resolutions. We use dropout of 0.2 when training the original model. No dropout is used during distillation. For LSUN we use a model similar to that for ImageNet, but with a reduced number ...
-
[25]
We clip the norm of gradients to a global norm of 1 before calculating parameter updates
with a constant of 0.001. We clip the norm of gradients to a global norm of 1 before calculating parameter updates. For CIFAR-10 we train for 800k parameter updates, for ImageNet we use 550k updates, and for LSUN we use 400k updates. During distillation we train for 50k updates per iteration, except for the distillation to 2 and 1 sampling steps, for whic...
-
[26]
[Plot: FID versus sampling steps on 64x64 ImageNet; series: Distilled DDIM, Distilled Stochastic, Undistilled Stochastic] Figure 6: FID of generated samples from distilled and undistilled models, using DDIM or stochastic sampling. For the stochastic sampling results we present the best FID obtained by a grid-search over 11 possible noise levels, spa...
-
[27]
forms a non-Gaussian distribution that falls outside the family of Gaussian distributions that can be modelled by a single DDPM student step: A multi-step stochastic DDPM sampler can thus not be distilled into a few-step sampler without some loss in fidelity. This is in contrast with the deterministic DDIM sampler: here both the two-step DDIM teacher upd...
-
[28]
For each schedule we selected the optimal learning rate from [5e−5, 1e−4, 2e−4, 3e−4]
All reported numbers are averages over 4 random seeds. For each schedule we selected the optimal learning rate from [5e−5, 1e−4, 2e−4, 3e−4]. [Plot: FID versus sampling steps; panels: 64x64 ImageNet, 128x128 LSUN Bedrooms; series: 50k updates, 10k updates ...]