Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed
Mixed citation behavior. Most common role is background (67%).
abstract
Iterative generative models, such as noise conditional score networks and denoising diffusion probabilistic models, produce high quality samples by gradually denoising an initial noise vector. However, their denoising process has many steps, making them 2-3 orders of magnitude slower than other generative models such as GANs and VAEs. In this paper, we establish a novel connection between knowledge distillation and image generation with a technique that distills a multi-step denoising process into a single step, resulting in a sampling speed similar to other single-step generative models. Our Denoising Student generates high quality samples comparable to GANs on the CIFAR-10 and CelebA datasets, without adversarial training. We demonstrate that our method scales to higher resolutions through experiments on 256 x 256 LSUN. Code and checkpoints are available at https://github.com/tcl9876/Denoising_Student
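The idea of distilling a multi-step denoising process into a single step can be illustrated on a toy problem. The sketch below is not the authors' implementation: the one-dimensional "teacher" (a fixed number of Euler steps that pull noise toward a target value), the affine "student", and the squared-error fit are all assumptions chosen for illustration, and the names teacher_sample, fit_student, and student_sample are hypothetical. It shows only the general pattern: run the slow deterministic sampler on many noise draws, then train a one-step model to map each noise draw directly to the sampler's output.

```python
import random


def teacher_sample(z, num_steps=100, mu=2.0, rate=3.0):
    """Toy multi-step deterministic sampler (assumed, for illustration only).

    Starts from noise z and takes num_steps small Euler steps toward mu.
    """
    x = z
    h = 1.0 / num_steps
    for _ in range(num_steps):
        x = x - h * rate * (x - mu)
    return x


def fit_student(num_pairs=1000, seed=0):
    """Fit a one-step affine student w*z + b to (noise, teacher output) pairs.

    Uses closed-form least squares, i.e. a squared-error distillation loss.
    """
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_pairs)]
    ys = [teacher_sample(z) for z in zs]  # the expensive multi-step targets
    z_mean = sum(zs) / len(zs)
    y_mean = sum(ys) / len(ys)
    cov = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    var = sum((z - z_mean) ** 2 for z in zs)
    w = cov / var
    b = y_mean - w * z_mean
    return w, b


def student_sample(z, w, b):
    """Single-step sampler: one affine map instead of num_steps updates."""
    return w * z + b


w, b = fit_student()
z_new = 0.7  # a fresh noise value not used during fitting
gap = abs(student_sample(z_new, w, b) - teacher_sample(z_new))
print("student/teacher gap:", gap)
```

Because this toy teacher happens to be an affine function of its input noise, the affine student can reproduce it almost exactly. A real denoising teacher is highly nonlinear, so the student would be a neural network trained by gradient descent and would only approximate the teacher.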
citing papers explorer
- Consistency Models
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
- Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
- Generative Pseudo-Force Fields for Molecular Generation
Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.
- StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow
StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.
- Stochastic Transition-Map Distillation for Fast Probabilistic Inference
STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.
- Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging
Diffusion trajectory distillation is reframed as operator merging, yielding an optimal variance-driven merging strategy via Pareto dynamic programming in the linear Gaussian case and unavoidable approximation errors from exponential mixture growth in the nonlinear case.
- One Step Diffusion via Shortcut Models
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
- Elucidating the Design Space of Diffusion-Based Generative Models
Organizing diffusion model design choices yields SOTA FID of 1.79 on CIFAR-10 with only 35 network evaluations per image and similar gains on ImageNet-64.
- Progressive Distillation for Fast Sampling of Diffusion Models
Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.
- Diffusion Models Beat GANs on Image Synthesis
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
- ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.
- Fast Image Super-Resolution via Consistency Rectified Flow
FlowSR enables single-step image super-resolution by learning a rectified flow from LR to HR with consistency distillation, HR regularization, and dual fast-slow timestep scheduling.
- MixFlow: Mixed Source Distributions Improve Rectified Flows
Mixing unconditional Gaussian noise with a κ-conditioned source during training of rectified flows reduces path curvature, yielding 12% better FID scores and faster sampling than standard rectified flows.
- Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation
Jeffreys Flow distills Parallel Tempering trajectories via Jeffreys divergence to produce robust Boltzmann generators that suppress mode collapse and correct sampling inaccuracies for rare event sampling.
- A Unified View of Score-Based and Drifting Models
Drifting with Gaussian kernels exactly matches score-matching on smoothed distributions via Tweedie's formula, while Laplace kernels approximate this closely in high dimensions.
- Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.
- 2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching
2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.
- MAGI-1: Autoregressive Video Generation at Scale
MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.
- Improved Techniques for Training Consistency Models
Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.
- Rectified Flow: A Marginal Preserving Approach to Optimal Transport
A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.
- Reward-Aware Trajectory Shaping for Few-step Visual Generation
RATS lets few-step visual generators surpass multi-step teachers by shaping trajectories with reward-based adaptive guidance instead of strict imitation.
- BADiff: Bandwidth Adaptive Diffusion Model
BADiff introduces joint training of diffusion models with quality conditioning derived from bandwidth to enable adaptive early-stop sampling that preserves appropriate perceptual quality.
- Elucidating the SNR-t Bias of Diffusion Probabilistic Models
Diffusion models have an SNR-timestep mismatch during inference that the authors mitigate with per-frequency differential correction, raising generation quality across IDDPM, ADM, DDIM and others.
- Discrete Meanflow Training Curriculum
A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
- LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention