Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed

Eric Luhman, Troy Luhman · 2021 · cs.LG · arXiv 2101.02388

Mixed citation behavior. Most common role is background (67%).

27 Pith papers citing it

Background 67% of classified citations

abstract

Iterative generative models, such as noise conditional score networks and denoising diffusion probabilistic models, produce high quality samples by gradually denoising an initial noise vector. However, their denoising process has many steps, making them 2-3 orders of magnitude slower than other generative models such as GANs and VAEs. In this paper, we establish a novel connection between knowledge distillation and image generation with a technique that distills a multi-step denoising process into a single step, resulting in a sampling speed similar to other single-step generative models. Our Denoising Student generates high quality samples comparable to GANs on the CIFAR-10 and CelebA datasets, without adversarial training. We demonstrate that our method scales to higher resolutions through experiments on 256 x 256 LSUN. Code and checkpoints are available at https://github.com/tcl9876/Denoising_Student
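The core idea in the abstract, collapsing a multi-step denoising process into one step, can be illustrated with a toy sketch. This is not the paper's Denoising Student, its architecture, or its loss: the affine `teacher_sample` update, the least-squares `fit_student`, and all constants are hypothetical stand-ins chosen so the example runs in plain NumPy. The teacher maps noise to a sample through many small deterministic steps; the student is fit on (noise, teacher output) pairs so it reproduces the same mapping in a single evaluation.

```python
import numpy as np


def teacher_sample(z, num_steps=100):
    """Toy deterministic multi-step sampler (hypothetical, not a real diffusion model).

    Each step applies a small affine update, so producing one sample costs
    num_steps sequential updates.
    """
    x = z.copy()
    for _ in range(num_steps):
        x = x + (1.0 / num_steps) * (0.5 * x + 2.0)
    return x


def fit_student(z, targets):
    """Fit a one-step affine student, targets ~ w * z + b, by least squares."""
    design = np.stack([z, np.ones_like(z)], axis=1)  # columns: noise, bias
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef  # array [w, b]


def student_sample(z, coef):
    """Single-step sampler: one affine map from noise to sample."""
    return coef[0] * z + coef[1]


rng = np.random.default_rng(0)

# Distillation data: initial noise paired with the teacher's final output.
z_train = rng.standard_normal(1000)
coef = fit_student(z_train, teacher_sample(z_train))

# On fresh noise, the one-step student should track the 100-step teacher.
z_test = rng.standard_normal(200)
max_err = np.max(np.abs(student_sample(z_test, coef) - teacher_sample(z_test)))
print(f"max |student - teacher| on held-out noise: {max_err:.2e}")
```

Because this toy teacher is a composition of affine maps, an affine student can match it essentially exactly. A real multi-step sampler is nonlinear, so the student would need to be a neural network trained by gradient descent and would only approximate the teacher.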

citation-role summary

background 6 · method 2 · baseline 1
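The 67% background figure quoted near the top of the page appears to follow from these role counts rather than from all 27 citing papers. A short check, assuming the three counts above are the full set of classified citations:

```python
# Role counts copied from the citation-role summary on this page.
role_counts = {"background": 6, "method": 2, "baseline": 1}

# Assumption: these three roles cover every classified citation.
classified = sum(role_counts.values())

# Share of each role among classified citations, rounded to whole percent.
shares = {role: round(100 * n / classified) for role, n in role_counts.items()}

print(classified)  # 9
print(shares)      # {'background': 67, 'method': 22, 'baseline': 11}
```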

citation-polarity summary

background 6 · baseline 1 · unclear 1 · use method 1

representative citing papers

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Generative Pseudo-Force Fields for Molecular Generation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.

StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.

Stochastic Transition-Map Distillation for Fast Probabilistic Inference

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.

Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging

cs.LG · 2025-05-21 · unverdicted · novelty 7.0

Diffusion trajectory distillation is reframed as operator merging, yielding an optimal variance-driven merging strategy via Pareto dynamic programming in the linear Gaussian case and unavoidable approximation errors from exponential mixture growth in the nonlinear case.

One Step Diffusion via Shortcut Models

cs.LG · 2024-10-16 · conditional · novelty 7.0

Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.

Elucidating the Design Space of Diffusion-Based Generative Models

cs.CV · 2022-06-01 · accept · novelty 7.0

Organizing diffusion model design choices yields SOTA FID of 1.79 on CIFAR-10 with only 35 network evaluations per image and similar gains on ImageNet-64.

Progressive Distillation for Fast Sampling of Diffusion Models

cs.LG · 2022-02-01 · unverdicted · novelty 7.0

Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.

Fast Image Super-Resolution via Consistency Rectified Flow

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

FlowSR enables single-step image super-resolution by learning a rectified flow from LR to HR with consistency distillation, HR regularization, and dual fast-slow timestep scheduling.

MixFlow: Mixed Source Distributions Improve Rectified Flows

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Mixing unconditional Gaussian noise with a κ-conditioned source during training of rectified flows reduces path curvature, yielding 12% better FID scores and faster sampling than standard rectified flows.

Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

Jeffreys Flow distills Parallel Tempering trajectories via Jeffreys divergence to produce robust Boltzmann generators that suppress mode collapse and correct sampling inaccuracies for rare event sampling.

A Unified View of Score-Based and Drifting Models

cs.LG · 2026-03-08 · unverdicted · novelty 6.0

Drifting with Gaussian kernels exactly matches score-matching on smoothed distributions via Tweedie's formula, while Laplace kernels approximate this closely in high dimensions.

Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

cs.CV · 2025-10-09 · conditional · novelty 6.0

The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.

2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching

cs.GR · 2025-06-03 · unverdicted · novelty 6.0

2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.

MAGI-1: Autoregressive Video Generation at Scale

cs.CV · 2025-05-19 · unverdicted · novelty 6.0

MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.

Improved Techniques for Training Consistency Models

cs.LG · 2023-10-22 · accept · novelty 6.0

Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

cs.LG · 2022-11-02 · conditional · novelty 6.0

DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.

Rectified Flow: A Marginal Preserving Approach to Optimal Transport

stat.ML · 2022-09-29 · unverdicted · novelty 6.0

A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.

Reward-Aware Trajectory Shaping for Few-step Visual Generation

cs.CV · 2026-04-16 · unverdicted · novelty 5.0

RATS lets few-step visual generators surpass multi-step teachers by shaping trajectories with reward-based adaptive guidance instead of strict imitation.

BADiff: Bandwidth Adaptive Diffusion Model

cs.CV · 2025-10-24 · unverdicted · novelty 5.0

BADiff introduces joint training of diffusion models with quality conditioning derived from bandwidth to enable adaptive early-stop sampling that preserves appropriate perceptual quality.

Elucidating the SNR-t Bias of Diffusion Probabilistic Models

cs.CV · 2026-04-17 · unverdicted · novelty 4.0

Diffusion models have an SNR-timestep mismatch during inference that the authors mitigate with per-frequency differential correction, raising generation quality across IDDPM, ADM, DDIM and others.

citing papers explorer

Showing 27 of 27 citing papers.

Consistency Models cs.LG · 2023-03-02 · conditional · none · ref 39 · internal anchor
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow cs.LG · 2022-09-07 · unverdicted · none · ref 47 · internal anchor
Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.
Generative Pseudo-Force Fields for Molecular Generation cs.LG · 2026-05-18 · unverdicted · none · ref 68 · internal anchor
Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.
StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow stat.ML · 2026-05-15 · unverdicted · none · ref 15 · internal anchor
StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.
Stochastic Transition-Map Distillation for Fast Probabilistic Inference cs.LG · 2026-05-08 · unverdicted · none · ref 156 · internal anchor
STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.
Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging cs.LG · 2025-05-21 · unverdicted · none · ref 18 · internal anchor
Diffusion trajectory distillation is reframed as operator merging, yielding an optimal variance-driven merging strategy via Pareto dynamic programming in the linear Gaussian case and unavoidable approximation errors from exponential mixture growth in the nonlinear case.
One Step Diffusion via Shortcut Models cs.LG · 2024-10-16 · conditional · none · ref 15 · internal anchor
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
Elucidating the Design Space of Diffusion-Based Generative Models cs.CV · 2022-06-01 · accept · none · ref 33 · internal anchor
Organizing diffusion model design choices yields SOTA FID of 1.79 on CIFAR-10 with only 35 network evaluations per image and similar gains on ImageNet-64.
Progressive Distillation for Fast Sampling of Diffusion Models cs.LG · 2022-02-01 · unverdicted · none · ref 13 · internal anchor
Progressive distillation halves sampling steps repeatedly in diffusion models, reaching 4 steps with FID 3.0 on CIFAR-10 from 8192-step samplers.
Diffusion Models Beat GANs on Image Synthesis cs.LG · 2021-05-11 · accept · none · ref 37 · internal anchor
Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices cs.CV · 2026-05-15 · unverdicted · none · ref 3 · internal anchor
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.
Fast Image Super-Resolution via Consistency Rectified Flow cs.CV · 2026-05-12 · unverdicted · none · ref 26 · internal anchor
FlowSR enables single-step image super-resolution by learning a rectified flow from LR to HR with consistency distillation, HR regularization, and dual fast-slow timestep scheduling.
MixFlow: Mixed Source Distributions Improve Rectified Flows cs.CV · 2026-04-10 · unverdicted · none · ref 21 · internal anchor
Mixing unconditional Gaussian noise with a κ-conditioned source during training of rectified flows reduces path curvature, yielding 12% better FID scores and faster sampling than standard rectified flows.
Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation cs.LG · 2026-04-07 · unverdicted · none · ref 56 · internal anchor
Jeffreys Flow distills Parallel Tempering trajectories via Jeffreys divergence to produce robust Boltzmann generators that suppress mode collapse and correct sampling inaccuracies for rare event sampling.
A Unified View of Score-Based and Drifting Models cs.LG · 2026-03-08 · unverdicted · none · ref 28 · internal anchor
Drifting with Gaussian kernels exactly matches score-matching on smoothed distributions via Tweedie's formula, while Laplace kernels approximate this closely in high dimensions.
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency cs.CV · 2025-10-09 · conditional · none · ref 18 · internal anchor
The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.
2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching cs.GR · 2025-06-03 · unverdicted · none · ref 29 · internal anchor
2ndMatch finetunes pruned diffusion models via second-order Jacobian matching inspired by Finite-Time Lyapunov Exponents to reduce the quality gap with dense models on image generation tasks.
MAGI-1: Autoregressive Video Generation at Scale cs.CV · 2025-05-19 · unverdicted · none · ref 29 · internal anchor
MAGI-1 is a 24B-parameter autoregressive video world model that predicts denoised frame chunks sequentially with increasing noise to enable causal, scalable, streaming generation up to 4M token contexts.
Improved Techniques for Training Consistency Models cs.LG · 2023-10-22 · accept · none · ref 9 · internal anchor
Improved consistency training techniques achieve FID scores of 2.51 on CIFAR-10 and 3.25 on ImageNet 64x64 in one sampling step, outperforming prior consistency training and distillation methods.
DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models cs.LG · 2022-11-02 · conditional · none · ref 7 · internal anchor
DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.
Rectified Flow: A Marginal Preserving Approach to Optimal Transport stat.ML · 2022-09-29 · unverdicted · none · ref 51 · internal anchor
A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.
Reward-Aware Trajectory Shaping for Few-step Visual Generation cs.CV · 2026-04-16 · unverdicted · none · ref 21 · internal anchor
RATS lets few-step visual generators surpass multi-step teachers by shaping trajectories with reward-based adaptive guidance instead of strict imitation.
BADiff: Bandwidth Adaptive Diffusion Model cs.CV · 2025-10-24 · unverdicted · none · ref 34 · internal anchor
BADiff introduces joint training of diffusion models with quality conditioning derived from bandwidth to enable adaptive early-stop sampling that preserves appropriate perceptual quality.
Elucidating the SNR-t Bias of Diffusion Probabilistic Models cs.CV · 2026-04-17 · unverdicted · none · ref 32 · internal anchor
Diffusion models have an SNR-timestep mismatch during inference that the authors mitigate with per-frequency differential correction, raising generation quality across IDDPM, ADM, DDIM and others.
Discrete Meanflow Training Curriculum cs.LG · 2026-04-10 · unverdicted · none · ref 11 · internal anchor
A DMF curriculum initialized from pretrained flow models achieves one-step FID 3.36 on CIFAR-10 after only 2000 epochs by exploiting a discretized consistency property in the Meanflow objective.
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models cs.CV · 2026-05-06 · unreviewed · ref 55 · 2 links · internal anchor
LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention cs.CV · 2026-05-06 · unreviewed · ref 233 · internal anchor
