
arxiv: 2202.00512 · v2 · submitted 2022-02-01 · 💻 cs.LG · cs.AI · stat.ML

Progressive Distillation for Fast Sampling of Diffusion Models

Pith reviewed 2026-05-11 09:31 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords diffusion models · progressive distillation · fast sampling · image generation · generative modeling · FID score · CIFAR-10 · few-step sampling

The pith

Progressive distillation reduces diffusion model sampling from thousands of steps to 4 while keeping high image quality on standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to remove the main practical drawback of diffusion models—the need for hundreds or thousands of model evaluations to produce one sample—by combining two changes. First, it introduces new parameterizations that make the models more stable when run with very few steps. Second, it shows a distillation process that takes a trained many-step sampler and trains a new model to match its outputs using half as many steps, then repeats the halving until only 4 steps remain. A reader would care because the resulting models still reach low FID scores, for example 3.0 on CIFAR-10, and the entire sequence of distillations costs no more training time than the original model.

Core claim

Starting from a deterministic diffusion sampler that uses up to 8192 steps, the authors repeatedly distill it: each new model is trained to reproduce the previous sampler's output using half the number of steps. Together with parameterizations that increase stability at low step counts, this yields models that generate samples in as few as 4 steps on CIFAR-10, ImageNet, and LSUN while preserving most of the original perceptual quality.
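
The halving arithmetic behind this claim can be sketched in a few lines. The helper below is purely illustrative (the name `halving_schedule` is not from the paper). It assumes each distillation round exactly halves the sampler's step count, as the abstract describes, and lists the step counts visited between 8192 and 4.

```python
def halving_schedule(start_steps: int, final_steps: int) -> list[int]:
    """Return the step counts visited when the sampler is halved each round.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if final_steps < 1 or start_steps < final_steps:
        raise ValueError("need start_steps >= final_steps >= 1")
    schedule = [start_steps]
    while schedule[-1] > final_steps:
        if schedule[-1] % 2 != 0:
            raise ValueError("an odd step count cannot be halved exactly")
        schedule.append(schedule[-1] // 2)
    if schedule[-1] != final_steps:
        raise ValueError("final_steps is not reachable by repeated halving")
    return schedule


schedule = halving_schedule(8192, 4)
rounds = len(schedule) - 1  # number of distillation rounds
print(schedule)
print(rounds)
```

Under that assumption, going from 8192 to 4 steps takes 11 halving rounds, since 8192 / 4 = 2^11.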

What carries the argument

The progressive distillation procedure, which trains a student diffusion model to match a teacher sampler's multi-step trajectory using half the steps, combined with re-parameterizations that stabilize few-step sampling.
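
One way to picture "matching the teacher with half the steps" is to compute the single-step target that lands where two teacher steps would. The sketch below is a minimal illustration under stated assumptions: a cosine noise schedule with alpha_t^2 + sigma_t^2 = 1, the standard deterministic DDIM update, and a teacher exposed as a hypothetical callable `teacher_x(z, t)` that predicts the clean image. All function names are invented for this example and it should not be read as the authors' training code.

```python
import numpy as np


def alpha_sigma(t):
    """Cosine schedule (an assumption): alpha_t**2 + sigma_t**2 == 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def ddim_step(x_pred, z_t, t, s):
    """Deterministic DDIM update from time t to an earlier time s (t > 0)."""
    a_t, s_t = alpha_sigma(t)
    a_s, s_s = alpha_sigma(s)
    eps_pred = (z_t - a_t * x_pred) / s_t  # noise implied by the x prediction
    return a_s * x_pred + s_s * eps_pred


def distillation_target(teacher_x, z_t, t, n_student_steps):
    """Target x such that ONE student DDIM step matches TWO teacher steps."""
    t_mid = t - 0.5 / n_student_steps
    t_end = t - 1.0 / n_student_steps
    # Two teacher steps, each half the size of a student step.
    z_mid = ddim_step(teacher_x(z_t, t), z_t, t, t_mid)
    z_end = ddim_step(teacher_x(z_mid, t_mid), z_mid, t_mid, t_end)
    # Invert a single DDIM step from t to t_end to find the x that reaches z_end.
    a_t, s_t = alpha_sigma(t)
    a_e, s_e = alpha_sigma(t_end)
    ratio = s_e / s_t
    return (z_end - ratio * z_t) / (a_e - ratio * a_t)
```

In this sketch the student would then be regressed onto `distillation_target(...)` at the noisy input `z_t`. Once that round converges, the student becomes the next teacher and `n_student_steps` is halved again.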

Load-bearing premise

That successive rounds of distillation do not accumulate enough error to degrade image quality and that the new parameterizations keep sampling stable when the step count is reduced across different image datasets.

What would settle it

A direct comparison on CIFAR-10 or ImageNet in which the 4-step distilled model produces visibly worse samples or a substantially higher FID than the original 8192-step sampler, or in which further distillation rounds cause a sudden quality collapse.

Original abstract

Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that new parameterizations of diffusion models increase stability for few-step sampling, and that a progressive distillation procedure can iteratively halve the number of sampling steps (from up to 8192 down to 4) while preserving perceptual quality on image generation tasks. It reports concrete results such as an FID of 3.0 on CIFAR-10 with 4 steps, along with results on ImageNet and LSUN, and states that the full distillation procedure takes no more time than training the original model.

Significance. If the empirical results hold, the work is significant for addressing the slow sampling drawback of diffusion models, enabling fast generation competitive with alternatives like GANs while retaining quality and density estimation advantages. The progressive distillation approach combined with the new parameterizations provides a practical, efficient solution, and the manuscript supplies falsifiable benchmark outcomes across multiple standard datasets.

major comments (2)
  1. [§5] §5 (Experimental results): The central claim that progressive distillation preserves perceptual quality down to 4 steps (e.g., CIFAR-10 FID of 3.0) is load-bearing, yet the reported benchmark numbers lack error bars, multiple random seed statistics, or ablations isolating the new parameterizations from the distillation procedure; this directly affects assessment of robustness against error accumulation.
  2. [§3.2] §3.2 (New parameterizations): The claim that the introduced parameterizations reliably stabilize few-step sampling is central to enabling the progressive procedure, but the section provides no analysis or equations demonstrating their effect on sampling dynamics or variance reduction, relying only on end-to-end empirical outcomes.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly state the exact sequence of distillation steps applied and the base model architectures used for each benchmark.
  2. [§4] Notation for the teacher-student alignment in the distillation loss could be clarified with an additional equation showing how the student is trained to match the teacher's multi-step trajectory.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our work's significance and for the constructive feedback. We address each major comment point by point below, providing clarifications from the manuscript and indicating revisions where we will strengthen the presentation of results and analysis.

Point-by-point responses
  1. Referee: [§5] §5 (Experimental results): The central claim that progressive distillation preserves perceptual quality down to 4 steps (e.g., CIFAR-10 FID of 3.0) is load-bearing, yet the reported benchmark numbers lack error bars, multiple random seed statistics, or ablations isolating the new parameterizations from the distillation procedure; this directly affects assessment of robustness against error accumulation.

    Authors: We acknowledge that error bars, multi-seed statistics, and explicit ablations would strengthen the assessment of robustness. The manuscript reports results from single runs with fixed seeds for reproducibility, but demonstrates consistency by applying the same progressive procedure across CIFAR-10, ImageNet, and LSUN while preserving quality from 8192 steps down to 4. The load-bearing claim is further supported by the fact that each halving step maintains perceptual quality without retraining from scratch. To address the concern directly, we will revise §5 to include error bars from additional runs (where feasible given compute), a note on seed consistency, and a targeted ablation isolating the new parameterizations' contribution from the distillation steps. revision: yes

  2. Referee: [§3.2] §3.2 (New parameterizations): The claim that the introduced parameterizations reliably stabilize few-step sampling is central to enabling the progressive procedure, but the section provides no analysis or equations demonstrating their effect on sampling dynamics or variance reduction, relying only on end-to-end empirical outcomes.

    Authors: Section 3.2 introduces the new parameterizations (including the velocity parameterization) as direct modifications to the standard diffusion model output that reduce sensitivity to accumulated errors in few-step regimes. The section provides the explicit functional forms and motivates them via their effect on the reverse-process update. While the primary validation is through the end-to-end progressive distillation results, we agree that additional equations would clarify the variance-reduction mechanism. We will revise §3.2 to include the sampling update equations under these parameterizations and a short derivation showing how they lower the effective variance of the predicted clean image relative to noise prediction, thereby enabling stable halving. revision: yes
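
The stability argument in the second response can be made concrete with a small numerical sketch. It assumes the velocity parameterization is v = alpha_t * eps - sigma_t * x under a variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1), with a cosine schedule chosen here for illustration. The function names are hypothetical. The point is that recovering x from a noise prediction divides by alpha_t, which vanishes at high noise levels, while recovering x from v does not.

```python
import numpy as np


def alpha_sigma(t):
    """Cosine schedule (an assumption): alpha_t**2 + sigma_t**2 == 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def v_target(x, eps, t):
    """Velocity target v = alpha_t * eps - sigma_t * x."""
    alpha, sigma = alpha_sigma(t)
    return alpha * eps - sigma * x


def x_from_v(z_t, v, t):
    """Recover x from z_t and v; no division by alpha_t is involved."""
    alpha, sigma = alpha_sigma(t)
    return alpha * z_t - sigma * v


def x_from_eps(z_t, eps, t):
    """Recover x from z_t and eps; divides by alpha_t, which -> 0 as t -> 1."""
    alpha, sigma = alpha_sigma(t)
    return (z_t - sigma * eps) / alpha


# Round trip at a high-noise time where alpha_t is small.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
eps = rng.standard_normal(4)
t = 0.99
alpha, sigma = alpha_sigma(t)
z_t = alpha * x + sigma * eps
x_rec = x_from_v(z_t, v_target(x, eps, t), t)
```

With exact inputs both conversions return the same x. The difference appears when the network output is imperfect: an error in a predicted eps is scaled by sigma_t / alpha_t in `x_from_eps`, whereas an error in a predicted v is scaled only by sigma_t <= 1 in `x_from_v`.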

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper presents an empirical training procedure (progressive distillation) and new parameterizations for diffusion models, with all load-bearing claims consisting of experimental outcomes measured on held-out benchmarks such as CIFAR-10 FID scores. No equations, predictions, or first-principles derivations reduce outputs to inputs by construction, and no self-citations serve as the sole justification for the central method or results. The procedure is self-contained against external validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard diffusion model assumptions plus the empirical claim that distillation can be applied progressively without quality collapse. No new physical entities or unstated mathematical axioms beyond typical ML training.

free parameters (1)
  • distillation hyperparameters
    Choices such as learning rate and step-halving schedule are tuned to achieve reported results.
axioms (1)
  • domain assumption: Diffusion models admit parameterizations that remain stable under few-step sampling
    Invoked as the first contribution enabling distillation.

pith-pipeline@v0.9.0 · 10101 in / 1056 out tokens · 78185 ms · 2026-05-11T09:31:49.097576+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps

  • IndisputableMonolith.Foundation.HierarchyEmergence hierarchy_emergence_forces_phi unclear

    Relation between the paper passage and the cited Recognition theorem.

    we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

    cs.CV 2026-05 unverdicted novelty 8.0

    CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.

  2. Query Lower Bounds for Diffusion Sampling

    cs.LG 2026-04 unverdicted novelty 8.0

    Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.

  3. DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adapt...

  4. VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

    cs.CV 2026-05 unverdicted novelty 7.0

    VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

  5. Generative Pseudo-Force Fields for Molecular Generation

    cs.LG 2026-05 unverdicted novelty 7.0

    Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.

  6. RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation

    cs.RO 2026-05 unverdicted novelty 7.0

    RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

  7. StreamingEffect: Real-Time Human-Centric Video Effect Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

  8. Training-Free Generative Sampling via Moment-Matched Score Smoothing

    stat.ML 2026-05 unverdicted novelty 7.0

    MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.

  9. Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

    cs.CV 2026-05 unverdicted novelty 7.0

    A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.

  10. One-Step Generative Modeling via Wasserstein Gradient Flows

    cs.LG 2026-05 conditional novelty 7.0

    W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...

  11. Muninn: Your Trajectory Diffusion Model But Faster

    cs.RO 2026-05 unverdicted novelty 7.0

    Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

  12. HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

    cs.HC 2026-05 unverdicted novelty 7.0

    HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

  13. LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

    cs.CV 2026-05 unverdicted novelty 7.0

    LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

  14. PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution

    cs.LG 2026-05 unverdicted novelty 7.0

    PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated unce...

  15. Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion

    cs.CV 2026-05 unverdicted novelty 7.0

    ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE and improves perceptual metrics like KID and FID by using content-adaptive keyframe selection and budget-aware sparse trajectory selection to condition...

  16. SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking

    cs.CV 2026-05 unverdicted novelty 7.0

    SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.

  17. CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

    cs.CV 2026-04 unverdicted novelty 7.0

    CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.

  18. Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes

    cs.CV 2026-04 unverdicted novelty 7.0

    Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.

  19. Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.

  20. Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement

    cs.CV 2026-04 unverdicted novelty 7.0

    A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.

  21. LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

    cs.CV 2026-04 unverdicted novelty 7.0

    LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.

  22. Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse

    cs.CV 2026-04 unverdicted novelty 7.0

    Chorus accelerates video DiT serving up to 45% via inter-request caching reuse in a three-stage denoising strategy with token-guided attention amplification.

  23. 1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

    cs.CV 2026-04 conditional novelty 7.0

    1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.

  24. Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

    cs.CV 2026-03 unverdicted novelty 7.0

    Drift-AR achieves 3.8-5.5x speedup in AR-diffusion image models by using entropy to enable entropy-informed speculative decoding and single-step (1-NFE) anti-symmetric drifting decoding.

  25. Flow Map Language Models: One-step Language Modeling via Continuous Denoising

    cs.CL 2026-02 unverdicted novelty 7.0

    Continuous flow language models match discrete diffusion baselines and their distilled one-step flow map versions exceed 8-step discrete diffusion quality on LM1B and OWT.

  26. DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

    cs.CV 2026-02 unverdicted novelty 7.0

    DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.

  27. Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

    cs.LG 2026-02 unverdicted novelty 7.0

    Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.

  28. Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

    cs.CV 2025-12 conditional novelty 7.0

    Stream-DiffVSR enables practical low-latency video super-resolution by combining a four-step distilled denoiser, auto-regressive temporal guidance, and a temporal processor in a strictly causal pipeline.

  29. Large Video Planner Enables Generalizable Robot Control

    cs.RO 2025-12 conditional novelty 7.0

    A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.

  30. Training Agents Inside of Scalable World Models

    cs.AI 2025-09 conditional novelty 7.0

    Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

  31. StereoFoley: Object-Aware Stereo Audio Generation from Video

    cs.SD 2025-09 conditional novelty 7.0

    StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.

  32. Lipschitz-Guided Design of Interpolation Schedules in Generative Models

    stat.ML 2025-09 unverdicted novelty 7.0

    Minimizing averaged squared Lipschitzness of the drift produces interpolation schedules that improve numerical accuracy and mitigate mode collapse in generative models, with closed-form optima for Gaussians and valida...

  33. MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

    cs.AI 2025-07 unverdicted novelty 7.0

    MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

  34. Training-Free Inference for High-Resolution Sinogram Completion

    cs.CV 2025-06 unverdicted novelty 7.0

    HRSino is a training-free adaptive diffusion inference approach for high-resolution sinogram completion that reduces peak memory by up to 30.81% and inference time by up to 17.58% while maintaining accuracy.

  35. History-Guided Video Diffusion

    cs.LG 2025-02 unverdicted novelty 7.0

    DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.

  36. One Step Diffusion via Shortcut Models

    cs.LG 2024-10 conditional novelty 7.0

    Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.

  37. Diffusion Models Are Real-Time Game Engines

    cs.LG 2024-08 conditional novelty 7.0

    A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.

  38. Learning Interactive Real-World Simulators

    cs.AI 2023-10 conditional novelty 7.0

    UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

  39. Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

    cs.CV 2023-10 unverdicted novelty 7.0

    Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.

  40. Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

    cs.LG 2022-08 unverdicted novelty 7.0

    Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.

  41. RiT: Vanilla Diffusion Transformers Suffice in Representation Space

    cs.CV 2026-05 conditional novelty 6.0

    A vanilla Diffusion Transformer trained via x-prediction on frozen DINOv2 features reaches FID 1.14 on ImageNet 256x256 with fewer parameters and faster sampling than prior DiT variants.

  42. Variance Reduction for Expectations with Diffusion Teachers

    cs.LG 2026-05 unverdicted novelty 6.0

    CARV amortizes upstream diffusion teacher costs over noise resamples with timestep importance sampling and stratified-inverse-CDF sampling, delivering 2-3x effective compute gains in text-to-3D experiments and order-o...

  43. Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

    cs.LG 2026-05 unverdicted novelty 6.0

    REPA-P aligns intermediate representations in diffusion models with physical states using first-principles PDE residuals to accelerate convergence and boost out-of-distribution robustness on PDE tasks.

  44. LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

    cs.CV 2026-05 unverdicted novelty 6.0

    LIFT and PLACE enable stable knowledge distillation for extremely lightweight diffusion models by decomposing the task into coarse alignment followed by fine refinement with piecewise local adaptive guidance.

  45. LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models

    cs.CV 2026-05 unverdicted novelty 6.0

    LIFT and PLACE enable stable training of extremely compressed diffusion models by breaking distillation into coarse linear alignment followed by local adaptive refinement.

  46. WavFlow: Audio Generation in Waveform Space

    cs.SD 2026-05 conditional novelty 6.0

    WavFlow performs direct waveform audio generation via flow matching on 2D token grids from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without intermediate compression.

  47. DCFold: Efficient Protein Structure Generation with Single Forward Pass

    cs.LG 2026-05 unverdicted novelty 6.0

    DCFold achieves AlphaFold3-level protein structure prediction accuracy in a single forward pass using Dual Consistency training and a Temporal Geodesic Matching scheduler, delivering 15x inference acceleration.

  48. Taming Audio VAEs via Target-KL Regularization

    cs.SD 2026-05 unverdicted novelty 6.0

    The paper introduces target-KL regularization to train audio VAEs at specific bitrates, enabling rate-distortion curves and comparison to discrete audio codecs for improved text-to-sound generation.

  49. DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers

    cs.CV 2026-05 unverdicted novelty 6.0

    DiRotQ uses PCA-based rotation-aware activation quantization combined with GPTQ to achieve better FID and PSNR in 4-bit diffusion transformers than prior methods like SVDQuant.

  50. ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

    cs.CV 2026-05 unverdicted novelty 6.0

    ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency p...

  51. FLASH: Efficient Visuomotor Policy via Sparse Sampling

    cs.RO 2026-05 unverdicted novelty 6.0

    FLASH Policy uses sparse Legendre polynomial trajectory fitting and history-anchored flow matching to enable single-step inference for visuomotor control, reporting 31.4 ms per-episode latency and >=92% success on fiv...

  52. Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

    cs.CV 2026-05 unverdicted novelty 6.0

    CLVR couples verified logical planning with pixel diffusion, uses proxy reinforcement learning on distilled histories, and merges weights to cut inference to 4 NFEs while outperforming open-source T2I models on comple...

  53. Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

    cs.CV 2026-05 unverdicted novelty 6.0

    CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.

  54. ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems

    cs.LG 2026-05 conditional novelty 6.0

    ROMER cuts perplexity by up to 59% in noisy analog CIM environments for MoE LLMs via expert replacement and router recalibration calibrated on real-chip measurements.

  55. Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies

    physics.ao-ph 2026-05 unverdicted novelty 6.0

    A multivariate diffusion generative downscaling method preserves inter-variable correlations in climate data under large resolution increases, enabling more accurate compound risk assessment.

  56. FlashMol: High-Quality Molecule Generation in as Few as Four Steps

    cs.LG 2026-05 unverdicted novelty 6.0

    FlashMol produces chemically valid 3D molecules in 4 steps via distribution matching distillation with respaced timesteps and Jensen-Shannon regularization, matching or exceeding 1000-step teacher performance on QM9 a...

  57. MetaSR: Content-Adaptive Metadata Orchestration for Generative Super-Resolution

    cs.CV 2026-04 unverdicted novelty 6.0

    MetaSR adaptively orchestrates metadata in a DiT-based generative SR model to deliver up to 1 dB PSNR gains and 50% bitrate savings across diverse content and degradations.

  58. V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

    cs.LG 2026-04 unverdicted novelty 6.0

    V-GRPO makes ELBO surrogates stable and efficient for online RL alignment of denoising models, delivering SOTA text-to-image performance with 2-3x speedups over MixGRPO and DiffusionNFT.

  59. Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

    cs.CV 2026-04 unverdicted novelty 6.0

    Synthetic data complements real data in diffusion-based controllable human video generation, with effective sample selection improving motion realism, temporal consistency, and identity preservation.

  60. WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis

    cs.CV 2026-04 unverdicted novelty 6.0

    WFM achieves near-diffusion quality for all four BraTS MRI modalities with one 82M model in 1-2 steps by flowing from the mean of conditioning modalities in wavelet space, running 250-1000x faster.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 111 Pith papers · 3 internal anchors

  1. [1]

    Structured denoising diffusion models in discrete state-spaces

    Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. CoRR, abs/2107.03006,

  2. [2]

    Learning gradient fields for shape generation

    Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. arXiv preprint arXiv:2008.06520,

  3. [3]

    Diffusion Models Beat GANs on Image Synthesis

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis. arXiv preprint arXiv:2105.05233,

  4. [4]

    FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

    Will Grathwohl, Ricky TQ Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367,

  5. [5]

    Cascaded diffusion models for high fidelity image generation

    Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. arXiv preprint arXiv:2106.15282,

  6. [6]

    Argmax flows and multinomial diffusion: Learning categorical distributions, 2021

    Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Towards non-autoregressive language models. arXiv preprint arXiv:2102.05379,

  7. [7]

    Gotta go fast when generating data with score-based models

    Alexia Jolicoeur-Martineau, Ke Li, Rémi Piché-Taillefer, Tal Kachman, and Ioannis Mitliagkas. Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080,

  8. [8]

    Variational diffusion models

    Diederik P Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. arXiv preprint arXiv:2107.00630,

  9. [9]

    On fast sampling of diffusion probabilistic models

    Published as a conference paper at ICLR 2022. Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:2106.00132,

  10. [10]

    Bilateral denoising diffusion models

    Max WY Lam, Jun Wang, Rongjie Huang, Dan Su, and Dong Yu. Bilateral denoising diffusion models. arXiv preprint arXiv:2108.11514,

  11. [11]

    SRDiff: Single image super-resolution with diffusion probabilistic models

    Haoying Li, Yifan Yang, Meng Chang, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. SRDiff: Single image super-resolution with diffusion probabilistic models. arXiv preprint arXiv:2104.14951,

  12. [12]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101,

  13. [13]

    Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed

    Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388,

  14. [14]

    Non gaussian denoising diffusion models

    Eliya Nachmani, Robin San Roman, and Lior Wolf. Non gaussian denoising diffusion models. arXiv preprint arXiv:2106.07582,

  15. [15]

    Fast generation for convolutional autoregressive models

    Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A Hasegawa-Johnson, Roy H Campbell, and Thomas S Huang. Fast generation for convolutional autoregressive models. arXiv preprint arXiv:1704.06001,

  16. [16]

    Image super-resolution via iterative refinement

    Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. arXiv preprint arXiv:2104.07636,

  17. [17]

    Noise estimation for generative diffusion models

    Robin San-Roman, Eliya Nachmani, and Lior Wolf. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600,

  18. [18]

    Maximum likelihood training of score-based diffusion models

    Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. arXiv e-prints, pp. arXiv–2101, 2021b. Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference ...

  19. [19]

    Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit, 2019

    Belinda Tzen and Maxim Raginsky. Neural stochastic differential equations: Deep latent gaussian models in the diffusion limit. arXiv preprint arXiv:1905.09883, 2019a. Belinda Tzen and Maxim Raginsky. Theoretical guarantees for sampling and inference in generative models with latent diffusions. In Conference ...

  20. [20]

    Learning to efficiently sample from diffusion probabilistic models

    Daniel Watson, Jonathan Ho, Mohammad Norouzi, and William Chan. Learning to efficiently sample from diffusion probabilistic models. arXiv preprint arXiv:2106.03802,

  21. [21]

    12 Published as a conference paper at ICLR 2022 A P ROBABILITY FLOW ODE IN TERMS OF LOG -SNR Song et al. (2021c) formulate the forward diffusion process in terms of an SDE of the form dz =f (z,t )dt +g(t)dW, (10) and show that samples from this diffusion process can be generated by solving the associated prob- ability flow ODE: dz = [f (z,t ) − 1 2g2(t)∇z ...

  22. [22]

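    is given by $z_s = \frac{\sigma_s}{\sigma_t}\,[z_t - \alpha_t \hat{x}_\theta(z_t)] + \alpha_s \hat{x}_\theta(z_t)$ (20) for $s < t$. Taking the derivative of this expression with respect to $\lambda_s$, assuming again a variance preserving diffusion process, and using $\frac{d\alpha_\lambda}{d\lambda} = \frac{1}{2}\alpha_\lambda\sigma_\lambda^2$ and $\frac{d\sigma_\lambda}{d\lambda} = -\frac{1}{2}\sigma_\lambda\alpha_\lambda^2$, gives $\frac{dz_{\lambda_s}}{d\lambda_s} = \frac{d\sigma_{\lambda_s}}{d\lambda_s}\,\frac{1}{\sigma_t}\,[z_t - \alpha_t \hat{x}_\theta(z_t)] + \frac{d\alpha_{\lambda_s}}{d\lambda_s}\,\hat{x}_\theta(z_t)$ (21) $= -\frac{1}{2}\alpha_s^2\,\frac{\sigma_s}{\sigma_t}\,[z_t - \alpha_t \hat{x}_\theta(z_t)] + \frac{1}{2}\alpha_s\sigma_s^2$...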

  23. [23]

    E. Settings used in experiments. Our model architectures closely follow those described by Dhariwal & Nichol (2021)

    Figure 5: Visualization of reparameterizing the diffusion process in terms of $\phi$ and $v_\phi$. E. Settings used in experiments. Our model architectures closely follow those described by Dhariwal & Nichol (2021). For 64 × 64 ImageNet we use their model exactly, with 192 channels at the highest resolution. All other models are slight variations with different hyperp...

  24. [24]

    We use single-headed attention, and only apply this at the 16 × 16 and 8 × 8 resolutions

    At each resolution we apply 3 residual blocks, as described by Dhariwal & Nichol (2021). We use single-headed attention, and only apply this at the 16 × 16 and 8 × 8 resolutions. We use dropout of 0.2 when training the original model. No dropout is used during distillation. For LSUN we use a model similar to that for ImageNet, but with a reduced number ...

  25. [25]

    We clip the norm of gradients to a global norm of 1 before calculating parameter updates

    with a constant of 0.001. We clip the norm of gradients to a global norm of 1 before calculating parameter updates. For CIFAR-10 we train for 800k parameter updates, for ImageNet we use 550k updates, and for LSUN we use 400k updates. During distillation we train for 50k updates per iteration, except for the distillation to 2 and 1 sampling steps, for whic...

  26. [26]

    [Plot: FID versus sampling steps, 64x64 ImageNet; series: Distilled DDIM, Distilled Stochastic, Undistilled Stochastic.] Figure 6: FID of generated samples from distilled and undistilled models, using DDIM or stochastic sampling. For the stochastic sampling results we present the best FID obtained by a grid-search over 11 possible noise levels, spa...

  27. [27]

    forms a non-Gaussian distribution that falls outside the family of Gaussian distributions that can be modelled by a single DDPM student step: A multi-step stochastic DDPM sampler can thus not be distilled into a few-step sampler without some loss in fidelity. This is in contrast with the deterministic DDIM sampler: here both the two-step DDIM teacher upd...

  28. [28]

    For each schedule we selected the optimal learning rate from [5e−5, 1e−4, 2e−4, 3e−4]

    All reported numbers are averages over 4 random seeds. For each schedule we selected the optimal learning rate from [5e−5, 1e−4, 2e−4, 3e−4]. [Plot: FID versus sampling steps; panels: 64x64 ImageNet, 128x128 LSUN Bedrooms; series: 50k updates, 10k updates.] ...
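
The DDIM update quoted in entry 22 (Eq. 20) and its derivative with respect to the log-SNR can be checked numerically. The sketch below is an illustration only, not the authors' code. It assumes a variance-preserving process with log-SNR λ = log(α²/σ²), so that α² = sigmoid(λ) and σ² = sigmoid(−λ). It uses scalars in place of images and treats `x_hat` as a fixed number standing in for the network prediction. All function names are hypothetical.

```python
import math


def alpha_sigma(log_snr):
    """Assumed variance-preserving schedule: alpha^2 = sigmoid(lambda), sigma^2 = sigmoid(-lambda)."""
    alpha = math.sqrt(1.0 / (1.0 + math.exp(-log_snr)))
    sigma = math.sqrt(1.0 / (1.0 + math.exp(log_snr)))
    return alpha, sigma


def ddim_step(z_t, x_hat, log_snr_t, log_snr_s):
    """Eq. (20): z_s = (sigma_s / sigma_t) * (z_t - alpha_t * x_hat) + alpha_s * x_hat."""
    alpha_t, sigma_t = alpha_sigma(log_snr_t)
    alpha_s, sigma_s = alpha_sigma(log_snr_s)
    return (sigma_s / sigma_t) * (z_t - alpha_t * x_hat) + alpha_s * x_hat


def dz_dlambda_closed_form(z_t, x_hat, log_snr_t, log_snr_s):
    """Derivative of Eq. (20) w.r.t. lambda_s, using
    d(alpha)/d(lambda) = 0.5 * alpha * sigma^2 and d(sigma)/d(lambda) = -0.5 * sigma * alpha^2,
    with z_t, x_hat and lambda_t held fixed."""
    alpha_t, sigma_t = alpha_sigma(log_snr_t)
    alpha_s, sigma_s = alpha_sigma(log_snr_s)
    noise_term = -0.5 * alpha_s**2 * (sigma_s / sigma_t) * (z_t - alpha_t * x_hat)
    signal_term = 0.5 * alpha_s * sigma_s**2 * x_hat
    return noise_term + signal_term


def dz_dlambda_finite_diff(z_t, x_hat, log_snr_t, log_snr_s, h=1e-4):
    """Central finite difference of ddim_step in lambda_s, for comparison."""
    plus = ddim_step(z_t, x_hat, log_snr_t, log_snr_s + h)
    minus = ddim_step(z_t, x_hat, log_snr_t, log_snr_s - h)
    return (plus - minus) / (2.0 * h)
```

Calling `dz_dlambda_closed_form` and `dz_dlambda_finite_diff` with the same arguments should give closely matching values, and `ddim_step` with `log_snr_s == log_snr_t` should return `z_t` up to rounding.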
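
Entry 25 mentions clipping gradients to a global norm of 1 before calculating parameter updates. A minimal sketch of that operation follows. It assumes gradients arrive as a list of flat Python lists, a hypothetical stand-in for per-parameter tensors, and the function name is illustrative rather than taken from the paper or any library.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients jointly so their combined L2 norm is at most max_norm.

    grads: list of lists of floats (one inner list per parameter; an assumption
    made here to keep the example dependency-free).
    Returns (clipped_grads, global_norm_before_clipping).
    """
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Only shrink; gradients already inside the norm ball are left unchanged.
    scale = max_norm / global_norm if global_norm > max_norm else 1.0
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, global_norm
```

Because a single scale factor is applied to every parameter, the direction of the overall update is preserved and only its length is capped.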