hub Mixed citations

Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras, Miika Aittala, Timo Aila, Samuli Laine · 2022 · cs.CV · arXiv 2206.00364

Mixed citation behavior. Most common role is background (56%).

38 Pith papers citing it

Background 56% of classified citations

open full Pith review browse 38 citing papers arXiv PDF

abstract

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 dataset 1 method 1

citation-polarity summary

background 5 unclear 2 use dataset 1 use method 1

representative citing papers

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

cs.RO · 2023-03-07 · accept · novelty 8.0

Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

cs.LG · 2022-09-07 · unverdicted · novelty 8.0

Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.

Covariance-aware sampling for Diffusion Models

stat.ML · 2026-05-13 · conditional · novelty 7.0

A covariance-aware extension of DDIM sampling for pixel-space diffusion models that uses Tweedie's formula and Fourier decomposition to model reverse-process covariance and improves sample quality at low NFE.

Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.

Tempered Guided Diffusion

stat.ML · 2026-05-05 · unverdicted · novelty 7.0

Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.

$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.

GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow

cs.CV · 2026-03-27 · unverdicted · novelty 7.0

GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS reduction over DCVC-RT.

B\'ezierFlow: Learning B\'ezier Stochastic Interpolant Schedulers for Few-Step Generation

cs.LG · 2025-12-15 · unverdicted · novelty 7.0

Béz ierFlow parameterizes stochastic interpolant schedulers as Béz ier functions to learn optimal sampling trajectories, achieving 2-3x better few-step performance than prior timestep optimization methods.

Diff-ANO: Towards Fast High-Resolution Ultrasound Computed Tomography via Conditional Consistency Models and Adjoint Neural Operators

math.NA · 2025-07-22 · unverdicted · novelty 7.0

Diff-ANO uses conditional consistency models and adjoint neural operator surrogates to enable fast, high-quality USCT reconstructions under sparse and partial views by replacing slow PDE solvers and enabling few-step sampling.

Beyond Blur: A Fluid Perspective on Generative Diffusion Models

cs.GR · 2025-06-20 · unverdicted · novelty 7.0

Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice Boltzmann solver for diffusion models, generalizing prior PDE methods.

An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

cs.LG · 2025-03-05 · unverdicted · novelty 7.0

Analytic solution of full-batch gradient flow for linear and convolutional denoisers in diffusion models yields a universal inverse-variance spectral law for learning times of eigenmodes.

Imagen Video: High Definition Video Generation with Diffusion Models

cs.CV · 2022-10-05 · unverdicted · novelty 7.0

Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.

Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.

Discrete Stochastic Localization for Non-autoregressive Generation

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.

Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.

Diffusion Models Memorize in Training -- and Generalize in Inference

cs.LG · 2026-03-12 · unverdicted · novelty 6.0

Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.

Discrete Bayesian Sample Inference for Graph Generation

cs.LG · 2025-11-04 · unverdicted · novelty 6.0

GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

cs.CV · 2024-03-05 · conditional · novelty 6.0

Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

cs.CV · 2023-11-25 · conditional · novelty 6.0

Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

cs.CV · 2023-07-04 · conditional · novelty 6.0

SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.

Shap-E: Generating Conditional 3D Implicit Functions

cs.CV · 2023-05-03 · accept · novelty 6.0

Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.

Learning Climate Variability from Scarce Data with Diffusion Models: A Test Case for ENSO

physics.ao-ph · 2026-06-25 · unverdicted · novelty 5.0

Diffusion models recover known ENSO variability structure from synthetic LIM data when given enough samples, but require pre-training on CMIP6 plus fine-tuning to match observations with the ~700 samples available in ERSSTv5.

Multiscale Fourier Neural Operator for Inverse Wave Scattering in Highly Oscillatory Media

math.NA · 2026-06-07 · unverdicted · novelty 5.0

MscaleFNO learns mappings from oscillatory media to wavefields for Helmholtz inverse problems and pairs it with diffusion regularization for partial-aperture 2D reconstructions.

citing papers explorer

Showing 23 of 23 citing papers after filters.

Covariance-aware sampling for Diffusion Models stat.ML · 2026-05-13 · conditional · none · ref 10 · internal anchor
A covariance-aware extension of DDIM sampling for pixel-space diffusion models that uses Tweedie's formula and Fourier decomposition to model reverse-process covariance and improves sample quality at low NFE.
Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs cs.CV · 2026-05-10 · unverdicted · none · ref 18 · internal anchor
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
Tempered Guided Diffusion stat.ML · 2026-05-05 · unverdicted · none · ref 26 · internal anchor
Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 15 · internal anchor
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns cs.CV · 2026-04-14 · unverdicted · none · ref 16 · internal anchor
A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.
GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow cs.CV · 2026-03-27 · unverdicted · none · ref 16 · internal anchor
GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS reduction over DCVC-RT.
Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing cs.LG · 2026-05-15 · unverdicted · none · ref 118 · internal anchor
Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.
Discrete Stochastic Localization for Non-autoregressive Generation cs.LG · 2026-05-13 · unverdicted · none · ref 10 · internal anchor
DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.
Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models cs.CV · 2026-04-29 · unverdicted · none · ref 11 · internal anchor
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
Diffusion Models Memorize in Training -- and Generalize in Inference cs.LG · 2026-03-12 · unverdicted · none · ref 36 · internal anchor
Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.
Learning Climate Variability from Scarce Data with Diffusion Models: A Test Case for ENSO physics.ao-ph · 2026-06-25 · unverdicted · none · ref 26 · internal anchor
Diffusion models recover known ENSO variability structure from synthetic LIM data when given enough samples, but require pre-training on CMIP6 plus fine-tuning to match observations with the ~700 samples available in ERSSTv5.
Multiscale Fourier Neural Operator for Inverse Wave Scattering in Highly Oscillatory Media math.NA · 2026-06-07 · unverdicted · none · ref 21 · internal anchor
MscaleFNO learns mappings from oscillatory media to wavefields for Helmholtz inverse problems and pairs it with diffusion regularization for partial-aperture 2D reconstructions.
CaloTrilogy: Toward a Breakthrough in One-Step, End-to-End, Physics-Guided Shower Generation for Modern Calorimeters hep-ex · 2026-06-02 · unverdicted · none · ref 76 · internal anchor
Presents CaloTrilogy, a unified one-step generative model for high-granularity calorimeter showers that combines velocity field integration, learned priors, and physics losses to match SOTA quality.
Lossless Anti-Distillation Sampling cs.LG · 2026-05-12 · unverdicted · none · ref 93 · internal anchor
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation physics.ins-det · 2026-05-12 · unverdicted · none · ref 59 · internal anchor
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditional flow matching.
Consistency Regularised Gradient Flows for Inverse Problems stat.ML · 2026-05-08 · unverdicted · none · ref 10 · internal anchor
A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.
The Physical Limit of Neural Hypoxia Detection in the Black Sea from Satellite Observations physics.ao-ph · 2026-04-28 · unverdicted · none · ref 1 · 2 links · internal anchor
Neural networks can detect 38% of summer hypoxic events shelf-wide from satellites with 47% precision, but only within the homogeneous mixed layer.
Score-Based Matching with Target Guidance for Cryo-EM Denoising cs.CV · 2026-04-20 · unverdicted · none · ref 14 · internal anchor
Score-based denoising with reference-density guidance improves particle-background separability and downstream 3D reconstruction consistency on cryo-EM datasets.
Rethinking the Diffusion Model from a Langevin Perspective cs.LG · 2026-04-12 · unverdicted · none · ref 4 · internal anchor
Diffusion models are reorganized under a Langevin perspective that unifies ODE and SDE formulations and shows flow matching is equivalent to denoising under maximum likelihood.
Downscaling weather forecasts from Low- to High-Resolution with Diffusion Models physics.ao-ph · 2026-03-30 · unverdicted · none · ref 8 · internal anchor
A conditional diffusion model downscales global atmospheric forecasts from 100 km to 30 km resolution while improving probabilistic skill, matching power spectra, and preserving physical relationships.
A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models cs.LG · 2026-05-07 · unverdicted · none · ref 16 · internal anchor
Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.
LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention cs.CV · 2026-05-06 · unreviewed · ref 193 · internal anchor
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster cs.LG · 2026-04-10 · unreviewed · ref 30 · internal anchor

Elucidating the Design Space of Diffusion-Based Generative Models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer