Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, Stefano Ermon · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

525 Pith papers citing it

Background 67% of classified citations

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.
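The abstract does not spell out the sampling rule, so the following is only a minimal sketch of the deterministic (sigma = 0) DDIM update as it is commonly written, run on a short subsequence of timesteps. The linear beta schedule, the function names (`make_alpha_bar`, `ddim_step`, `ddim_sample`), and the toy `oracle_eps` predictor that stands in for a trained network are all assumptions made for illustration, not details taken from this page.

```python
import math
import random

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed choice)."""
    alpha_bar, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def ddim_step(x_t, eps, ab_t, ab_prev):
    """One deterministic (sigma = 0) update from noise level ab_t to ab_prev."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
    # Re-noise that estimate to the earlier level, reusing the same predicted noise.
    return math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps

def ddim_sample(eps_model, alpha_bar, num_steps, x_T):
    """Apply ddim_step along an evenly spaced timestep subsequence (num_steps >= 2)."""
    T = len(alpha_bar)
    timesteps = [round(i * (T - 1) / (num_steps - 1)) for i in range(num_steps)][::-1]
    x = x_T
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        # After the last timestep, jump to the noise-free level (alpha_bar = 1).
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), ab_t, ab_prev)
    return x

# Hypothetical stand-in for a trained network: the exact noise predictor
# for a toy data distribution in which every sample equals TARGET.
TARGET = 2.0
ALPHA_BAR = make_alpha_bar()

def oracle_eps(x_t, t):
    ab = ALPHA_BAR[t]
    return (x_t - math.sqrt(ab) * TARGET) / math.sqrt(1.0 - ab)

random.seed(0)
sample = ddim_sample(oracle_eps, ALPHA_BAR, num_steps=20, x_T=random.gauss(0.0, 1.0))
```

In this toy setting the sampler visits 20 of the 1000 training timesteps, so the noise predictor is evaluated 50 times less often than in a full-length chain. With a real trained network the quality at a given step count has to be measured empirically.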

citation-role summary

background 58 · method 23 · baseline 2

citation-polarity summary

background 56 · use method 23 · baseline 2 · support 1 · unclear 1

authors

Jiaming Song, Chenlin Meng, and Stefano Ermon

representative citing papers

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.

Diffusion Model Attribution via Spectral Coupling of Denoiser Responses

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

SDS extracts stable spectral signatures from diffusion model denoisers via frequency-controlled perturbations, achieving 99.9% attribution accuracy across eight models and 96.2% under prompt shift.

Where the Score Lives: A Wavelet View of Diffusion

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

Derives optimal score functions for diffusion models as wavelet expansions in terms of data moments, enabling architecture-agnostic analysis of which distribution attributes matter for denoising.

Consistent-Inversion: Reverse Consistency Guidance for Structure-Preserving Visual Editing

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

Consistent-Inversion introduces reverse consistency guidance that corrects early target denoising steps by checking reversibility toward the source inversion trajectory under the original prompt.

Parallel Jacobi Decoding for Fast Autoregressive Image Generation

cs.CV · 2026-06-04 · conditional · novelty 7.0

Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.

Reflection Separation from a Single Image via Joint Latent Diffusion

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A joint latent diffusion model with cross-layer self-attention and disjoint sampling separates reflection and transmission layers from single images more effectively than prior methods on real-world benchmarks.

Diffusing in the Right Space: A Systematic Study of Latent Diffusability

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.

SplatShot: 3D Face Avatar Generation from a Single Unconstrained Photo

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.

Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation

cs.CV · 2026-05-31 · unverdicted · novelty 7.0

DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.

Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance

cs.RO · 2026-05-28 · unverdicted · novelty 7.0

CGPO integrates training-free critic guidance into diffusion denoising to produce high-Q actions as regression targets, yielding SOTA results on MuJoCo locomotion and successful Franka arm grasping.

Midpoint Generative Models

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

Midpoint Generative Models define a midpoint divergence from flow matching symmetry and derive its variational form as a tractable objective for training competitive one-step generators.

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

Spectral Guidance learns singular functions via self-supervised objective to project guidance signals onto diffusion sampling trajectories, enabling stable control without retraining or backpropagation and improving CIFAR-10 accuracy by 37 points with 4x faster sampling.

Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.

DeltaCam: Differential Intrinsic Camera Modeling for Video Generation

cs.CV · 2026-05-24 · unverdicted · novelty 7.0

DeltaCam models relative changes in camera intrinsics via Δ-parameterized neural adaptors in video diffusion models trained on synthetic data to enable controllable generation and real-world transfer.

Loki: Representation over Architecture for Diffusion-Based Portrait Animation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.

Point Tracking Improves World Action Models

cs.RO · 2026-05-22 · unverdicted · novelty 7.0

JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

citing papers explorer

Showing 25 of 525 citing papers.

Geometry-Editable and Appearance-Preserving Object Compositon cs.CV · 2025-05-27 · unverdicted · none · ref 7 · internal anchor
DGAD disentangles geometry editing via semantic embeddings from appearance preservation via cross-attention retrieval inside diffusion models for object composition.
Open-Sora Plan: Open-Source Large Video Generation Model cs.CV · 2024-11-28 · unverdicted · none · ref 17 · internal anchor
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
MSG Score: Automated Video Verification for Reliable Multi-Scene Generation cs.CV · 2024-11-28 · unverdicted · none · ref 2 · internal anchor
Proposes MSG score as core of CGS framework plus IID distillation for automated, fast verification of long-form text-to-video outputs.
Flemme: A Flexible and Modular Learning Platform for Medical Images eess.IV · 2024-08-18 · unverdicted · none · ref 15 · internal anchor
Flemme is a modular platform separating encoders (conv/transformer/SSM) from encoder-decoder architectures for medical images, with a hierarchical pyramid loss yielding reported average gains of 5.6% Dice and 5.57% PSNR.
ModelScope Text-to-Video Technical Report cs.CV · 2023-08-12 · unverdicted · none · ref 54 · internal anchor
ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.
Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory cs.LG · 2026-06-04 · unverdicted · none · ref 96 · internal anchor
The book presents principles from optimization and information theory to explain deep network architectures and enable new interpretable models.
Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So? cs.CV · 2026-06-03 · unverdicted · none · ref 55 · internal anchor
Exploration of pre-generation prediction of human preference metrics (HPM) from noise seeds in diffusion models to improve output quality with negligible added cost.
Follow-Your-Preference++: Rethinking Preference Alignment for Image Inpainting cs.CV · 2026-06-02 · unverdicted · none · ref 58 · internal anchor
Empirical study shows reward model ensembles mitigate biases like brightness and composition in preference data for image inpainting, yielding better performance than prior methods without architecture changes.
Machine Learning for Multi-messenger Probes of New Physics and Cosmology: A Review and Perspective hep-ph · 2026-04-24 · unverdicted · none · ref 277 · internal anchor
A review summarizing machine learning methods for multi-messenger probes of dark matter and new physics, with a proposed plan for future integrated analyses.
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation cs.CV · 2026-04-13 · unverdicted · none · ref 147 · internal anchor
This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challenges like instance permanence and consistent interaction.
A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation cs.CV · 2025-08-13 · unverdicted · none · ref 192 · internal anchor
A survey that categorizes and summarizes methods applying 3D Gaussian Splatting to segmentation, editing, generation, and related tasks, including datasets and evaluation protocols.
When AI meets quantum information: A comprehensive review quant-ph · 2026-07-01 · unverdicted · none · ref 135 · internal anchor
A comprehensive review organizing progress at the AI-quantum information intersection from both directions.
Introduction to Stochastic Differential Equations for Generative Machine Learning: A Variational Perspective cs.LG · 2026-06-30 · unverdicted · none · ref 15 · internal anchor
An expository tutorial deriving the ELBO for SDE-based generative models and presenting diffusion, score, and flow matching as variational parameterizations illustrated on a 1D example.
HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation cs.CV · 2026-05-13 · unreviewed · ref 30 · internal anchor
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models cs.CV · 2026-04-19 · unreviewed · ref 62 · 2 links · internal anchor
Integrated Investment and Policy Planning for Power Systems via Differentiable Scenario Generation eess.SY · 2026-04-15 · unreviewed · ref 31 · internal anchor
MENO: MeanFlow-Enhanced Neural Operators for Dynamical Systems cs.LG · 2026-04-08 · unreviewed · ref 14 · internal anchor
IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales physics.ao-ph · 2026-03-23 · unreviewed · ref 19 · internal anchor
You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector cs.RO · 2026-03-16 · unreviewed · ref 35 · internal anchor
Dual-End Consistency Model cs.CV · 2026-02-11 · unreviewed · ref 43 · internal anchor
Prompt Reinjection: Alleviating Prompt Forgetting in Multimodal Diffusion Transformers cs.CV · 2026-02-06 · unreviewed · ref 13 · internal anchor
HunyuanImage 3.0 Technical Report cs.CV · 2025-09-28 · unreviewed · ref 2 · internal anchor
Training-Free Inference for High-Resolution Sinogram Completion cs.CV · 2025-06-10 · unreviewed · ref 34 · 2 links · internal anchor
Joint Reconstruction of Activity and Attenuation in PET by Diffusion Posterior Sampling in Wavelet Coefficient Space physics.med-ph · 2025-05-24 · unreviewed · ref 31 · internal anchor
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting cs.CE · 2025-02-26 · unreviewed · ref 63 · internal anchor
