
arxiv: 2010.02502 · v4 · submitted 2020-10-06 · 💻 cs.LG · cs.CV

Denoising Diffusion Implicit Models

Pith reviewed 2026-05-24 14:39 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords denoising diffusion · implicit models · sampling acceleration · non-Markovian processes · image generation · latent space interpolation · generative models

The pith

Denoising diffusion implicit models produce high-quality samples using the same training as DDPMs but with far fewer sampling steps via non-Markovian processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that non-Markovian diffusion processes can be defined to match the training objective of standard DDPMs while making the reverse generative process much quicker to run. This addresses the slow sampling that has limited diffusion models in practice, since each sample normally requires simulating many steps of a Markov chain. A sympathetic reader would care because the change keeps training unchanged yet cuts the number of steps needed at generation time by a large factor. If the construction holds, it opens a direct way to balance speed against quality without retraining and supports operations like interpolation inside the model's latent space.

Core claim

We construct a class of non-Markovian diffusion processes that lead to the same training objective as DDPMs, but whose reverse process can be much faster to sample from. DDIMs therefore allow high-quality samples to be produced 10 times to 50 times faster in wall-clock time, let users trade computation for sample quality, and support semantically meaningful image interpolation directly in the latent space.

What carries the argument

The non-Markovian diffusion process, which is constructed to share the identical training objective with the Markovian DDPM forward process while permitting accelerated reverse sampling.
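
As a hedged illustration of what the accelerated reverse step looks like, the update below is written in DDPM-style notation. The symbols are assumptions carried over from the paper rather than anything defined on this page: $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$ is the cumulative noise schedule (the DDIM paper writes this quantity as $\alpha_t$), $\epsilon_\theta$ is the trained noise predictor, and $\sigma_t$ is a free per-step noise level.

```latex
x_{t-1} =
  \sqrt{\bar{\alpha}_{t-1}}\,
  \underbrace{\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}}_{\text{predicted } x_0}
  + \underbrace{\sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t}
  + \underbrace{\sigma_t\,\epsilon_t}_{\text{random noise}},
  \qquad \epsilon_t \sim \mathcal{N}(0, I)
```

Setting $\sigma_t = 0$ for every $t$ makes the update deterministic, which is the DDIM case. The same formula still applies when $t-1$ is replaced by an earlier timestep from a shorter sub-sequence, and that substitution is where the reduction in step count comes from.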

If this is right

  • Samples of comparable quality can be generated 10× to 50× faster in wall-clock time than with DDPMs.
  • Users can choose fewer or more sampling steps to trade computation directly against sample quality.
  • Image interpolation performed in the latent space produces semantically meaningful results.
  • The generative process remains iterative and implicit but no longer requires the full Markov chain simulation.
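
The compute-versus-quality trade-off in the list above can be sketched with a minimal one-dimensional DDIM sampler. This is an illustration under stated assumptions, not the authors' implementation: `make_alpha_bar`, `ddim_sample`, and `exact_eps` are hypothetical names, the linear β schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default assumed here, and the "model" is a closed-form noise predictor for toy data concentrated at a single point `mu`.

```python
import math
import random


def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar[0..T]: cumulative products of (1 - beta_s), with alpha_bar[0] = 1."""
    alpha_bar = [1.0]
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)  # assumed linear schedule
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def ddim_sample(eps_model, x_T, alpha_bar, num_steps, eta=0.0, rng=random):
    """Run the DDIM update along an evenly spaced sub-sequence of timesteps."""
    T = len(alpha_bar) - 1
    # Sub-sequence tau of {1..T}, in descending order, always containing T.
    taus = sorted({max(1, round(T * (k + 1) / num_steps)) for k in range(num_steps)},
                  reverse=True)
    x = x_T
    for i, t in enumerate(taus):
        t_prev = taus[i + 1] if i + 1 < len(taus) else 0  # last step lands on t = 0
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
        eps = eps_model(x, t)
        # Predicted clean sample from the current noisy sample.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        # eta = 0 gives the deterministic DDIM step; eta > 0 injects fresh noise.
        sigma = eta * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)
        direction = math.sqrt(max(1.0 - a_prev - sigma ** 2, 0.0)) * eps
        noise = rng.gauss(0.0, 1.0) if eta > 0 else 0.0
        x = math.sqrt(a_prev) * x0_pred + direction + sigma * noise
    return x


# Toy setup: all data sits at mu, so the ideal noise predictor has a closed form.
alpha_bar = make_alpha_bar()
mu = 0.7


def exact_eps(x, t):
    return (x - math.sqrt(alpha_bar[t]) * mu) / math.sqrt(1.0 - alpha_bar[t])


for steps in (10, 100, 1000):
    print(steps, ddim_sample(exact_eps, x_T=1.3, alpha_bar=alpha_bar, num_steps=steps))
```

With an exact noise predictor the deterministic sampler lands on `mu` (up to floating-point rounding) for any step count. With a learned, imperfect predictor the step count is what trades compute against sample quality.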

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same non-Markovian construction might be applied to other iterative generative models that currently rely on Markovian forward processes.
  • Fewer sampling steps could make diffusion-based generation feasible inside interactive or real-time applications.
  • Latent-space interpolation raises the possibility of controlled editing or morphing tasks without additional supervision.
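
Latent-space interpolation of this kind is usually done with spherical linear interpolation (slerp) between two initial noise vectors, each of which is then decoded by the deterministic sampler. The sketch below shows only the interpolation step on plain Python lists. The function name `slerp` and the small-angle fallback threshold are illustrative assumptions.

```python
import math


def slerp(z0, z1, alpha):
    """Spherical linear interpolation between latent vectors z0 and z1, alpha in [0, 1]."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    # Angle between the two latents; clamp to guard against rounding outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:
        # Nearly parallel or antiparallel vectors: fall back to linear interpolation.
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - alpha) * theta) / sin_theta
    w1 = math.sin(alpha * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


# Interpolate between two random Gaussian latents at five evenly spaced points.
import random
rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(8)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(8)]
path = [slerp(z_a, z_b, k / 4) for k in range(5)]
```

Slerp is preferred over a straight line here because it keeps intermediate points at a norm typical of Gaussian noise, which a linear blend of two independent Gaussian vectors does not.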

Load-bearing premise

Non-Markovian diffusion processes can be built that keep exactly the same training objective as the Markovian DDPM forward process yet allow a faster reverse sampling procedure.

What would settle it

Sampling from a model trained with the standard DDPM objective using the DDIM procedure, and then measuring whether its few-step samples match the quality of a standard DDPM run with hundreds of steps on the same data.

Figures

Figures reproduced from arXiv: 2010.02502 by Chenlin Meng, Jiaming Song, Stefano Ermon.

Figure 1: Graphical models for diffusion (left) and non-Markovian (right) inference models.
Figure 2: Graphical model for accelerated generation, where …
Figure 3: CIFAR10 and CelebA samples with dim(τ) = 10 and dim(τ) = 100.
Figure 4: Hours to sample 50k images with one Nvidia 2080 Ti GPU and samples at different steps.
Figure 5: Samples from DDIM with the same random x_T and different number of steps.
Figure 6: Interpolation of samples from DDIM with dim(τ) = 50.
Figure 7: CIFAR10 samples from 1000 step DDPM, 1000 step DDIM and 100 step DDIM.
Figure 8: CelebA samples from 1000 step DDPM, 1000 step DDIM and 100 step DDIM.
Figure 9: CelebA samples from DDIM with the same random …
Figure 10: Church samples from 100 step DDPM and 100 step DDIM.
Figure 11: More interpolations from the CelebA DDIM with …
Figure 12: More interpolations from the Bedroom DDIM with …
Figure 13: More interpolations from the Church DDIM with …
Original abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces denoising diffusion implicit models (DDIMs), a generalization of DDPMs that replaces the Markovian forward diffusion with a family of non-Markovian processes whose marginals q(x_t | x_0) remain identical. This permits reuse of the identical trained denoising network while allowing a non-Markovian reverse process that supports substantially larger steps, yielding 10–50× wall-clock speedups, a compute–quality tradeoff, and direct latent-space interpolation.

Significance. If the marginal-equivalence construction is exact, the result is significant: it removes the need to retrain when accelerating sampling and directly addresses the primary practical bottleneck of DDPMs. The additional capabilities (trade-off control and interpolation) further increase the method’s utility for downstream generative tasks.

major comments (2)
  1. [§3.2] §3.2 (construction of the non-Markovian process): the claim that any variance schedule β_t yields identical marginals q(x_t | x_0) to the DDPM forward process must be shown to hold without additional restrictions on the schedule; otherwise the same trained weights cannot be reused for accelerated sampling without distribution shift.
  2. [§5] §5 (experiments): the reported 10–50× wall-clock speedups and quality claims lack any description of the exact sampling schedules, hardware, batch sizes, number of runs, or variance across seeds; without these the empirical support for the central speedup claim cannot be assessed.
minor comments (2)
  1. [§3] Notation for the implicit reverse process (Eq. (7) or equivalent) should explicitly distinguish the deterministic limit (η=0) from the stochastic case to avoid reader confusion about when the process remains probabilistic.
  2. [§5] Figure 6 (interpolation examples) would benefit from a quantitative metric (e.g., LPIPS or FID between interpolated and endpoint images) rather than relying solely on visual inspection.
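
The distinction between the deterministic and stochastic cases raised in minor comment 1 can be made concrete with the per-step noise level σ_t as a function of η. The sketch below assumes the η-parameterization from the paper, written with cumulative-product values `a_t` and `a_prev` for two adjacent timesteps. The function names are hypothetical.

```python
import math


def ddim_sigma(a_t, a_prev, eta):
    """Per-step noise level sigma_t(eta); a_t and a_prev are cumulative products, a_t < a_prev."""
    return eta * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)


def ddpm_posterior_std(a_t, a_prev):
    """Standard deviation of the DDPM posterior q(x_{t-1} | x_t, x_0) for adjacent steps."""
    beta_t = 1.0 - a_t / a_prev  # per-step variance implied by the two cumulative products
    return math.sqrt(beta_t * (1.0 - a_prev) / (1.0 - a_t))


a_prev, a_t = 0.8, 0.7  # arbitrary illustrative values
print("eta=0:", ddim_sigma(a_t, a_prev, 0.0))    # no injected noise: deterministic step
print("eta=1:", ddim_sigma(a_t, a_prev, 1.0))    # coincides with the DDPM posterior std
print("DDPM :", ddpm_posterior_std(a_t, a_prev))
```

Under these assumptions η = 0 removes the injected noise entirely, η = 1 recovers the DDPM posterior standard deviation, and values in between interpolate between the two regimes.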

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. Below we respond point-by-point to the two major comments. We will revise the manuscript to address both concerns.

point-by-point responses
  1. Referee: [§3.2] §3.2 (construction of the non-Markovian process): the claim that any variance schedule β_t yields identical marginals q(x_t | x_0) to the DDPM forward process must be shown to hold without additional restrictions on the schedule; otherwise the same trained weights cannot be reused for accelerated sampling without distribution shift.

    Authors: Section 3.2 constructs the non-Markovian forward process by defining q(x_{t-1}|x_t,x_0) so that the marginal q(x_t|x_0) is identical to the DDPM marginal for any schedule {β_t} that satisfies the standard DDPM conditions (0<β_t<1 and the usual cumulative product definitions). The derivation uses only the law of total probability and the Gaussian parameterization already present in DDPMs; no further restrictions on the schedule are imposed. Consequently the training objective remains unchanged and the same network weights can be reused. We will add an explicit sentence in §3.2 stating that the marginal equivalence holds for arbitrary valid β schedules. revision: partial

  2. Referee: [§5] §5 (experiments): the reported 10–50× wall-clock speedups and quality claims lack any description of the exact sampling schedules, hardware, batch sizes, number of runs, or variance across seeds; without these the empirical support for the central speedup claim cannot be assessed.

    Authors: We agree that the experimental section is missing these reproducibility details. In the revised manuscript we will report: (i) the exact DDIM sampling schedules (number of steps and η values) used for each speedup factor, (ii) the hardware (GPU model and count), (iii) batch sizes, and (iv) mean and standard deviation of FID/IS over at least three independent runs with different random seeds. revision: yes
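
The marginal-equivalence claim in the first response can be sanity-checked numerically. The Monte Carlo sketch below is not a proof and not the authors' code: it assumes the Gaussian form of q(x_{t-1} | x_t, x_0) from the paper, uses a deliberately irregular four-step schedule with a different η at each step, and checks that x_1 still has the mean and variance of the DDPM marginal. `sample_x1_moments` and all numeric values are hypothetical choices for illustration.

```python
import math
import random


def sample_x1_moments(alpha_bar, etas, x0=2.0, n=100_000, seed=0):
    """Simulate the non-Markovian chain from x_T down to x_1; return (mean, variance) of x_1."""
    rng = random.Random(seed)
    T = len(alpha_bar) - 1
    samples = []
    for _ in range(n):
        # Start from the marginal q(x_T | x_0) shared with DDPM.
        x = math.sqrt(alpha_bar[T]) * x0 + math.sqrt(1.0 - alpha_bar[T]) * rng.gauss(0.0, 1.0)
        for t in range(T, 1, -1):
            a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
            sigma = etas[t] * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)
            # Mean of q_sigma(x_{t-1} | x_t, x_0): rescale the noise component of x_t.
            noise_part = (x - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t)
            mean = math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev - sigma ** 2) * noise_part
            x = mean + sigma * rng.gauss(0.0, 1.0)
        samples.append(x)
    m = sum(samples) / n
    v = sum((s - m) ** 2 for s in samples) / n
    return m, v


alpha_bar = [1.0, 0.9, 0.6, 0.3, 0.05]  # irregular, non-linear schedule (index 0 is t = 0)
etas = [0.0, 0.0, 0.0, 0.5, 1.0]        # eta used at step t; entries 0 and 1 are unused
mean, var = sample_x1_moments(alpha_bar, etas)
print("empirical :", mean, var)
print("DDPM q(x_1|x_0):", math.sqrt(alpha_bar[1]) * 2.0, 1.0 - alpha_bar[1])
```

Agreement within sampling error on one arbitrary schedule is only suggestive. The general statement rests on the algebra of Gaussian marginalization, which holds for any schedule with 0 < β_t < 1.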

Circularity Check

0 steps flagged

No significant circularity; derivation of non-Markovian equivalence is self-contained

full rationale

The paper derives a family of non-Markovian forward processes whose marginal distributions at each timestep match those of the DDPM Markov chain, thereby preserving the identical variational lower bound training objective for the shared denoising network. This equivalence follows directly from the closed-form expressions for the forward process means and variances (standard Gaussian conditioning) without reference to the reverse sampling speed or empirical outcomes. The accelerated reverse sampling is then obtained by choosing larger implicit steps in the non-Markovian chain, a consequence rather than an input. No equations reduce a prediction to a fitted parameter, no self-citation supplies the uniqueness of the construction, and the central claim remains independently verifiable from the stated assumptions on the diffusion schedule.
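
The Gaussian-conditioning step referred to above can be written out in a few lines. This is a sketch in DDPM-style notation ($\bar{\alpha}_t$ for the cumulative product, which the DDIM paper writes as $\alpha_t$), assuming the non-Markovian conditional has the form given in the paper and that the marginal at step $t$ already matches DDPM:

```latex
\begin{align*}
q_\sigma(x_{t-1}\mid x_t, x_0)
  &= \mathcal{N}\!\Big(\sqrt{\bar{\alpha}_{t-1}}\,x_0
     + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,
       \frac{x_t-\sqrt{\bar{\alpha}_t}\,x_0}{\sqrt{1-\bar{\alpha}_t}},\;
     \sigma_t^2 I\Big), \\
q_\sigma(x_t\mid x_0)
  &= \mathcal{N}\big(\sqrt{\bar{\alpha}_t}\,x_0,\; (1-\bar{\alpha}_t) I\big)
  \quad\text{(induction hypothesis)}, \\
\mathbb{E}[x_{t-1}\mid x_0]
  &= \sqrt{\bar{\alpha}_{t-1}}\,x_0
     + \sqrt{1-\bar{\alpha}_{t-1}-\sigma_t^2}\,
       \frac{\sqrt{\bar{\alpha}_t}\,x_0-\sqrt{\bar{\alpha}_t}\,x_0}{\sqrt{1-\bar{\alpha}_t}}
   = \sqrt{\bar{\alpha}_{t-1}}\,x_0, \\
\mathrm{Var}[x_{t-1}\mid x_0]
  &= \sigma_t^2 I
     + \frac{1-\bar{\alpha}_{t-1}-\sigma_t^2}{1-\bar{\alpha}_t}\,(1-\bar{\alpha}_t)\, I
   = (1-\bar{\alpha}_{t-1})\, I .
\end{align*}
```

Both moments are independent of $\sigma_t$, so every member of the family has the DDPM marginal $q(x_{t-1}\mid x_0)$. Induction downward from $t = T$ then gives the match at every timestep.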

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields minimal ledger entries; the method inherits standard diffusion assumptions without introducing new fitted parameters or entities in the provided text.

axioms (1)
  • domain assumption The forward process gradually adds noise to data in a manner compatible with a learned reverse process.
    Implicit in the statement that DDIMs share the same training objective as DDPMs.



Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

    cs.CV 2026-04 unverdicted novelty 8.0

    ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

  2. Flow-GRPO: Training Flow Matching Models via Online RL

    cs.CV 2025-05 unverdicted novelty 8.0

    Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

  3. Consistency Models

    cs.LG 2023-03 conditional novelty 8.0

    Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

  4. Point Tracking Improves World Action Models

    cs.RO 2026-05 unverdicted novelty 7.0

    JOPAT jointly models pixels, point tracks, and actions in a diffusion transformer and reports gains over pixel-only baselines on long-horizon robot tasks with occlusion and off-screen motion.

  5. DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adapt...

  6. VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

    cs.CV 2026-05 unverdicted novelty 7.0

    VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

  7. Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

    cs.CV 2026-05 unverdicted novelty 7.0

    Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.

  8. DrawMotion: Generating 3D Human Motions by Freehand Drawing

    cs.CV 2026-05 unverdicted novelty 7.0

    DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.

  9. CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

    cs.LG 2026-05 unverdicted novelty 7.0

    CAdam reinterprets densification in generative 3DGS as signal verification via gradient-moment interference, quantile context, and SNR gating to achieve large reductions in primitive count with comparable quality.

  10. DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

    cs.RO 2026-05 unverdicted novelty 7.0

    A hypernetwork generates complete task-specific visuomotor policy parameters from instructions alone to structurally eliminate observation leakage in language-conditioned robotic control.

  11. BrepForge: Factorized B-rep Synthesis via Wireframe Composition and Boundary-Conditioned Surface Instantiation

    cs.GR 2026-05 unverdicted novelty 7.0

    BrepForge factorizes B-rep synthesis into face-aware autoregressive wireframe composition followed by boundary-conditioned surface instantiation using learning-free geometric priors.

  12. Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

    cs.LG 2026-05 unverdicted novelty 7.0

    IPR improves valid solution rates on MNIST Sudoku from 55.8% to 75.0% by iteratively refining partial regions in sequential diffusion models without external verifiers or reward models.

  13. PolycubeNet: A Dual-latent Diffusion Model for Polycube-Based Hexahedral Mesh Generation

    cs.GR 2026-05 unverdicted novelty 7.0

    PolycubeNet applies a dual-latent diffusion architecture to generate polycube point clouds from input point clouds, enabling robust hexahedral mesh creation without surface segmentation or templates.

  14. Functionalization via Structure Completion and Motion Rectification

    cs.CV 2026-05 unverdicted novelty 7.0

    Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture wi...

  15. StreamingEffect: Real-Time Human-Centric Video Effect Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

  16. Towards Generalized Image Manipulation Localization via Score-based Model

    cs.CV 2026-05 conditional novelty 7.0

    DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.

  17. VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting

    cs.CV 2026-05 unverdicted novelty 7.0

    VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.

  18. HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention

    cs.CV 2026-05 unverdicted novelty 7.0

    HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.

  19. What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions

    cs.LG 2026-05 unverdicted novelty 7.0

    Introduces the task of counterfactual time series forecasting with textual conditions plus a text-attribution mechanism that improves accuracy by distinguishing mutable from immutable factors.

  20. Training-Free Generative Sampling via Moment-Matched Score Smoothing

    stat.ML 2026-05 unverdicted novelty 7.0

    MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.

  21. Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers

    cs.CV 2026-05 unverdicted novelty 7.0

    Text embeddings in MM-DiTs contain a detectable omission signal for missing concepts, and amplifying it via OSI reduces concept omission in generated images on FLUX.1-Dev and SD3.5-Medium.

  22. HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation

    cs.CV 2026-05 unverdicted novelty 7.0

    HIR-ALIGN augments limited target data for hyperspectral restoration by creating proxy clean images, synthesizing aligned HSIs with blur-robust diffusion and warp-based transfer, then finetuning models to lower target...

  23. Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

    cs.CV 2026-05 unverdicted novelty 7.0

    A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.

  24. Stable Attention Response for Reliable Precipitation Nowcasting

    cs.LG 2026-05 conditional novelty 7.0

    HARECast stabilizes cross-sample variance in attention-response energy via group-wise regularization to reduce prediction errors in precipitation nowcasting.

  25. Amortized Guidance for Image Inpainting with Pretrained Diffusion Models

    cs.CV 2026-05 unverdicted novelty 7.0

    AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.

  26. ImageAttributionBench: How Far Are We from Generalizable Attribution?

    cs.CV 2026-05 unverdicted novelty 7.0

    ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.

  27. DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

    cs.CV 2026-05 unverdicted novelty 7.0

    DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generativ...

  28. Discrete Stochastic Localization for Non-autoregressive Generation

    cs.LG 2026-05 unverdicted novelty 7.0

    Discrete Stochastic Localization provides a continuous-state framework with SNR-invariant denoisers on unit-sphere embeddings, enabling one network to support multiple per-token noise paths and improving MAUVE on OpenWebText.

  29. Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer

    cs.LG 2026-05 unverdicted novelty 7.0

    GDPD treats partial student features as degraded observations and uses a learned diffusion prior over teacher features to sample restorative long-context targets for improved partial time-series classification.

  30. LatentHDR: Decoupling Exposure from Diffusion via Conditional Latent-to-Latent Mapping for Text/Image-to-Panoramic HDR

    cs.CV 2026-05 unverdicted novelty 7.0

    LatentHDR generates structurally consistent panoramic HDR images by producing one scene latent with a diffusion backbone then deterministically mapping it to multiple exposure latents via a lightweight conditional head.

  31. Muninn: Your Trajectory Diffusion Model But Faster

    cs.RO 2026-05 unverdicted novelty 7.0

    Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

  32. Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs

    cs.CV 2026-05 unverdicted novelty 7.0

    PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.

  33. NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models

    cs.RO 2026-05 unverdicted novelty 7.0

    NoiseGate learns per-latent timestep schedules as an information-gating policy in diffusion-based world action models, yielding consistent gains on RoboTwin manipulation tasks.

  34. OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos

    cs.CV 2026-05 unverdicted novelty 7.0

    OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.

  35. GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

    cs.CV 2026-05 unverdicted novelty 7.0

    GPO-V is a visual jailbreak framework that bypasses safety guardrails in diffusion VLMs by globally manipulating generative probabilities during denoising.

  36. GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

    cs.CV 2026-05 unverdicted novelty 7.0

    GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.

  37. LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

    cs.CV 2026-05 unverdicted novelty 7.0

    LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

  38. MotionGRPO: Overcoming Low Intra-Group Diversity in GRPO-Based Egocentric Motion Recovery

    cs.CV 2026-05 unverdicted novelty 7.0

    MotionGRPO models diffusion sampling as a Markov decision process optimized with Group Relative Policy Optimization, using hybrid rewards and noise injection to boost sample diversity and local joint precision in egoc...

  39. D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

    cs.CV 2026-05 unverdicted novelty 7.0

    D-OPSD formulates supervised fine-tuning of step-distilled diffusion models as on-policy self-distillation by minimizing distribution differences between a text-only student and a multimodal teacher on the student's o...

  40. Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.

  41. LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection

    cs.CV 2026-05 unverdicted novelty 7.0

    LEGO uses multiple generator-specific LoRA modules modulated by an MLP and fused with attention to detect synthetic images, achieving better performance than prior methods while using under 10% of the training data.

  42. DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models

    cs.CV 2026-05 unverdicted novelty 7.0

    DMGD achieves better performance than fine-tuned SOTA methods in dataset distillation on ImageNet subsets by using semantic matching through conditional likelihood optimization and OT-based distribution matching in a ...

  43. PerFlow: Physics-Embedded Rectified Flow for Efficient Reconstruction and Uncertainty Quantification of Spatiotemporal Dynamics

    cs.LG 2026-05 unverdicted novelty 7.0

    PerFlow embeds physics constraints into rectified flow sampling through guidance-free conditioning and constraint-preserving projections, achieving efficient sparse reconstruction and uncertainty quantification for sp...

  44. PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution

    cs.LG 2026-05 unverdicted novelty 7.0

    PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated unce...

  45. DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing

    cs.CV 2026-05 unverdicted novelty 7.0

    DirectEdit achieves step-level accurate inversion for flow-based image editing by directly aligning forward paths, using attention feature injection and mask-guided noise blending to balance fidelity and editability w...

  46. Generative Modeling with Orbit-Space Particle Flow Matching

    cs.GR 2026-05 unverdicted novelty 7.0

    OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.

  47. SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking

    cs.CV 2026-05 unverdicted novelty 7.0

    SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.

  48. Cast3: Translating numerical weather prediction principles into data-driven forecasting

    physics.ao-ph 2026-05 unverdicted novelty 7.0

    Cast3 translates NWP principles into a data-driven model using cubed-sphere grids, super-ensembles, and generative nudging to achieve state-of-the-art ensemble predictions that outperform baselines.

  49. Hyper-DP3: Frequency-Aware Right-Sizing of 3D Diffusion Policies for Visuomotor Control

    cs.RO 2026-05 conditional novelty 7.0

    Frequency analysis of smooth robot actions bounds denoising error to low-frequency modes, enabling a sub-1% parameter 3D diffusion policy with two-step inference that reaches SOTA on manipulation benchmarks.

  50. Fusing Urban Structure and Semantics: A Conditional Diffusion Model for Cross-City OD Matrix Generation

    cs.LG 2026-05 unverdicted novelty 7.0

    SEDAN fuses graph-based urban semantics and spatial structure inside a conditional diffusion model to generate behaviorally plausible and geographically coherent OD matrices, reporting a 7.38% RMSE gain over the WEDAN...

  51. Watch Your Step: Information Injection in Diffusion Models via Shadow Timestep Embedding

    cs.LG 2026-05 unverdicted novelty 7.0

    Timestep embeddings in diffusion models function as a separable side channel that can carry dedicated information for adversarial injection or detection.

  52. Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection

    cs.CV 2026-04 unverdicted novelty 7.0

    Noise2Map repurposes diffusion model denoising into a direct predictor for semantic segmentation and change detection tasks in remote sensing, achieving top average ranks on benchmark datasets.

  53. SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

    cs.CV 2026-04 unverdicted novelty 7.0

    SEAL introduces semantic-guided constraints during test-time adaptation to improve identity preservation and contextual control in single-image sticker personalization, backed by a new large-scale tagged sticker dataset.

  54. ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent

    cs.CV 2026-04 unverdicted novelty 7.0

    ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.

  55. Generative diffusion models for spatiotemporal influenza forecasting

    cs.LG 2026-04 unverdicted novelty 7.0

    Influpaint uses generative diffusion models on image-encoded influenza data to produce realistic and diverse epidemic trajectories that match leading ensemble methods in accuracy.

  56. Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization

    cs.CV 2026-04 unverdicted novelty 7.0

    Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.

  57. $Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

    cs.CV 2026-04 unverdicted novelty 7.0

    Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a di...

  58. ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

    cs.CV 2026-04 unverdicted novelty 7.0

    ReImagine decouples human appearance from temporal consistency via pretrained image backbones, SMPL-X motion guidance, and training-free video diffusion refinement to generate high-quality controllable videos.

  59. HP-Edit: A Human-Preference Post-Training Framework for Image Editing

    cs.CV 2026-04 unverdicted novelty 7.0

    HP-Edit introduces a post-training framework and RealPref-50K dataset that uses a VLM-based HP-Scorer to align diffusion image editing models with human preferences, improving outputs on Qwen-Image-Edit-2509.

  60. Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

    cs.CV 2026-04 unverdicted novelty 7.0

    OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · cited by 428 Pith papers · 20 internal anchors

  1. [1]

    Wasserstein GAN

    Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, January

  2. [2]

    Large Scale GAN Training for High Fidelity Natural Image Synthesis

    Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, September

  3. [3]

    WaveGrad: Estimating gradients for waveform generation

    Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. WaveGrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, September

  4. [4]

    Neural Ordinary Differential Equations

    Ricky T Q Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366, June

  5. [5]

    Density estimation using Real NVP

    Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. arXiv preprint arXiv:1605.08803, May

  6. [6]

    FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

    Will Grathwohl, Ricky T Q Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367, October

  7. [7]

    Improved Training of Wasserstein GANs

    Published as a conference paper at ICLR 2021. Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pp. 5769–5779,

  8. [8]

    GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two Time-Scale update rule converge to a local nash equilibrium. arXiv preprint arXiv:1706.08500, June

  9. [9]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, June

  10. [10]

    A Style-Based Generator Architecture for Generative Adversarial Networks

    Tero Karras, Samuli Laine, and Timo Aila. A Style-Based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948, December

  11. [11]

    Auto-Encoding Variational Bayes

    Diederik P Kingma and Max Welling. Auto-Encoding variational bayes. arXiv preprint arXiv:1312.6114v10, December

  12. [12]

    Generalizing Hamiltonian Monte Carlo with Neural Networks

    Daniel Levy, Matthew D Hoffman, and Jascha Sohl-Dickstein. Generalizing hamiltonian monte carlo with neural networks. arXiv preprint arXiv:1711.09268,

  13. [13]

    Learning in Implicit Generative Models

    Shakir Mohamed and Balaji Lakshminarayanan. Learning in implicit generative models. arXiv preprint arXiv:1610.03483, October

  14. [14]

    Continuous-in-depth neural networks

    Alejandro F Queiruga, N Benjamin Erichson, Dane Taylor, and Michael W Mahoney. Continuous-in-depth neural networks. arXiv preprint arXiv:2008.02389,

  15. [15]

    Variational Inference with Normalizing Flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770, May

  16. [16]

    Stochastic Backpropagation and Approximate Inference in Deep Generative Models

    Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082,

  17. [17]

    Markov Chain Monte Carlo and Variational Inference: Bridging the Gap

    Tim Salimans, Diederik P Kingma, and Max Welling. Markov chain monte carlo and variational inference: Bridging the gap. arXiv preprint arXiv:1410.6460, October

  18. [18]

    Deep Unsupervised Learning using Nonequilibrium Thermodynamics

    Jascha Sohl-Dickstein, Eric A Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. arXiv preprint arXiv:1503.03585, March

  19. [19]

    A-NICE-MC: Adversarial Training for MCMC

    Jiaming Song, Shengjia Zhao, and Stefano Ermon. A-NICE-MC: Adversarial training for MCMC. arXiv preprint arXiv:1706.07561, June

  20. [20]

    Generative Modeling by Estimating Gradients of the Data Distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. arXiv preprint arXiv:1907.05600, July

  21. [21]

    Improved techniques for training Score-Based generative models

    Yang Song and Stefano Ermon. Improved techniques for training Score-Based generative models. arXiv preprint arXiv:2006.09011, June

  22. [22]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456,

  23. [23]

    WaveNet: A Generative Model for Raw Audio

    Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, September 2016a. Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1...

  24. [24]

    Wide Residual Networks

    Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, May

  25. [25]

    Since the focus of this paper is to accelerate reverse models corresponding to the Gaussian diffusion, we leave empirical evaluations as future work

    A. Non-Markovian Forward Processes for a Discrete Case. In this section, we describe a non-Markovian forward process for discrete data and corresponding variational objectives. Since the focus of this paper is to accelerate reverse models corresponding to the Gaussian diffusion, we leave empirical evaluati...

  26. [26]

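    (6) and $q_\sigma(x_{t-1} \mid x_t, x_0)$ defined in Eq

    For $q_\sigma(x_{1:T} \mid x_0)$ defined in Eq. (6) and $q_\sigma(x_{t-1} \mid x_t, x_0)$ defined in Eq. (7), we have: $q_\sigma(x_t \mid x_0) = \mathcal{N}(\sqrt{\alpha_t}\, x_0, (1 - \alpha_t) I)$ (22). Proof. Assume for any $t \le T$, $q_\sigma(x_t \mid x_0) = \mathcal{N}(\sqrt{\alpha_t}\, x_0, (1 - \alpha_t) I)$ holds, if: $q_\sigma(x_{t-1} \mid x_0) = \mathcal{N}(\sqrt{\alpha_{t-1}}\, x_0, (1 - \alpha_{t-1}) I)$ (23) then we can prove the statement with an induction argument for $t$ from $T$ to 1, since the base case ($t = T$) already holds. First, we have...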

  27. [27]

    Variance-Exploding

    The ODE in Eq. (14) with the optimal model $\epsilon_\theta^{(t)}$ has an equivalent probability flow ODE corresponding to the “Variance-Exploding” SDE in Song et al. (2020). Proof. In the context of the proof, we consider $t$ as a continuous, independent “time” variable and $x$ and $\alpha$ as functions of $t$. First, let us consider a repara...

  28. [28]

    marginals

    for more details of VE-SDE. ODE form for VE-SDE. Define $p_t(\bar{x})$ as the data distribution perturbed with $\sigma^2(t)$ variance Gaussian noise. The probability flow for VE-SDE is defined as (Song et al., 2020): $d\bar{x} = -\frac{1}{2} g(t)^2 \nabla_{\bar{x}} \log p_t(\bar{x})\, dt$ (47), where $g(t) = \sqrt{\frac{d\sigma^2(t)}{dt}}$ is the diffusion coefficient, and $\nabla_{\bar{x}} \log p_t(\bar{x})$ is the s...

  29. [29]

    We use the same model for each dataset, and only compare the performance of different generative processes

    to make the results directly comparable. We use the same model for each dataset, and only compare the performance of different generative processes. For CIFAR10, Bedroom and Church, we obtain the pretrained checkpoints from the original DDPM implementation; for CelebA, we trained our own model using the denoising objective $L_{\mathbf{1}}$. Our architecture for $\epsilon_\theta^{(t)}$(x...

  30. [30]

    We use the pretrained models from Ho et al

    based on a Wide ResNet (Zagoruyko & Komodakis, 2016). We use the pretrained models from Ho et al. (2020) for CIFAR10, Bedroom and Church, and train our own model for the CelebA 64 × 64 model (since a pretrained model is not provided). Our CelebA model has five feature map resolutions from 64 × 64 to 4 × 4, and we use the original CelebA dataset (not CelebA-HQ...

  31. [31]

    The constant value $c$ is selected such that $\tau_{-1}$ is close to $T$

    dim(τ): 10, 20, 50, 100 | 10, 20, 50, 100
    DDIM (η = 0.0): 16.95, 8.89, 6.75, 6.62 | 19.45, 12.47, 10.84, 10.58
    DDPM (η = 1.0): 42.78, 22.77, 10.81, 6.81 | 51.56, 23.37, 11.16, 8.27

    D.2 Reverse Process Sub-sequence Selection. We consider two types of selection procedure for τ given the desired dim(τ) < T:
    • Linear: we select the timesteps such that $\tau_i = \lfloor c i \rfloor$ for some $c$;
    • Quadratic: we ...

  32. [32]

    Figure 13: More interpolations from the Church DDIM with dim(τ) =
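
The linear and quadratic sub-sequence selection rules quoted in entry 31 can be illustrated with a short Python sketch. This is not the authors' implementation. The function name `select_timesteps` is hypothetical, the quadratic form ⌊c·i²⌋ is an assumption (the excerpt truncates that rule), and picking c so that the last timestep equals T is only one possible reading of "τ₋₁ is close to T".

```python
def select_timesteps(T, S, kind="linear"):
    """Return an increasing list of at most S timesteps in [1, T].

    Hypothetical helper. Assumptions:
      linear:    tau_i = floor(c * i)     with c = T / S
      quadratic: tau_i = floor(c * i**2)  with c = T / S**2
    so that in both cases the final timestep equals T.
    """
    if not 0 < S <= T:
        raise ValueError("need 0 < S <= T")

    if kind == "linear":
        # Integer floor division avoids floating-point rounding in floor(c * i).
        raw = [(T * i) // S for i in range(1, S + 1)]
    elif kind == "quadratic":
        raw = [(T * i * i) // (S * S) for i in range(1, S + 1)]
    else:
        raise ValueError(f"unknown kind: {kind!r}")

    # Clamp to a minimum of 1 and drop duplicates, which can occur
    # for the quadratic rule when S is close to T.
    return sorted({max(1, t) for t in raw})


if __name__ == "__main__":
    print(select_timesteps(1000, 10, "linear"))
    print(select_timesteps(1000, 10, "quadratic"))
```

Under these assumptions the quadratic rule places more of the selected timesteps at small t than the linear rule does, for the same number of steps.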