super hub Mixed citations

Denoising Diffusion Implicit Models

Chenlin Meng, Jiaming Song · 2020 · cs.LG · arXiv 2010.02502

Mixed citation behavior. Most common role is background (67%).

592 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 592 citing papers more from Chenlin Meng arXiv PDF

abstract

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 58 method 23 baseline 2

citation-polarity summary

background 56 use method 23 baseline 2 support 1 unclear 1

claims ledger

abstract Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose revers

authors

and Stefano Ermon Chenlin Meng Jiaming Song

co-cited works

representative citing papers

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization

cs.CV · 2026-06-09 · conditional · novelty 8.0

Lip Forcing distills a 14B bidirectional video diffusion teacher into autoregressive students that achieve real-time lip synchronization at 31 FPS using two denoising steps without CFG.

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

cs.RO · 2026-06-09 · unverdicted · novelty 8.0

TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving 100% success in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

cs.CV · 2026-04-04 · unverdicted · novelty 8.0

ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

Flow-Map GRPO: Reinforcement Learning for Few-Step Flow-Map Generators via Anchored Stochastic Composition

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.

Diffusion Model Attribution via Spectral Coupling of Denoiser Responses

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

SDS extracts stable spectral signatures from diffusion model denoisers via frequency-controlled perturbations, achieving 99.9% attribution accuracy across eight models and 96.2% under prompt shift.

Keep The Essentials: Efficient Reference Conditioned Generation via Token Dropping

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Sparse Context achieves 2-4x faster inference in reference-conditioned diffusion models by fine-tuning with random token dropping and applying task-aware selection at inference time, without loss of visual quality.

MeshFlow: Mesh Generation with Equivariant Flow Matching

cs.GR · 2026-06-22 · unverdicted · novelty 7.0

MeshFlow applies equivariant optimal-transport flow matching to generate triangle meshes as soups, matching autoregressive quality with an 18x inference speedup.

PanoVine: Whole-Body Visuomotor Control for Soft Growing Vine Robot

cs.RO · 2026-06-22 · unverdicted · novelty 7.0

Introduces the first autonomous whole-body vision control system for soft vine robots via an end-to-end visuomotor policy trained on demonstrations.

Thinking in Boxes: 3D Editing in Real Images Made Easy

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

A method that treats 3D box pairs as exact transformation specs, adds a depth-aware floor reference, and trains an image generator on synthetic scenes plus Objectron videos to perform large 3D edits on real photographs.

Timage: A Generative Text-in-Image Paradigm for Fine-Tuning Vision-Language Models

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

Timage generates text query overlays on images via Constrained Schrödinger Bridge to boost fine-grained spatial reasoning in vision-language models, outperforming larger systems on VMCBench with a 7B backbone.

Learning to Distort: Weakly-Supervised Image Quality Transfer for Prostate DWI Correction

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

A weakly-supervised image quality transfer method generates synthetic distorted DWI images from quality labels to train improved distortion correction models for prostate MRI.

Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting

cs.CV · 2026-06-16 · unverdicted · novelty 7.0

Introduces Robust-TOOC benchmark for corrupted images and Dual-TTT test-time training that updates only a text-guided denoising module to boost robustness in open-vocabulary counting.

Improving Robotic Generalist Policies via Flow Reversal Steering

cs.RO · 2026-06-11 · unverdicted · novelty 7.0

Flow Reversal Steering steers flow matching generalist policies by reversing suboptimal actions to nearby better modes, enabling improved zero-shot control, quick distillation, and RL bootstrapping in robotic manipulation.

Dual-Constrained Diffusion Image Compression for Operational Rate-Distortion-Perception Optimization

cs.CV · 2026-06-11 · unverdicted · novelty 7.0

DCIC uses dual constraints on a diffusion decoder to realize adjustable RDP operating points in neural image compression without extra rate cost.

Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

Ambient Diffusion Policy enables better imitation learning from suboptimal robot data by leveraging spectral properties to restrict data usage to specific diffusion times.

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

cs.CV · 2026-06-07 · unverdicted · novelty 7.0

MaskAlign uses random token-subset alignment and pre-mask mixing to reduce diffusion models' reliance on complete clean-image token sets during representation alignment.

Where the Score Lives: A Wavelet View of Diffusion

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

Derives optimal score functions for diffusion models as wavelet expansions in terms of data moments, enabling architecture-agnostic analysis of which distribution attributes matter for denoising.

Consistent-Inversion: Reverse Consistency Guidance for Structure-Preserving Visual Editing

cs.CV · 2026-06-05 · unverdicted · novelty 7.0

Consistent-Inversion introduces reverse consistency guidance that corrects early target denoising steps by checking reversibility toward the source inversion trajectory under the original prompt.

Parallel Jacobi Decoding for Fast Autoregressive Image Generation

cs.CV · 2026-06-04 · conditional · novelty 7.0

Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.

citing papers explorer

Showing 50 of 336 citing papers after filters.

Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization cs.CV · 2026-06-09 · conditional · none · ref 37 · internal anchor
Lip Forcing distills a 14B bidirectional video diffusion teacher into autoregressive students that achieve real-time lip synchronization at 31 FPS using two denoising steps without CFG.
ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos cs.CV · 2026-04-04 · unverdicted · none · ref 31 · internal anchor
ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.
Flow-GRPO: Training Flow Matching Models via Online RL cs.CV · 2025-05-08 · unverdicted · none · ref 22 · internal anchor
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers cs.CV · 2026-06-30 · unverdicted · none · ref 44 · internal anchor
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.
MUSE: Unlocking Timestep as Native Task Steering for One-Step Dense Prediction cs.CV · 2026-06-29 · unverdicted · none · ref 2 · internal anchor
MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.
ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving cs.CV · 2026-06-28 · unverdicted · none · ref 30 · internal anchor
Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.
Diffusion Model Attribution via Spectral Coupling of Denoiser Responses cs.CV · 2026-06-26 · unverdicted · none · ref 8 · internal anchor
SDS extracts stable spectral signatures from diffusion model denoisers via frequency-controlled perturbations, achieving 99.9% attribution accuracy across eight models and 96.2% under prompt shift.
Keep The Essentials: Efficient Reference Conditioned Generation via Token Dropping cs.CV · 2026-06-22 · unverdicted · none · ref 16 · internal anchor
Sparse Context achieves 2-4x faster inference in reference-conditioned diffusion models by fine-tuning with random token dropping and applying task-aware selection at inference time, without loss of visual quality.
Thinking in Boxes: 3D Editing in Real Images Made Easy cs.CV · 2026-06-18 · unverdicted · none · ref 55 · internal anchor
A method that treats 3D box pairs as exact transformation specs, adds a depth-aware floor reference, and trains an image generator on synthetic scenes plus Objectron videos to perform large 3D edits on real photographs.
Timage: A Generative Text-in-Image Paradigm for Fine-Tuning Vision-Language Models cs.CV · 2026-06-18 · unverdicted · none · ref 44 · internal anchor
Timage generates text query overlays on images via Constrained Schrödinger Bridge to boost fine-grained spatial reasoning in vision-language models, outperforming larger systems on VMCBench with a 7B backbone.
Learning to Distort: Weakly-Supervised Image Quality Transfer for Prostate DWI Correction cs.CV · 2026-06-17 · unverdicted · none · ref 19 · internal anchor
A weakly-supervised image quality transfer method generates synthetic distorted DWI images from quality labels to train improved distortion correction models for prostate MRI.
Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting cs.CV · 2026-06-16 · unverdicted · none · ref 23 · internal anchor
Introduces Robust-TOOC benchmark for corrupted images and Dual-TTT test-time training that updates only a text-guided denoising module to boost robustness in open-vocabulary counting.
Dual-Constrained Diffusion Image Compression for Operational Rate-Distortion-Perception Optimization cs.CV · 2026-06-11 · unverdicted · none · ref 29 · internal anchor
DCIC uses dual constraints on a diffusion decoder to realize adjustable RDP operating points in neural image compression without extra rate cost.
MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training cs.CV · 2026-06-07 · unverdicted · none · ref 24 · internal anchor
MaskAlign uses random token-subset alignment and pre-mask mixing to reduce diffusion models' reliance on complete clean-image token sets during representation alignment.
Consistent-Inversion: Reverse Consistency Guidance for Structure-Preserving Visual Editing cs.CV · 2026-06-05 · unverdicted · none · ref 2 · internal anchor
Consistent-Inversion introduces reverse consistency guidance that corrects early target denoising steps by checking reversibility toward the source inversion trajectory under the original prompt.
Parallel Jacobi Decoding for Fast Autoregressive Image Generation cs.CV · 2026-06-04 · conditional · none · ref 44 · internal anchor
Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.
Reflection Separation from a Single Image via Joint Latent Diffusion cs.CV · 2026-06-02 · unverdicted · none · ref 47 · internal anchor
A joint latent diffusion model with cross-layer self-attention and disjoint sampling separates reflection and transmission layers from single images more effectively than prior methods on real-world benchmarks.
Diffusing in the Right Space: A Systematic Study of Latent Diffusability cs.CV · 2026-06-02 · unverdicted · none · ref 70 · internal anchor
A large-scale empirical study across tokenizers and diffusion backbones identifies Velocity Irreducible Variance (VIV) as one of the most stable predictors of latent diffusion generation quality.
Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo cs.CV · 2026-05-31 · unverdicted · none · ref 18 · internal anchor
SplatShot is a training-free method that inserts per-step 3DGS refitting and photometric feedback into diffusion denoising to enforce multi-view consistency for single-photo 3D face avatars.
Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation cs.CV · 2026-05-31 · unverdicted · none · ref 54 · internal anchor
DRDD decouples diffusion into independent noise and residual stages to preserve domain harmonization and enable unified data-efficient I2I translation.
Towards Anatomically Plausible Human Image Generation via Synthetic Localized Preferences cs.CV · 2026-05-25 · unverdicted · none · ref 33 · internal anchor
ASAP generates over 10K synthetic anatomical preference pairs via targeted degradation of high-fidelity images and applies a localized margin-bounded DPO to reduce anatomical errors in text-to-image human generation, supported by the new HAP dataset and HAF-Bench.
DeltaCam: Differential Intrinsic Camera Modeling for Video Generation cs.CV · 2026-05-24 · unverdicted · none · ref 32 · internal anchor
DeltaCam models relative changes in camera intrinsics via Δ-parameterized neural adaptors in video diffusion models trained on synthetic data to enable controllable generation and real-world transfer.
Loki: Representation over Architecture for Diffusion-Based Portrait Animation cs.CV · 2026-05-22 · unverdicted · none · ref 50 · internal anchor
Loki replaces RGB conditioning stacks with identity-orthogonal parametric face encodings rasterized for diffusion, achieving efficient cross-ID portrait animation without cross-ID training data.
DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation cs.CV · 2026-05-22 · unverdicted · none · ref 9 · internal anchor
DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation cs.CV · 2026-05-22 · unverdicted · none · ref 43 · internal anchor
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models cs.CV · 2026-05-20 · unverdicted · none · ref 41 · internal anchor
Linear-DPO replaces sigmoid utility with linear utility and adds EMA reference to improve preference alignment in diffusion and flow-matching text-to-image models.
DrawMotion: Generating 3D Human Motions by Freehand Drawing cs.CV · 2026-05-20 · unverdicted · none · ref 64 · internal anchor
DrawMotion is a diffusion-based framework that fuses text and hand-drawn stickman conditions via a Multi-Condition Module and training-free guidance to generate 3D human motions.
FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models cs.CV · 2026-05-19 · unverdicted · none · ref 17 · 2 links · internal anchor
FlowErase-RL applies GRPO to reformulate concept erasure in flow matching models as reward optimization using a dynamic dual-path mechanism for target suppression and non-target preservation.
Functionalization via Structure Completion and Motion Rectification cs.CV · 2026-05-18 · unverdicted · none · ref 110 · internal anchor
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
StreamingEffect: Real-Time Human-Centric Video Effect Generation cs.CV · 2026-05-16 · unverdicted · none · ref 57 · internal anchor
StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.
Towards Generalized Image Manipulation Localization via Score-based Model cs.CV · 2026-05-16 · conditional · none · ref 29 · internal anchor
DiffIML applies score-based generative modeling to image manipulation localization, recovering coherent masks iteratively from noise to improve generalization on unseen manipulation types.
VMU-Diff: A Coarse-to-fine Multi-source Data Fusion Framework for Precipitation Nowcasting cs.CV · 2026-05-14 · unverdicted · none · ref 56 · internal anchor
VMU-Diff improves precipitation nowcasting via coarse multi-source Vision Mamba fusion followed by residual conditional diffusion refinement.
HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention cs.CV · 2026-05-14 · unverdicted · none · ref 27 · internal anchor
HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation cs.CV · 2026-05-13 · unverdicted · none · ref 28 · internal anchor
A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
Amortized Guidance for Image Inpainting with Pretrained Diffusion Models cs.CV · 2026-05-13 · unverdicted · none · ref 33 · internal anchor
AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.
ImageAttributionBench: How Far Are We from Generalizable Attribution? cs.CV · 2026-05-13 · unverdicted · none · ref 62 · internal anchor
ImageAttributionBench is a benchmark dataset demonstrating that state-of-the-art image attribution methods lack robustness to image degradation and fail to generalize to semantically disjoint domains.
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport cs.CV · 2026-05-13 · unverdicted · none · ref 38 · internal anchor
DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.
LatentHDR: Decoupling Exposure from Diffusion via Conditional Latent-to-Latent Mapping for Text/Image-to-Panoramic HDR cs.CV · 2026-05-11 · unverdicted · none · ref 40 · internal anchor
LatentHDR generates structurally consistent panoramic HDR images by producing one scene latent with a diffusion backbone then deterministically mapping it to multiple exposure latents via a lightweight conditional head.
Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs cs.CV · 2026-05-10 · unverdicted · none · ref 40 · internal anchor
PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.
OphEdit: Training-Free Text-Guided Editing of Ophthalmic Surgical Videos cs.CV · 2026-05-08 · unverdicted · none · ref 11 · internal anchor
OphEdit enables text-guided editing of eye surgery videos without training by injecting preserved attention value tensors into the diffusion denoising process to maintain anatomical structure.
GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization cs.CV · 2026-05-08 · unverdicted · none · ref 22 · 2 links · internal anchor
GPO-V jailbreaks dVLMs by globally optimizing probabilities in the denoising process to bypass refusal patterns, achieving stealthy and transferable attacks.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 33 · internal anchor
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
MotionGRPO: Overcoming Low Intra-Group Diversity in GRPO-Based Egocentric Motion Recovery cs.CV · 2026-05-07 · unverdicted · none · ref 11 · internal anchor
MotionGRPO models diffusion sampling as a Markov decision process optimized with Group Relative Policy Optimization, using hybrid rewards and noise injection to boost sample diversity and local joint precision in egocentric motion recovery.
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models cs.CV · 2026-05-05 · unverdicted · none · ref 59 · internal anchor
DMGD achieves better performance than fine-tuned SOTA methods in dataset distillation on ImageNet subsets by using semantic matching through conditional likelihood optimization and OT-based distribution matching in a training-free diffusion setup.
SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking cs.CV · 2026-05-04 · unverdicted · none · ref 15 · internal anchor
SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.
Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection cs.CV · 2026-04-30 · unverdicted · none · ref 54 · internal anchor
Noise2Map repurposes diffusion model denoising into a direct predictor for semantic segmentation and change detection tasks in remote sensing, achieving top average ranks on benchmark datasets.
SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset cs.CV · 2026-04-29 · unverdicted · none · ref 2 · internal anchor
SEAL introduces semantic-guided constraints during test-time adaptation to improve identity preservation and contextual control in single-image sticker personalization, backed by a new large-scale tagged sticker dataset.
ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent cs.CV · 2026-04-28 · unverdicted · none · ref 19 · internal anchor
ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization cs.CV · 2026-04-26 · unverdicted · none · ref 36 · internal anchor
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 42 · internal anchor
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

Denoising Diffusion Implicit Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer