Lip Forcing distills a 14B-parameter bidirectional video diffusion teacher into autoregressive students that achieve real-time lip synchronization at 31 FPS using two denoising steps without classifier-free guidance (CFG).
Denoising Diffusion Implicit Models
Mixed citation behavior. Most common role is background (67%).
abstract
Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \times$ to $50 \times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.
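To make the sampling idea concrete, here is a minimal sketch of the generalized DDIM update. It assumes a noise-prediction network $\epsilon(x_t, t)$ trained exactly as in a DDPM, and writes $\bar\alpha_t$ for the cumulative product of $1-\beta_s$. Each step first predicts the clean sample $\hat{x}_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\epsilon)/\sqrt{\bar\alpha_t}$. It then jumps directly to an earlier timestep in a shortened sub-sequence $\tau$ of $\{1, \dots, T\}$. Skipping timesteps this way is what reduces the number of network evaluations. The parameter $\eta$ interpolates between deterministic DDIM ($\eta = 0$) and DDPM-like stochastic sampling ($\eta = 1$).

The helper names (`linear_alpha_bar`, `ddim_sigma`, `ddim_step`, `ddim_sample`, `oracle_eps`) are hypothetical, and several other choices are illustrative assumptions rather than the paper's reference code: the linear $\beta$ schedule, the linear spacing of $\tau$, and the use of plain Python lists instead of tensors.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
EpsModel = Callable[[Vector, int], Vector]  # (x_t, t) -> predicted noise


def linear_alpha_bar(T: int = 1000, beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> List[float]:
    """Return alpha_bar[t] for t = 0..T, with alpha_bar[0] = 1 (clean data).

    Assumption: the linear beta schedule used by DDPM; any decreasing schedule works.
    """
    alpha_bar = [1.0]
    for t in range(1, T + 1):
        beta = beta_start + (beta_end - beta_start) * (t - 1) / (T - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def ddim_sigma(a_t: float, a_prev: float, eta: float) -> float:
    """Per-step noise scale: eta = 0 is deterministic DDIM; eta = 1 gives the
    DDPM posterior std when t_prev = t - 1."""
    return eta * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)


def ddim_step(x_t: Vector, eps: Vector, a_t: float, a_prev: float,
              eta: float, rng: random.Random) -> Vector:
    """Move from timestep t to an earlier timestep t_prev (not necessarily t - 1)."""
    sigma = ddim_sigma(a_t, a_prev, eta)
    # Coefficient of the "direction pointing to x_t" term; clamp guards rounding error.
    dir_coef = math.sqrt(max(1.0 - a_prev - sigma ** 2, 0.0))
    out = []
    for x, e in zip(x_t, eps):
        x0_pred = (x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)  # predicted clean sample
        noise = rng.gauss(0.0, 1.0) if sigma > 0.0 else 0.0
        out.append(math.sqrt(a_prev) * x0_pred + dir_coef * e + sigma * noise)
    return out


def ddim_sample(eps_model: EpsModel, x_T: Sequence[float], alpha_bar: List[float],
                num_steps: int, eta: float = 0.0, seed: int = 0) -> Vector:
    """Run the reverse process over a linearly spaced sub-sequence of {1, ..., T}."""
    T = len(alpha_bar) - 1
    tau = sorted({max(1, round(T * (i + 1) / num_steps)) for i in range(num_steps)})
    rng = random.Random(seed)
    x = list(x_T)
    for i in range(len(tau) - 1, -1, -1):
        t = tau[i]
        t_prev = tau[i - 1] if i > 0 else 0  # final jump lands on t = 0, where alpha_bar = 1
        x = ddim_step(x, eps_model(x, t), alpha_bar[t], alpha_bar[t_prev], eta, rng)
    return x


if __name__ == "__main__":
    # Toy check: the data is a single point, so the exact noise predictor is known in closed form.
    alpha_bar = linear_alpha_bar()
    x0_true = [0.5, -1.0, 2.0]

    def oracle_eps(x_t: Vector, t: int) -> Vector:
        a = alpha_bar[t]
        return [(x - math.sqrt(a) * x0) / math.sqrt(1.0 - a) for x, x0 in zip(x_t, x0_true)]

    rng = random.Random(123)
    x_T = [rng.gauss(0.0, 1.0) for _ in x0_true]
    print(ddim_sample(oracle_eps, x_T, alpha_bar, num_steps=10))  # ~[0.5, -1.0, 2.0]
```

In this toy setting the oracle predictor is exact, so 10 deterministic steps recover the data point up to floating-point error. That costs 10 model evaluations, compared with 1000 for ancestral sampling over every timestep. With a trained network the sample would only approximate the data distribution, and quality typically degrades gracefully as `num_steps` shrinks. This is the computation-for-quality trade-off the abstract describes.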
representative citing papers
TAKO demonstrates real-time adversarial takeover of robotic diffusion policies via reusable universal patches on visual inputs, achieving a 100% success rate in steering attacker-chosen trajectories across multiple tasks, encoders, and diffusion methods.
ActivityForensics is the first large-scale benchmark for temporally localizing activity-level forgeries in videos, paired with a diffusion-based baseline called TADiff.
Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.
Consistency models achieve fast one-step generation with state-of-the-art FIDs of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
Flow-Map GRPO uses anchored stochastic flow map composition to enable GRPO-based RL alignment of deterministic few-step flow-map generators while preserving their marginal paths.
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers into compact one-step students; it raises the HPSv3 score of SD 1.5 from 5.4 to 9.4 while keeping one-step speed.
MUSE shows that the native timestep embedding in diffusion models acts as a parameter-free steering signal for multi-task monocular depth and normal estimation via manifold decoupling in latent space.
Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.
SDS extracts stable spectral signatures from diffusion model denoisers via frequency-controlled perturbations, achieving 99.9% attribution accuracy across eight models and 96.2% under prompt shift.
Synthetic minority augmentation improves threshold-integrated and optimized classification metrics only under model misspecification, where it corrects ranking errors; under well-specified score models it provides no fundamental benefit beyond possible variance reduction.
Sparse Context achieves 2-4x faster inference in reference-conditioned diffusion models by fine-tuning with random token dropping and applying task-aware selection at inference time, without loss of visual quality.
MeshFlow applies equivariant optimal-transport flow matching to generate triangle meshes represented as triangle soups, matching the quality of autoregressive models with an 18x inference speedup.
Introduces the first autonomous whole-body vision control system for soft vine robots via an end-to-end visuomotor policy trained on demonstrations.
A method that treats 3D box pairs as exact transformation specs, adds a depth-aware floor reference, and trains an image generator on synthetic scenes plus Objectron videos to perform large 3D edits on real photographs.
Timage generates text-query overlays on images via a Constrained Schrödinger Bridge to boost fine-grained spatial reasoning in vision-language models, outperforming larger systems on VMCBench with a 7B backbone.
A weakly-supervised image quality transfer method generates synthetic distorted DWI images from quality labels to train improved distortion correction models for prostate MRI.
Introduces the Robust-TOOC benchmark for corrupted images and Dual-TTT, a test-time training method that updates only a text-guided denoising module to boost robustness in open-vocabulary counting.
Flow Reversal Steering steers flow matching generalist policies by reversing suboptimal actions to nearby better modes, enabling improved zero-shot control, quick distillation, and RL bootstrapping in robotic manipulation.
DCIC uses dual constraints on a diffusion decoder to realize adjustable rate-distortion-perception (RDP) operating points in neural image compression without extra rate cost.
Ambient Diffusion Policy enables better imitation learning from suboptimal robot data by leveraging spectral properties to restrict data usage to specific diffusion times.
MaskAlign uses random token-subset alignment and pre-mask mixing to reduce diffusion models' reliance on complete clean-image token sets during representation alignment.
Derives optimal score functions for diffusion models as wavelet expansions in terms of data moments, enabling architecture-agnostic analysis of which distribution attributes matter for denoising.
Consistent-Inversion introduces reverse consistency guidance that corrects early target denoising steps by checking reversibility toward the source inversion trajectory under the original prompt.