Progressive Distillation for Fast Sampling of Diffusion Models

Jonathan Ho, Tim Salimans · 2022 · cs.LG · DOI 10.48550/arxiv.2202.00512 · arXiv 2202.00512

Canonical reference. 72% of citing Pith papers cite this work as background.

154 Pith papers citing it

Background 72% of classified citations

abstract

Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with state-of-the-art samplers taking as many as 8192 steps, and are able to distill down to models taking as few as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
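
The abstract describes the method only in prose. As a rough illustration, the Python/NumPy sketch below shows the core arithmetic as we read it from the paper. It includes a variance-preserving cosine noise schedule, which is an assumed choice with α_t = cos(πt/2) and σ_t = sin(πt/2). It includes the v-parameterization, one of the new parameterizations, in which the network predicts v = α_t ε − σ_t x. It also includes a deterministic DDIM update and the distillation target that makes one student step land where two teacher steps land. The function names and the `teacher_x` callable are hypothetical. Network training and loss weighting are omitted, so this is not the authors' implementation.

```python
import numpy as np


def alpha_sigma(t):
    """Cosine schedule (assumed): alpha_t = cos(pi*t/2), sigma_t = sin(pi*t/2), t in [0, 1].

    Variance preserving, so alpha_t**2 + sigma_t**2 == 1.
    """
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def v_target(x, eps, t):
    """v-parameterization target: v = alpha_t * eps - sigma_t * x."""
    a, s = alpha_sigma(t)
    return a * eps - s * x


def x_from_v(z_t, v, t):
    """Recover x from z_t = alpha_t * x + sigma_t * eps and v (uses alpha^2 + sigma^2 = 1)."""
    a, s = alpha_sigma(t)
    return a * z_t - s * v


def ddim_step(z_t, x_hat, t, s):
    """Deterministic DDIM update from time t to an earlier time s, given a prediction x_hat of x."""
    a_t, sig_t = alpha_sigma(t)
    a_s, sig_s = alpha_sigma(s)
    return a_s * x_hat + (sig_s / sig_t) * (z_t - a_t * x_hat)


def distillation_target(teacher_x, z_t, t, n_student):
    """Target x_tilde for a student that takes n_student steps on [0, 1].

    Runs two teacher DDIM steps of size 1/(2 * n_student), then solves for the
    x_tilde such that a single DDIM step of size 1/n_student from z_t lands on
    the same point. teacher_x(z, t) is a hypothetical callable returning the
    teacher's prediction of x.
    """
    t_mid = t - 0.5 / n_student
    t_end = t - 1.0 / n_student
    z_mid = ddim_step(z_t, teacher_x(z_t, t), t, t_mid)
    z_end = ddim_step(z_mid, teacher_x(z_mid, t_mid), t_mid, t_end)
    a_t, sig_t = alpha_sigma(t)
    a_end, sig_end = alpha_sigma(t_end)
    ratio = sig_end / sig_t
    # Invert z_end = a_end * x + ratio * (z_t - a_t * x) for x.
    return (z_end - ratio * z_t) / (a_end - ratio * a_t)


def halving_schedule(start_steps, end_steps):
    """Sampling-step counts visited when each distillation round halves the steps."""
    steps = [start_steps]
    while steps[-1] > end_steps:
        steps.append(steps[-1] // 2)
    return steps
```

In the procedure the abstract describes, a student initialized from the teacher would be trained to regress its prediction onto `distillation_target`, then serve as the teacher for the next round with half as many steps. `halving_schedule(8192, 4)` returns the 12 step counts 8192, 4096, ..., 4, i.e. 11 halvings, matching the abstract's 8192-to-4-step range.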

citation-role summary

background 18 · method 5 · baseline 2

citation-polarity summary

background 18 · use method 5 · baseline 2

authors

Jonathan Ho Tim Salimans

representative citing papers

An exact information theory of generalization phase transitions in Bayesian diffusion models

cs.LG · 2026-07-09 · conditional · novelty 8.0

Bayesian diffusion models memorize training data when mutual information between restricted observations and training data exceeds log dataset size, and generalize otherwise.

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 8.0

CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.

Query Lower Bounds for Diffusion Sampling

cs.LG · 2026-04-12 · unverdicted · novelty 8.0

Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.

Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.

CoDMD: Copula-aware Distribution Matching Distillation for Fast Video Generation

cs.CV · 2026-06-20 · unverdicted · novelty 7.0

CoDMD adds a copula-matching regularizer to DMD for distilling 50-step video diffusion models to 4 steps, reporting VBench scores of 84.46/84.87 on 1.3B/14B Wan-2.1-T2V models.

Learning a Maximum Entropy Model for Visual Textures using Diffusion

cs.CV · 2026-06-15 · unverdicted · novelty 7.0

A diffusion-trained maximum entropy model uses 512 learned statistics to synthesize visual textures at quality matching or exceeding prior models that rely on ~177k statistics.

World Model Self-Distillation: Training World Models to Solve General Tasks

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.

Complexity-Balanced Diffusion Splitting

cs.CV · 2026-06-04 · unverdicted · novelty 7.0

CBS partitions the diffusion timeline into segments of equal approximation burden via Dirichlet energy and trajectory acceleration monitors estimated by an auxiliary model, yielding higher synthesis quality at fixed per-step cost across SiT, JiT and UNet backbones.

Parallel Jacobi Decoding for Fast Autoregressive Image Generation

cs.CV · 2026-06-04 · conditional · novelty 7.0

Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.

Multimarginal flow matching with optimal transport potentials

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.

Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

cs.LG · 2026-05-24 · conditional · novelty 7.0

TRI-TSMC is a trust-region framework for learning twisting functions in SMC-based inference-time alignment of diffusion models that yields zero-variance samplers in theory and better alignment on text and image tasks under fixed budgets.

DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.

VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.

Generative Pseudo-Force Fields for Molecular Generation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.

RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation

cs.RO · 2026-05-17 · unverdicted · novelty 7.0

RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.

StreamingEffect: Real-Time Human-Centric Video Effect Generation

cs.CV · 2026-05-16 · unverdicted · novelty 7.0

StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.

Training-Free Generative Sampling via Moment-Matched Score Smoothing

stat.ML · 2026-05-14 · unverdicted · novelty 7.0

MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.

Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.

Muninn: Your Trajectory Diffusion Model But Faster

cs.RO · 2026-05-11 · unverdicted · novelty 7.0

Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

cs.HC · 2026-05-11 · unverdicted · novelty 7.0

HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated uncertainty than pixel-space or dropout baselines.

SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.

citing papers explorer

Showing 50 of 154 citing papers.

An exact information theory of generalization phase transitions in Bayesian diffusion models · cs.LG · 2026-07-09 · conditional · none · ref 52
Bayesian diffusion models memorize training data when mutual information between restricted observations and training data exceeds log dataset size, and generalize otherwise.
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation · cs.CV · 2026-05-07 · unverdicted · none · ref 44
CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.
Query Lower Bounds for Diffusion Sampling · cs.LG · 2026-04-12 · unverdicted · none · ref 17
Diffusion sampling from d-dimensional distributions requires at least ~sqrt(d) adaptive score queries when score estimates have polynomial accuracy.
Cross-Space Distillation: Teaching One-Step Students with Modern Diffusion Teachers · cs.CV · 2026-06-30 · unverdicted · none · ref 38
Introduces a Bridge latent interface that maps mismatched student latents into teacher space, enabling distillation from modern diffusion teachers to compact one-step students and raising SD 1.5 HPSv3 from 5.4 to 9.4 while keeping one-step speed.
CoDMD: Copula-aware Distribution Matching Distillation for Fast Video Generation · cs.CV · 2026-06-20 · unverdicted · none · ref 22
CoDMD adds a copula-matching regularizer to DMD for distilling 50-step video diffusion models to 4 steps, reporting VBench scores of 84.46/84.87 on 1.3B/14B Wan-2.1-T2V models.
Learning a Maximum Entropy Model for Visual Textures using Diffusion · cs.CV · 2026-06-15 · unverdicted · none · ref 38
A diffusion-trained maximum entropy model uses 512 learned statistics to synthesize visual textures at quality matching or exceeding prior models that rely on ~177k statistics.
World Model Self-Distillation: Training World Models to Solve General Tasks · cs.CV · 2026-06-10 · unverdicted · none · ref 50
Self-distillation from a caption-conditioned video diffusion model to an image-and-prompt-conditioned executor, enhanced by RL from VLM feedback, enables task solving in world models.
Complexity-Balanced Diffusion Splitting · cs.CV · 2026-06-04 · unverdicted · none · ref 30
CBS partitions the diffusion timeline into segments of equal approximation burden via Dirichlet energy and trajectory acceleration monitors estimated by an auxiliary model, yielding higher synthesis quality at fixed per-step cost across SiT, JiT and UNet backbones.
Parallel Jacobi Decoding for Fast Autoregressive Image Generation · cs.CV · 2026-06-04 · conditional · none · ref 38
Parallel Jacobi Decoding accelerates autoregressive image models 4.8x-6.4x by using 2D spatial draft expansion and adjusted attention masks while keeping generation quality competitive.
Multimarginal flow matching with optimal transport potentials · cs.LG · 2026-06-03 · unverdicted · none · ref 11
OTP-FM extends conditional flow matching by incorporating dynamic optimal transport potentials to enable efficient multimarginal transport learning with intermediate observed marginals.
Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation · cs.CV · 2026-05-29 · unverdicted · none · ref 8
LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.
Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo · cs.LG · 2026-05-24 · conditional · none · ref 28
TRI-TSMC is a trust-region framework for learning twisting functions in SMC-based inference-time alignment of diffusion models that yields zero-variance samplers in theory and better alignment on text and image tasks under fixed budgets.
DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation · cs.CV · 2026-05-22 · unverdicted · none · ref 7
DFSAttn is a training-free framework for dynamic fine-grained sparse attention in video DiTs that achieves up to 2.1x speedup while preserving generation quality via Hilbert reordering, hierarchical scoring, and adaptive caching.
VDE: Training-Free Accelerating Rectified Flow Model via Velocity Decomposition and Estimation · cs.CV · 2026-05-22 · unverdicted · none · ref 38
VDE accelerates rectified flow models like Flux by 3.22x with LPIPS of 0.069 via velocity decomposition into parallel/orthogonal components plus periodic full-pass anchoring.
Generative Pseudo-Force Fields for Molecular Generation · cs.LG · 2026-05-18 · unverdicted · none · ref 67
Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on QM9.
RoboFlow4D: A Lightweight Flow World Model Toward Real-Time Flow-Guided Robotic Manipulation · cs.RO · 2026-05-17 · unverdicted · none · ref 23
RoboFlow4D is an end-to-end lightweight flow world model that predicts multi-frame 3D flows from visual observations and textual instructions to provide explicit planning for real-time robotic manipulation.
StreamingEffect: Real-Time Human-Centric Video Effect Generation · cs.CV · 2026-05-16 · unverdicted · none · ref 54
StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.
Training-Free Generative Sampling via Moment-Matched Score Smoothing · stat.ML · 2026-05-14 · unverdicted · none · ref 24
MM-SOLD is a training-free particle sampler whose large-particle limit converges to a moment-matched Gibbs distribution obtained by exponentially tilting a score-smoothed target.
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation · cs.CV · 2026-05-13 · unverdicted · none · ref 33
A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
Muninn: Your Trajectory Diffusion Model But Faster · cs.RO · 2026-05-11 · unverdicted · none · ref 52
Muninn accelerates diffusion trajectory planners up to 4.6x by spending an uncertainty budget to decide when to cache denoiser outputs, preserving performance and certifying bounded deviation from full computation.
HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation · cs.HC · 2026-05-11 · unverdicted · none · ref 69
HapticLDM is the first latent diffusion model that generates vibrotactile signals directly from text, using dynamic text curation and global denoising to improve realism and semantic alignment over autoregressive baselines.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling · cs.CV · 2026-05-08 · unverdicted · none · ref 30
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
PODiff: Latent Diffusion in Proper Orthogonal Decomposition Space for Scientific Super-Resolution · cs.LG · 2026-05-05 · unverdicted · none · ref 17
PODiff performs conditional diffusion in a fixed, variance-ordered POD latent space to enable efficient probabilistic super-resolution of high-dimensional scientific fields with lower memory and better-calibrated uncertainty than pixel-space or dropout baselines.
SpecEdit: Training-Free Acceleration for Diffusion based Image Editing via Semantic Locking · cs.CV · 2026-05-04 · unverdicted · none · ref 19
SpecEdit accelerates diffusion-based image editing up to 10x by using a low-resolution draft to identify edit-relevant tokens via semantic discrepancies for selective high-resolution denoising.
CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies · cs.CV · 2026-04-27 · unverdicted · none · ref 38
CF-VLA uses a coarse initialization over endpoint velocity followed by single-step refinement to achieve strong performance with low inference steps on CALVIN, LIBERO, and real-robot tasks.
Dream-Cubed: Controllable Generative Modeling in Minecraft by Training on Billions of Cubes · cs.CV · 2026-04-22 · unverdicted · none · ref 38
Dream-Cubed releases a billion-scale voxel dataset and 3D diffusion models that generate controllable Minecraft worlds by operating directly on blocks.
Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning · cs.LG · 2026-04-21 · unverdicted · none · ref 39
GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
Structure-Adaptive Sparse Diffusion in Voxel Space for 3D Medical Image Enhancement · cs.CV · 2026-04-20 · unverdicted · none · ref 21
A sparse voxel-space diffusion method with structure-adaptive modulation achieves up to 10x training speedup and state-of-the-art results for 3D medical image denoising and super-resolution.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories · cs.CV · 2026-04-16 · unverdicted · none · ref 39
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse · cs.CV · 2026-04-06 · unverdicted · none · ref 7
Chorus accelerates video DiT serving up to 45% via inter-request caching reuse in a three-stage denoising strategy with token-guided attention amplification.
1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation · cs.CV · 2026-04-05 · conditional · none · ref 36
1.x-Distill achieves better quality and diversity than prior few-step distillation methods at 1.67 and 1.74 effective NFEs on SD3 models with up to 33x speedup.
DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching · cs.CV · 2026-02-05 · unverdicted · none · ref 54
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models · cs.LG · 2026-02-04 · unverdicted · none · ref 32
Early and late denoising steps in masked diffusion LMs are robust to smaller-model replacement, enabling 17% FLOPs reduction with modest generative quality loss.
Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation · cs.CV · 2026-02-02 · conditional · none · ref 30
Causal Forcing distills few-step autoregressive video generators from an autoregressive diffusion teacher rather than a bidirectional one, avoiding conditional-expectation blur and beating Self Forcing on motion and quality metrics.
Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion · cs.CV · 2025-12-29 · conditional · none · ref 62
Stream-DiffVSR enables practical low-latency video super-resolution by combining a four-step distilled denoiser, auto-regressive temporal guidance, and a temporal processor in a strictly causal pipeline.
Large Video Planner Enables Generalizable Robot Control · cs.RO · 2025-12-17 · conditional · none · ref 69
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
Training Agents Inside of Scalable World Models · cs.AI · 2025-09-29 · conditional · none · ref 75
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
StereoFoley: Object-Aware Stereo Audio Generation from Video · cs.SD · 2025-09-22 · conditional · none · ref 30
StereoFoley is an end-to-end video-to-stereo-audio framework that uses a base generative model fine-tuned on synthetic object-tracked data with panning and distance controls to achieve object-aware spatial sound.
Lipschitz-Guided Design of Interpolation Schedules in Generative Models · stat.ML · 2025-09-01 · unverdicted · none · ref 34
Minimizing averaged squared Lipschitzness of the drift produces interpolation schedules that improve numerical accuracy and mitigate mode collapse in generative models, with closed-form optima for Gaussians and validation on stochastic PDEs.
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE · cs.AI · 2025-07-29 · unverdicted · none · ref 32
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
History-Guided Video Diffusion · cs.LG · 2025-02-10 · unverdicted · none · ref 49
DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.
One Step Diffusion via Shortcut Models · cs.LG · 2024-10-16 · conditional · none · ref 20
Shortcut models enable high-quality single or few-step sampling in diffusion models with one network and training phase by conditioning on desired step size.
Diffusion Models Are Real-Time Game Engines · cs.LG · 2024-08-27 · conditional · none · ref 86
A diffusion model trained on DOOM play sessions generates stable real-time interactive game frames at 20 FPS with quality near lossy JPEG.
Learning Interactive Real-World Simulators · cs.AI · 2023-10-09 · conditional · none · ref 222
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference · cs.CV · 2023-10-06 · unverdicted · none · ref 80
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning · cs.LG · 2022-08-12 · unverdicted · none · ref 14
Diffusion-QL uses conditional diffusion models as expressive policies in offline RL by coupling behavior cloning with Q-value maximization, achieving SOTA on most D4RL tasks.
LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing · cs.CV · 2026-06-25 · conditional · none · ref 48
A three-stage distillation plus AR mask cache converts a bidirectional DiT editor into a real-time causal streaming editor that preserves non-edited regions at 12.66 FPS.
Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting · cs.CV · 2026-03-30 · conditional · none · ref 24
Per-position AR prediction entropy jointly drives speculative AR decoding and an anti-symmetric single-step drift decoder, yielding 3.8–5.5× faster hybrid visual generation without distillation.
Steering Optimisation Trajectories in Diffusion Representation Learning · cs.CV · 2026-07-06 · conditional · none · ref 71
SteeringDRL identifies two optimization regimes in diffusion autoencoders and uses gated residual U-Nets with a log SNR curriculum to steer training toward disentangled representations, improving performance across multiple benchmarks.
Unified Audio Intelligence Without Regressing on Text Intelligence · cs.CL · 2026-07-06 · conditional · none · ref 70
Audex unifies audio understanding and generation on a strong text MoE backbone with multi-stage SFT plus text-only Cascade RL, matching open SOTA audio scores while mostly retaining text capability.