DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu · 2022 · cs.LG · arXiv 2211.01095

Mixed citation behavior. Most common role is background (50%).

39 Pith papers citing it

abstract

Diffusion probabilistic models (DPMs) have achieved impressive success in high-resolution image synthesis, especially in recent large-scale text-to-image generation applications. An essential technique for improving the sample quality of DPMs is guided sampling, which usually needs a large guidance scale to obtain the best sample quality. The commonly used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples. Although recent works propose dedicated high-order solvers and achieve a further speedup for sampling without guidance, their effectiveness for guided sampling has not been well tested. In this work, we demonstrate that previous high-order fast samplers suffer from instability issues, and they even become slower than DDIM when the guidance scale grows large. To further speed up guided sampling, we propose DPM-Solver++, a high-order solver for the guided sampling of DPMs. DPM-Solver++ solves the diffusion ODE with the data prediction model and adopts thresholding methods to keep the solution consistent with the training data distribution. We further propose a multistep variant of DPM-Solver++ to address the instability issue by reducing the effective step size. Experiments show that DPM-Solver++ can generate high-quality samples within only 15 to 20 steps for guided sampling by pixel-space and latent-space DPMs.
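As a rough illustration of the update the abstract describes, here is a minimal NumPy sketch of a second-order multistep solver in data-prediction form, written from the DPM-Solver++(2M) update as we understand it, with λ_t = log(α_t/σ_t) and h_i = λ_{t_i} − λ_{t_{i−1}}. It is not the authors' reference implementation. The callable `data_pred` (standing in for the data prediction model x_θ), the schedule arrays `alphas` and `sigmas`, and both function names are hypothetical. For the thresholding step it uses Imagen-style dynamic thresholding as one example of a thresholding method; the percentile default of 0.995 is an assumption.

```python
import numpy as np


def dynamic_threshold(x0, percentile=0.995):
    """Imagen-style dynamic thresholding (one example of a thresholding method).

    For each sample, s is the given percentile of |x0| (but at least 1);
    x0 is clipped to [-s, s] and divided by s.
    """
    flat = np.abs(x0.reshape(x0.shape[0], -1))
    s = np.quantile(flat, percentile, axis=1)
    s = np.maximum(s, 1.0).reshape(-1, *([1] * (x0.ndim - 1)))
    return np.clip(x0, -s, s) / s


def dpm_solver_pp_2m(x, data_pred, alphas, sigmas, threshold=None):
    """Hypothetical sketch of a DPM-Solver++(2M)-style multistep sampler.

    x: state at the first (noisiest) time step.
    data_pred(x, i): hypothetical denoiser returning x_theta at step i.
    alphas, sigmas: schedule values per step, ordered noisy (index 0) to clean.
    threshold: optional function applied to each data prediction.
    """
    lambdas = np.log(alphas) - np.log(sigmas)  # half log-SNR at each step
    prev_x0 = None
    for i in range(1, len(alphas)):
        x0 = data_pred(x, i - 1)
        if threshold is not None:
            x0 = threshold(x0)
        h = lambdas[i] - lambdas[i - 1]
        if prev_x0 is None:
            d = x0  # first step: first-order update (equivalent to DDIM)
        else:
            # ratio of the previous step size to the current one
            r = (lambdas[i - 1] - lambdas[i - 2]) / h
            # second-order combination of the two latest data predictions
            d = (1 + 1 / (2 * r)) * x0 - (1 / (2 * r)) * prev_x0
        # x_i = (sigma_i / sigma_{i-1}) x_{i-1} - alpha_i (e^{-h} - 1) d
        x = (sigmas[i] / sigmas[i - 1]) * x - alphas[i] * np.expm1(-h) * d
        prev_x0 = x0
    return x
```

A quick sanity check: if `data_pred` always returns the same point c and the starting state is α·c + σ·ε, the data-prediction update integrates the ODE exactly, so the final output equals α·c + σ·ε at the last time step for any number of steps.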

citation-role summary

background: 3 · method: 3

citation-polarity summary

background: 3 · use method: 3

representative citing papers

Entropy Across the Bridge: Conditional-Marginal Discretization for Flow and Schrödinger Samplers

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Derives a conditional-marginal entropy-rate objective for bridge-aware discretization that yields U-shaped schedules and improves low-NFE sample quality on 2D, CIFAR-10, and protein tasks.

Is Monotonic Sampling Necessary in Diffusion Models?

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.

TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.

Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning

eess.SP · 2026-05-06 · unverdicted · novelty 7.0

Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.

DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

cs.CV · 2026-02-05 · unverdicted · novelty 7.0

DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.

Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers

cs.AI · 2026-01-09 · unverdicted · novelty 7.0

DiTs use either a two-stage cross-attention circuit or text-token fusion circuit for spatial relations depending on the text encoder, achieving near-perfect in-domain accuracy but differing out-of-domain robustness.

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

cs.SD · 2025-09-28 · unverdicted · novelty 7.0

AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

cs.LG · 2025-09-19 · unverdicted · novelty 7.0

DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-process methods.

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

cs.AI · 2025-07-29 · unverdicted · novelty 7.0

MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging

cs.LG · 2025-05-21 · unverdicted · novelty 7.0

Diffusion trajectory distillation is reframed as operator merging, yielding an optimal variance-driven merging strategy via Pareto dynamic programming in the linear Gaussian case and unavoidable approximation errors from exponential mixture growth in the nonlinear case.

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

cs.CV · 2025-04-17 · unverdicted · novelty 7.0

UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

cs.LG · 2024-06-06 · conditional · novelty 7.0

Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

cs.CV · 2023-10-06 · unverdicted · novelty 7.0

Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.

FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

FIS-DiT achieves 2.11-2.41x speedup on video DiT models in few-step regimes with negligible quality loss by exploiting frame-wise sparsity and consistency through a training-free interleaved execution strategy.

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

Lookahead Drifting Model

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR-10.

Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.

Image Diffusion Preview with Consistency Solver

cs.LG · 2025-12-15 · unverdicted · novelty 6.0

ConsistencySolver enables high-quality low-step diffusion previews by adapting general linear multistep methods into a lightweight RL-optimized solver, matching multistep DPM-Solver FID with 47% fewer steps and cutting user interaction time by nearly 50%.

PixelDiT: Pixel Diffusion Transformers for Image Generation

cs.CV · 2025-11-25 · conditional · novelty 6.0

PixelDiT generates images in pixel space with a dual-level transformer and reaches 1.61 FID on ImageNet 256, outperforming prior pixel-space models.

NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation

cs.CV · 2025-10-02 · unverdicted · novelty 6.0

NoiseShift learns a resolution-specific mapping from scheduler noise to conditioning noise via lightweight calibration to restore consistency and improve low-resolution generation quality in models like SD3 and Flux.

Synthesis of discrete-continuous quantum circuits with multimodal diffusion models

quant-ph · 2025-06-02 · unverdicted · novelty 6.0

Multimodal diffusion model generates discrete gate selections and continuous parameters for quantum circuit compilation, claiming better gate counts and noise resilience than prior methods.

Sampling-Aware Quantization for Diffusion Models

cs.CV · 2025-05-04 · unverdicted · novelty 6.0

A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

cs.RO · 2025-02-27 · accept · novelty 6.0

OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

citing papers explorer

Entropy Across the Bridge: Conditional-Marginal Discretization for Flow and Schrödinger Samplers · cs.LG · 2026-05-15 · unverdicted · none · ref 34
Derives a conditional-marginal entropy-rate objective for bridge-aware discretization that yields U-shaped schedules and improves low-NFE sample quality on 2D, CIFAR-10, and protein tasks.

Is Monotonic Sampling Necessary in Diffusion Models? · cs.LG · 2026-05-12 · unverdicted · none · ref 75
Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the denoiser.

TMPO: Trajectory Matching Policy Optimization for Diverse and Efficient Diffusion Alignment · cs.LG · 2026-05-09 · unverdicted · none · ref 28 · 2 links
TMPO uses Softmax Trajectory Balance to match policy probabilities over multiple trajectories to a Boltzmann reward distribution, improving diversity by 9.1% in diffusion alignment tasks.

Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning · eess.SP · 2026-05-06 · unverdicted · none · ref 14
Grayscale diffusion model generates two-layer RF passives with sub-pixel resolution from partial S-parameters, achieving low error in surrogate predictions and validated on fabricated filters.

DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching · cs.CV · 2026-02-05 · unverdicted · none · ref 39
DisCa replaces heuristic feature caching with a lightweight learnable neural predictor compatible with distillation, achieving 11.8× acceleration on video diffusion transformers with preserved generation quality.

Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers · cs.AI · 2026-01-09 · unverdicted · none · ref 23
DiTs use either a two-stage cross-attention circuit or text-token fusion circuit for spatial relations depending on the text encoder, achieving near-perfect in-domain accuracy but differing out-of-domain robustness.

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance · cs.SD · 2025-09-28 · unverdicted · none · ref 52
AudioMoG is a mixture-of-guidance sampling technique that combines CFG and AG signals to outperform single-guidance baselines in text-to-audio generation at equivalent speed.

DiffusionNFT: Online Diffusion Reinforcement with Forward Process · cs.LG · 2025-09-19 · unverdicted · none · ref 16
DiffusionNFT performs online RL for diffusion models on the forward process via flow matching and positive-negative contrasts, delivering up to 25x efficiency gains and rapid benchmark improvements over prior reverse-process methods.

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE · cs.AI · 2025-07-29 · unverdicted · none · ref 23
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging · cs.LG · 2025-05-21 · unverdicted · none · ref 17
Diffusion trajectory distillation is reframed as operator merging, yielding an optimal variance-driven merging strategy via Pareto dynamic programming in the linear Gaussian case and unavoidable approximation errors from exponential mixture growth in the nonlinear case.

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models · cs.CV · 2025-04-17 · unverdicted · none · ref 38
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data · cs.LG · 2024-06-06 · conditional · none · ref 71
Absorbing discrete diffusion models the conditional distributions of clean data; reparameterizing yields a time-independent RADD that unifies with AO-ARMs and reaches SOTA perplexity among diffusion models on zero-shot language benchmarks.

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference · cs.CV · 2023-10-06 · unverdicted · none · ref 70
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.

ElasticDiT: Efficient Diffusion Transformers via Elastic Architecture and Sparse Attention for High-Resolution Image Generation on Mobile Devices · cs.CV · 2026-05-15 · unverdicted · none · ref 57
ElasticDiT introduces an elastic DiT architecture with adjustable spatial compression and block depth plus Shift Sparse Block Attention and a distilled VAE to enable a single model to cover multiple fidelity-latency points for high-resolution image generation on mobile devices.

FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity · cs.CV · 2026-05-12 · unverdicted · none · ref 26
FIS-DiT achieves 2.11-2.41x speedup on video DiT models in few-step regimes with negligible quality loss by exploiting frame-wise sparsity and consistency through a training-free interleaved execution strategy.

The two clocks and the innovation window: When and how generative models learn rules · cs.LG · 2026-05-11 · unverdicted · none · ref 53
Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

Lookahead Drifting Model · cs.LG · 2026-04-10 · unverdicted · none · ref 17
The lookahead drifting model improves upon the drifting model by sequentially computing multiple drifting terms that incorporate higher-order gradient information, leading to better performance on toy examples and CIFAR-10.

Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning · cs.LG · 2026-04-10 · unverdicted · none · ref 40
JFDL allows pre-trained Consistency Models to perform guided image generation post-hoc by aligning flow distributions, reducing FID scores on CIFAR-10 and ImageNet without needing a teacher model.

Image Diffusion Preview with Consistency Solver · cs.LG · 2025-12-15 · unverdicted · none · ref 24
ConsistencySolver enables high-quality low-step diffusion previews by adapting general linear multistep methods into a lightweight RL-optimized solver, matching multistep DPM-Solver FID with 47% fewer steps and cutting user interaction time by nearly 50%.

PixelDiT: Pixel Diffusion Transformers for Image Generation · cs.CV · 2025-11-25 · conditional · none · ref 43
PixelDiT generates images in pixel space with a dual-level transformer and reaches 1.61 FID on ImageNet 256, outperforming prior pixel-space models.

NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation · cs.CV · 2025-10-02 · unverdicted · none · ref 26
NoiseShift learns a resolution-specific mapping from scheduler noise to conditioning noise via lightweight calibration to restore consistency and improve low-resolution generation quality in models like SD3 and Flux.

Synthesis of discrete-continuous quantum circuits with multimodal diffusion models · quant-ph · 2025-06-02 · unverdicted · none · ref 53
Multimodal diffusion model generates discrete gate selections and continuous parameters for quantum circuit compilation, claiming better gate counts and noise resilience than prior methods.

Sampling-Aware Quantization for Diffusion Models · cs.CV · 2025-05-04 · unverdicted · none · ref 26
A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success · cs.RO · 2025-02-27 · accept · none · ref 29
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers · cs.CV · 2024-10-14 · unverdicted · none · ref 14
Sana-0.6B produces high-resolution images with strong text alignment at 20x smaller size and 100x higher throughput than Flux-12B by combining 32x image compression, linear DiT blocks, and a decoder-only LLM text encoder.

VideoPhy: Evaluating Physical Commonsense for Video Generation · cs.CV · 2024-06-05 · conditional · none · ref 66
VideoPhy benchmark shows state-of-the-art text-to-video models follow physical commonsense and text prompts in only 39.6% of cases for the best model.

SketchDeco: Training-Free Latent Composition for Precise Sketch Colourisation · cs.CV · 2024-05-29 · unverdicted · none · ref 45
SketchDeco performs training-free sketch colourisation via diffusion inversion to insert user colors followed by custom self-attention blending for local fidelity and global harmony.

Improved DDIM Sampling with Moment Matching Gaussian Mixtures · cs.CV · 2023-11-08 · unverdicted · none · ref 14
Moment-matched GMM kernels in DDIM yield lower FID and higher IS than Gaussian kernels at small sampling steps on CelebA-HQ, FFHQ, ImageNet, and Stable Diffusion tasks.

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models · cs.CV · 2023-08-13 · unverdicted · none · ref 38
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.

DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows · cs.CV · 2023-04-19 · unverdicted · none · ref 38
DiFaReli++ conditions a DDIM on shading references and inferred shadow maps to relight single-view faces with consistent shadows, trained only on 2D images and claiming SOTA on Multi-PIE.

Dynamic Video Generation: Shaping Video Generation Across Time and Space · cs.CV · 2026-05-20 · unverdicted · none · ref 23
DVG dynamically selects content-aware spatio-temporal acceleration strategies for diffusion-based video generation, delivering up to 7x speedup with near-lossless quality on models like HunyuanVideo.

DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization · cs.RO · 2026-05-17 · unverdicted · none · ref 92
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.

Outlier-Robust Diffusion Solvers for Inverse Problems · cs.CV · 2026-05-10 · unverdicted · none · ref 39
Diffusion-based inverse problem solvers are made robust to outliers by combining explicit noise estimation with a Huber-loss IRLS objective solved via conjugate gradient.

Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges · cs.LG · 2026-05-03 · unverdicted · none · ref 36
A structured diffusion bridge method achieves near fully-paired modality translation quality using alignment constraints even in unpaired or semi-paired regimes.

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling · cs.CV · 2026-04-30 · unverdicted · none · ref 47
Visual generation models are evolving from passive renderers to interactive agentic world modelers, but current systems lack spatial reasoning, temporal consistency, and causal understanding, with evaluations overemphasizing perceptual quality.

Continuous diffusion for categorical data · cs.CL · 2022-11-28 · unverdicted · none · ref 57
The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.

From Euler to Dormand-Prince: ODE Solvers for Flow Matching Generative Models · cs.LG · 2026-04-04 · accept · none · ref 16
RK4 at 80 function evaluations matches Euler at 200 in sliced Wasserstein quality for flow matching sampling, with the adaptive solver concentrating steps near t=1 due to stiffening velocity fields.

A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models · cs.LG · 2026-05-21 · unreviewed · ref 12

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention · cs.CV · 2026-05-06 · unreviewed · ref 200