SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Xuanwu Yin; Yuda Song; Zehao Sun

arxiv: 2403.16627 · v2 · pith:7FAXPII5new · submitted 2024-03-25 · 💻 cs.CV


classification 💻 cs.CV

keywords models · diffusion · image · approach · architectures · distillation · faster · latency


Recent advancements in diffusion models have positioned them at the forefront of image generation. Despite their superior performance, diffusion models are not without drawbacks; they are characterized by complex architectures and substantial computational demands, resulting in significant latency due to their iterative sampling process. To mitigate these limitations, we introduce a dual approach involving model miniaturization and a reduction in sampling steps, aimed at significantly decreasing model latency. Our methodology leverages knowledge distillation to streamline the U-Net and image decoder architectures, and introduces an innovative one-step DM training technique that utilizes feature matching and score distillation. We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU, respectively. Moreover, our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
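The throughput figures in the abstract can be turned into per-image latency and an implied baseline throughput with simple arithmetic. The sketch below is only a back-of-the-envelope reading of the quoted numbers (about 100 FPS at a 30x speedup, about 30 FPS at a 60x speedup). The helper names `latency_ms` and `implied_baseline_fps` are illustrative and do not come from the paper, and the baseline values are derived from the stated ratios rather than measured.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# All inputs are the approximate values stated there; nothing here is measured.

def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps

def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Baseline throughput implied by a claimed speedup factor."""
    return fps / speedup

# model name -> (claimed FPS, claimed speedup over its baseline)
claims = {
    "SDXS-512": (100.0, 30.0),   # baseline: SD v1.5
    "SDXS-1024": (30.0, 60.0),   # baseline: SDXL
}

summary = {}
for name, (fps, speedup) in claims.items():
    base_fps = implied_baseline_fps(fps, speedup)
    summary[name] = {
        "latency_ms": latency_ms(fps),
        "baseline_fps": base_fps,
        "baseline_latency_ms": latency_ms(base_fps),
    }
    print(
        f"{name}: {summary[name]['latency_ms']:.1f} ms/image, "
        f"implied baseline {base_fps:.2f} FPS "
        f"({summary[name]['baseline_latency_ms']:.0f} ms/image)"
    )
```

Read this way, the claims correspond to roughly 10 ms per image for SDXS-512 and roughly 33 ms per image for SDXS-1024, against implied baselines of about 0.3 s and 2 s per image. Because the abstract gives only approximate figures, these should be treated as order-of-magnitude estimates.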



Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
cs.CV 2026-05 unverdicted novelty 6.0

LIFT and PLACE enable stable knowledge distillation for extremely lightweight diffusion models by decomposing the task into coarse alignment followed by fine refinement with piecewise local adaptive guidance.

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
cs.CV 2026-05 unverdicted novelty 6.0

LIFT and PLACE enable stable training of extremely compressed diffusion models by breaking distillation into coarse linear alignment followed by local adaptive refinement.

LIFT and PLACE: A Simple, Stable, and Effective Knowledge Distillation Framework for Lightweight Diffusion Models
cs.CV 2026-05 unverdicted novelty 6.0

LIFT decomposes distillation into coarse linear alignment then fine refinement while PLACE adds error-based local adaptation, allowing stable training of 1.3M-parameter students (1.6% teacher size) to FID 15.73 across...

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
cs.LG 2026-02 conditional novelty 5.0

Systematic benchmarking of diffusion model optimizations on Apple M3 Ultra produces 22.7 FPS real-time img2img at 512x512 and demonstrates that CUDA-derived techniques do not transfer directly to Apple Silicon.