Improved distribution matching distillation for fast image synthesis

Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman · 2024

6 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

Echo-Forcing decouples stable anchors, compressed history, and recent dynamics in video diffusion KV caches using hierarchical memory, scene recall frames, and difference-aware decay to support interactive long video generation under bounded cache.

Efficient Image Synthesis with Sphere Latent Encoder

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Decouples Sphere Encoder into fixed pretrained encoder and spherical latent denoiser, yielding higher quality and faster inference than the joint original on Animal-Faces, Oxford-Flowers and ImageNet-1K.

Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Delta Forcing improves temporal coherence in interactive autoregressive video generation by estimating transition consistency from teacher-generator latent deltas and balancing it against a monotonic continuity objective.

Gradient-Free Noise Optimization for Reward Alignment in Generative Models

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

ZeNO frames noise optimization as a path-integral control problem solvable from zeroth-order reward evaluations, connecting to implicit Langevin dynamics for reward-tilted distributions.

Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Distilled one-step consistency model from optimal-transport flow-matching teacher reconstructs high-fidelity dynamical system flows from low-fidelity data with 12x speedup, half the parameters, and 23.1% better SSIM than scratch-trained baselines.

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Lens is a 3.8B-parameter text-to-image model that reaches competitive or superior performance to >6B-parameter systems using 19.3% of the training compute of Z-Image through a densely captioned 800M dataset, multi-resolution batching, semantic VAE, strong language encoder, RL fine-tuning, and 4-step

citing papers explorer

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation
cs.CV · 2026-05-15 · unverdicted · none · ref 34

Efficient Image Synthesis with Sphere Latent Encoder
cs.CV · 2026-05-15 · unverdicted · none · ref 37

Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
cs.CV · 2026-05-14 · unverdicted · none · ref 14

Gradient-Free Noise Optimization for Reward Alignment in Generative Models
cs.LG · 2026-05-12 · unverdicted · none · ref 34 · 2 links

Physical Fidelity Reconstruction via Improved Consistency-Distilled Flow Matching for Dynamical Systems
cs.LG · 2026-05-07 · unverdicted · none · ref 33

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
cs.CV · 2026-05-20 · unverdicted · none · ref 69
