StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.
One-step diffusion with distribution matching distillation
9 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (9)
roles: method (3)
polarities: use method (3)
representative citing papers
HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
DAR replaces residual addition in DiTs with learnable timestep-adaptive non-incremental aggregation of sublayer outputs, improving FID by 2.11 on ImageNet 256x256 and accelerating convergence by 8.75x.
Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.
Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.
Z-Image is an efficient 6B-parameter foundation model for image generation that rivals larger commercial systems in photorealism and bilingual text rendering through a new single-stream diffusion transformer and streamlined training.
Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.
Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.
citing papers explorer
- CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
- Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory