DDE introduces a compact coordinator network that combines denoised outputs from pre-trained diffusion models to enable generation in larger domains and complex conditioning settings.
6 Pith papers cite this work.
citing papers explorer
- Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models
  DDE introduces a compact coordinator network that combines denoised outputs from pre-trained diffusion models to enable generation in larger domains and complex conditioning settings.
- Scaling Categorical Flow Maps
  Categorical flow matching models scale to 1.7B parameters on 2.1T tokens, enabling 4-step text generation with competitive quality and benchmark performance.
- Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
  Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
- HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos
- Diagnosing and Correcting Concept Omission in Multimodal Diffusion Transformers
- Spherical Flows for Sampling Categorical Data