DDE introduces a compact coordinator network that combines denoised outputs from pre-trained diffusion models to enable generation in larger domains and complex conditioning settings.
6 Pith papers cite this work.
Representative citing papers
Categorical flow matching models scale to 1.7B parameters on 2.1T tokens, enabling 4-step text generation with competitive quality and benchmark performance.
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.