Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
Cascaded diffusion models for high fidelity image generation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.
citing papers explorer
-
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
-
Temporal Aware Pruning for Efficient Diffusion-based Video Generation
TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.