Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
Your ViT is secretly a hybrid discriminative-generative diffusion model
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
citing papers explorer
-
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
-
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.