RePack projects VFM features to a low-dimensional manifold for efficient DiT training, followed by a Latent-Guided Refiner that improves FID to 1.65 on ImageNet-1K after 64 epochs.
IV (Decoupling): We follow a frequency-decoupled design
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model