RePack projects VFM features to a low-dimensional manifold for efficient DiT training, followed by a Latent-Guided Refiner that improves FID to 1.65 on ImageNet-1K after 64 epochs.
This confirms that our projection-based compression effectively retains core semantic structures that are often lost in standard reconstruction-based VAE training.
RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model