VCF aligns CLIP image features to text embeddings via a lightweight aligner to enable dual image-text conditioning in Stable Diffusion at inference without concept-specific training.
Training-free style and content transfer by leveraging U-Net skip connections in Stable Diffusion 2. arXiv preprint arXiv:2501.14524, 2025
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference