Stylus achieves training-free music style transfer on Mel-spectrograms by repurposing image diffusion models via style-key injection in self-attention plus phase-preserving reconstruction, outperforming baselines by 34.1% in content preservation and 25.7% in perceptual quality per 2,925 human raters
High-resolution im- age synthesis with latent diffusion models,
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.
citing papers explorer
-
Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms
Stylus achieves training-free music style transfer on Mel-spectrograms by repurposing image diffusion models via style-key injection in self-attention plus phase-preserving reconstruction, outperforming baselines by 34.1% in content preservation and 25.7% in perceptual quality per 2,925 human raters
-
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.