arxiv: 2311.16491 · v1 · pith:Y7XXJJNPnew · submitted 2023-11-25 · 💻 cs.CV

Z^*: Zero-shot Style Transfer via Attention Rearrangement

classification 💻 cs.CV
keywords style · content · underline · image · latent · cross-attention · denoising · diffusion

Despite the remarkable progress in image style transfer, formulating style in the context of art is inherently subjective and challenging. In contrast to existing learning/tuning methods, this study shows that vanilla diffusion models can directly extract style information and seamlessly integrate the generative prior into the content image without retraining. Specifically, we adopt dual denoising paths to represent content/style references in latent space and then guide the content image denoising process with style latent codes. We further reveal that the cross-attention mechanism in latent diffusion models tends to blend the content and style images, resulting in stylized outputs that deviate from the original content image. To overcome this limitation, we introduce a cross-attention rearrangement strategy. Through theoretical analysis and experiments, we demonstrate the effectiveness and superiority of the diffusion-based $\underline{Z}$ero-shot $\underline{S}$tyle $\underline{T}$ransfer via $\underline{A}$ttention $\underline{R}$earrangement, Z-STAR.
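
The abstract does not spell out the rearrangement itself, so the following is only a minimal sketch of one way such a scheme could look, not the authors' formulation. It assumes that naive cross-attention lets content queries attend to style keys and values only, and that a rearranged variant normalizes attention jointly over style and content keys so the content features stay in the mix. The function names and the `style_scale` parameter are hypothetical, introduced here for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, scales=None):
    """Scaled dot-product attention on lists of vectors.

    `scales` optionally multiplies the logit of each key (assumption:
    a simple way to weight one group of keys against another).
    """
    d = len(queries[0])
    scales = scales or [1.0] * len(keys)
    out = []
    for q in queries:
        logits = [
            s * sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k, s in zip(keys, scales)
        ]
        w = softmax(logits)
        out.append(
            [sum(wi * v[j] for wi, v in zip(w, values))
             for j in range(len(values[0]))]
        )
    return out


def naive_cross_attention(q_c, k_s, v_s):
    """Content queries attend to style keys/values only."""
    return attend(q_c, k_s, v_s)


def rearranged_attention(q_c, k_c, v_c, k_s, v_s, style_scale=1.0):
    """Hypothetical rearrangement: one softmax over style AND content keys.

    `style_scale` (an assumed knob) rescales only the style logits.
    """
    keys = k_s + k_c
    values = v_s + v_c
    scales = [style_scale] * len(k_s) + [1.0] * len(k_c)
    return attend(q_c, keys, values, scales)


# Toy usage: two content tokens, one style token.
q_c = [[1.0, 0.0], [0.0, 1.0]]
k_c = [[1.0, 0.0], [0.0, 1.0]]
v_c = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
k_s = [[0.5, 0.5]]
v_s = [[7.0, 8.0, 9.0]]
stylized = rearranged_attention(q_c, k_c, v_c, k_s, v_s, style_scale=1.0)
```

In this sketch every output row is a convex combination of style and content values, whereas the naive variant can only return combinations of style values. That is one plausible reading of why joint normalization would keep the result closer to the content image.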

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

    cs.CV · 2026-06 · unverdicted · novelty 7.0

    Introduces the ASTAD task and training-free ASTModel framework for semantically consistent asymmetric style transfer using labeled synthetic content and unlabeled real references.