Multi-modal attention for speech emotion recognition
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
MaTe proposes a training-free diffusion transformer that performs material transfer using only images by integrating them at the token level for unified multi-modal attention in a shared latent space.