UAT presents a diffusion-centric framework coupling continuous latent diffusion for audio with masked discrete diffusion for text in a shared dual-stream backbone to enable unified generation, editing, and captioning.
1 Pith paper cites this work. Polarity classification is still being indexed.
fields: eess.AS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
UAT: Unified Audio-Text Diffusion for Audio Generation, Editing, and Captioning