Sound Demixing Challenge 2023 Music Demixing Track Technical Report: TFC-TDF-UNet v3
2 Pith papers cite this work.
Fields: eess.AS (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
- Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation
Diff-VS is an efficient audio-aware diffusion U-Net for vocal separation that matches discriminative baselines on objective metrics while achieving state-of-the-art perceptual quality according to proxy measures.
- A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)
Detecting manners of articulation and adding them as knowledge features improves target speech extraction in cinematic audio with background sounds.