Do we need EMA for diffusion-based speech enhancement? Toward a magnitude-preserving network architecture
1 Pith paper cites this work. Polarity classification is still indexing.
fields: eess.AS (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Predictive-Generative Drift Decomposition for Speech Enhancement and Separation
SIPS decomposes stochastic interpolant dynamics into predictive drift and generative denoising to combine arbitrary pretrained predictors with a degradation-agnostic clean-speech prior for better speech enhancement and separation.