PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching in SSL Representation Domain for Speech Enhancement

2026 · eess.AS · arXiv 2606.17806

1 Pith paper cites this work.

abstract

Flow matching (FM) enables high-fidelity generation, while self-supervised learning (SSL) speech models provide hierarchical representations spanning acoustic and phonetic levels. However, existing FM-based speech enhancement (SE) methods operate primarily in the spectral domain, treating SSL features only as external conditions rather than modeling directly in the SSL latent space. To fully exploit the structural richness of SSL representations, we propose PhASE-Flow, an FM-based SE framework that operates entirely in the SSL space. It models the conditional distribution of clean acoustic representations given phonetic ones and reconstructs the waveform with a neural vocoder. Experiments show that PhASE-Flow outperforms state-of-the-art baselines in perceptual quality and intelligibility. Notably, it achieves competitive performance with only four sampling steps, enabling highly efficient inference. Audio demos are available at https://anonymous.4open.science/w/phase-flow_demo-E6E1/.
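The abstract does not spell out PhASE-Flow's training objective or sampler. Purely as a hedged illustration of what "conditional flow matching" and "four sampling steps" can mean mechanically, the sketch below uses a generic setup: a straight-line (optimal-transport) path from Gaussian noise to clean features, a mean-squared velocity-regression loss, and a fixed-step Euler sampler. The function names (`cfm_loss`, `euler_sample`, `oracle_velocity`), the `(batch, frames, dim)` array layout, and the choice of path and solver are all assumptions for illustration, not the authors' design. A trained network conditioned on phonetic SSL features would replace the hypothetical oracle field used in the toy check.

```python
import numpy as np


def cfm_loss(velocity_fn, x1, cond, rng):
    """Generic conditional flow-matching loss (straight-line path; an assumed choice).

    x1:   clean target features, shape (batch, frames, dim) (hypothetical layout)
    cond: conditioning features, e.g. phonetic-level SSL representations
    """
    x0 = rng.standard_normal(x1.shape)                    # noise endpoint at t = 0
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1, 1))   # one time per example, in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                          # point on the straight path
    target = x1 - x0                                      # constant velocity of that path
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, cond, shape, num_steps, rng):
    """Integrate dx/dt = v(x, t, cond) from noise (t=0) to t=1 with fixed Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((shape[0], 1, 1), i * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x


# Toy check with a hypothetical "oracle" velocity that knows the clean target.
rng = np.random.default_rng(0)
clean = rng.standard_normal((2, 50, 16))     # stand-in for clean acoustic features
phonetic = rng.standard_normal((2, 50, 16))  # stand-in condition (ignored by the oracle)


def oracle_velocity(x, t, cond):
    # Exact field toward a single known target; a trained model would use `cond`.
    return (clean - x) / (1.0 - t)


loss = cfm_loss(oracle_velocity, clean, phonetic, rng)
enhanced = euler_sample(oracle_velocity, phonetic, clean.shape, num_steps=4, rng=rng)
print(f"loss={loss:.2e}, max error={np.max(np.abs(enhanced - clean)):.2e}")
```

With the oracle field the regression target is matched exactly, so the loss is numerically zero and the four-step Euler run lands on the target. A learned field only approximates this, which is why the number of steps trades off against quality in practice.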

representative citing papers

PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching in SSL Representation Domain for Speech Enhancement

eess.AS · 2026-06-16 · unverdicted · novelty 5.0

PhASE-Flow performs phonetic-conditioned acoustic flow matching entirely in SSL representation space for speech enhancement and reports competitive perceptual quality with only four sampling steps.
