L3-SE reduces linguistic hallucination in LM-based speech enhancement by distilling noise-invariant acoustic-semantic representations from noisy inputs to condition an autoregressive decoder-only language model.
Less is more: Data curation matters in scaling speech enhance- ment.arXiv preprint arXiv:2506.23859
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 3roles
background 1polarities
background 1representative citing papers
Replacing early-reflected speech with time-shifted anechoic clean speech as the training target, combined with a two-stage distortion-perception framework, yields state-of-the-art universal speech enhancement.
citing papers explorer
-
Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
L3-SE reduces linguistic hallucination in LM-based speech enhancement by distilling noise-invariant acoustic-semantic representations from noisy inputs to condition an autoregressive decoder-only language model.
-
Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement
Replacing early-reflected speech with time-shifted anechoic clean speech as the training target, combined with a two-stage distortion-perception framework, yields state-of-the-art universal speech enhancement.
- FSD50K-Solo: Automated Curation of Single-Source Sound Events