arXiv: 1907.01160 · v1 · submitted 2019-07-02 · 💻 cs.SD · cs.CL · cs.LG · eess.AS · stat.ML

Recognition: unknown

WHAM!: Extending Speech Separation to Noisy Environments

classification 💻 cs.SD · cs.CL · cs.LG · eess.AS · stat.ML
keywords noise, separation, speech, ambient, area, dataset, mixtures, noisy

Recent progress in separating the speech signals from multiple overlapping speakers using a single audio channel has brought us closer to solving the cocktail party problem. However, most studies in this area use a constrained problem setup, comparing performance when speakers overlap almost completely, at artificially low sampling rates, and with no external background noise. In this paper, we strive to move the field towards more realistic and challenging scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!) dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined with real ambient noise samples. The samples were collected in coffee shops, restaurants, and bars in the San Francisco Bay Area, and are made publicly available. We benchmark various speech separation architectures and objective functions to evaluate their robustness to noise. While separation performance decreases as a result of noise, we still observe substantial gains relative to the noisy signals for most approaches.

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    SpurAudio benchmark shows state-of-the-art few-shot audio classifiers suffer large performance drops when background correlations are disrupted, even in large pretrained models.