Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

Colin Vaz; Kunal Dhawan; Ruchir Travadi; Shrikanth Narayanan

arxiv: 1907.06859 · v1 · pith:DOUA4YHAnew · submitted 2019-07-16 · 📡 eess.AS · cs.SD

Pith reviewed 2026-05-24 20:51 UTC · model grok-4.3

keywords: noise-robust acoustic features · NMF dictionary adaptation · total variability modeling · speech recognition · unseen noise · utterance-specific transform · acoustic feature extraction

The pith

Total variability modeling adapts NMF dictionaries per utterance to produce noise-robust acoustic features without any parallel clean-noisy training pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that learns a total variability subspace from NMF representations and uses it to create an utterance-specific transform for adapting dictionaries. This produces acoustic features that remain robust to the noise present in each individual utterance. The approach sidesteps the common requirement for paired clean and noisy speech data during training. On the Aurora 4 plus DEMAND noise corpus the resulting features match baseline performance overall and stay closest to clean-speech word error rates when the noise is unseen.

Core claim

A total variability subspace learned without parallel clean-noisy pairs can be combined with NMF to generate utterance-specific dictionary adaptations that yield acoustic features whose word error rates on noisy test data remain comparable to standard baselines and, on unseen noises, closest to the clean-speech baseline.

What carries the argument

Total variability subspace that produces an utterance-specific transform for adapting NMF dictionaries
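
For orientation, the standard total variability model from speaker verification (Dehak et al., listed in the reference graph) writes an utterance-dependent supervector as a global mean plus a low-rank offset. The first line below is that standard form. The second line is only a hypothetical analogue for an NMF dictionary, assuming the dictionary is vectorized and shifted in the same way; the symbols $W_0$, $W_u$, and $w_u$ are illustrative names, and the paper's actual adaptation equations may differ.

```latex
\begin{align}
  % Standard total variability model: supervector = mean + low-rank offset
  M_u &= m + T\,w_u, \qquad w_u \sim \mathcal{N}(0, I) \\
  % Hypothetical analogue for an utterance-specific NMF dictionary
  \operatorname{vec}(W_u) &\approx \operatorname{vec}(W_0) + T\,w_u
\end{align}
```

Here $T$ is the low-rank total variability matrix and $w_u$ is the utterance-level factor. An additive shift does not by itself keep $W_u$ non-negative, so any real implementation would need some constraint or reparameterization, which the abstract does not describe.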

If this is right

Noise-robust features become feasible without collecting or aligning clean-noisy parallel corpora.
Each utterance receives its own adaptation rather than a single global model.
Performance on unseen noise conditions improves relative to fixed dictionary approaches.
The same pipeline remains competitive with convolutive NMF features on seen noise conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could lower the data-collection burden for building robust speech recognizers in new acoustic environments.
Utterance-level adaptation might transfer to other signal-processing tasks where per-example dictionary or basis adjustment is useful.
Because the transform is computed from the test utterance itself, the approach may suit streaming or low-latency applications once the subspace is fixed.

Load-bearing premise

A subspace learned without paired clean-noisy examples can still generate transforms that meaningfully reduce the effect of noise on the extracted features.

What would settle it

The claim would fail if, on the Aurora 4 + DEMAND corpus with held-out noise types, the proposed features produced word error rates farther from the clean-speech rate than the CNMF or other baseline features.
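
The comparison described here comes down to two quantities: a word error rate per feature type, and its distance from the clean-speech rate. The sketch below shows one way to compute both, assuming whitespace-tokenized transcripts and a plain absolute difference as the gap. The function names and the feature labels passed to `closest_to_clean` are hypothetical, not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Dynamic-programming Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def closest_to_clean(noisy_wer_by_feature, clean_wer):
    """Return the feature name whose noisy WER is nearest the clean WER."""
    return min(noisy_wer_by_feature,
               key=lambda name: abs(noisy_wer_by_feature[name] - clean_wer))


# Toy transcripts for illustration only (not data from the paper).
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A per-corpus WER would normally pool errors and reference words over all utterances rather than average per-utterance rates.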

Figures

Figures reproduced from arXiv: 1907.06859 by Colin Vaz, Kunal Dhawan, Ruchir Travadi, Shrikanth Narayanan.

**Figure 1.** Visualizing the dictionary W and activation matrix H after running NMF on a speech signal V. One can think of the dictionary as containing k components that are added together by the activation matrix to approximate the input matrix. In the case of speech, the input matrix is typically the magnitude spectrogram, and the dictionary contains spectral “building blocks” required to reconstruct the spectrogram.…
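
The decomposition in the caption can be sketched in a few lines. This is a minimal example using the multiplicative updates of Lee and Seung (listed in the reference graph) with a Euclidean cost; the paper may use a different cost function or constraints. The input is a random non-negative matrix standing in for a magnitude spectrogram, and all names and sizes are assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate non-negative V (d x t) as W (d x k) @ H (k x t)."""
    rng = np.random.default_rng(seed)
    d, t = V.shape
    W = rng.random((d, k)) + eps  # dictionary: k spectral building blocks
    H = rng.random((k, t)) + eps  # activations: per-frame weights
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Toy stand-in for a magnitude spectrogram (random, not real speech).
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 50))
W, H = nmf(V, k=4)
```

Each column of `W` plays the role of one building block, and each column of `H` says how strongly the blocks are combined for that frame.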

Original abstract

We propose an algorithm to extract noise-robust acoustic features from noisy speech. We use Total Variability Modeling in combination with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and adapt NMF dictionaries for each utterance. Unlike several other approaches for extracting noise-robust features, our algorithm does not require a training corpus of parallel clean and noisy speech. Furthermore, the proposed features are produced by an utterance-specific transform, allowing the features to be robust to the noise occurring in each utterance. Preliminary results on the Aurora 4 + DEMAND noise corpus show that our proposed features perform comparably to baseline acoustic features, including features calculated from a convolutive NMF (CNMF) model. Moreover, on unseen noises, our proposed features gives the most similar word error rate to clean speech compared to the baseline features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's main contribution is an utterance-specific NMF adaptation method that uses total variability modeling to avoid parallel clean-noisy training data, but the reported gains remain preliminary and narrowly scoped.

The core idea here is to learn a total variability subspace from NMF dictionaries and then adapt those dictionaries per utterance for noise-robust features. This avoids the parallel-data requirement that many other noise-robust methods need, which is a practical plus for real-world use. The approach produces features via an utterance-specific transform, and the abstract positions it as new relative to prior NMF and i-vector style work in the citations.

On the Aurora 4 plus DEMAND setup the features match baseline performance and come closest to clean-speech WER on unseen noises, which directly tests the robustness claim. That is the part that holds up from the given evidence. The results are labeled preliminary with no error bars, no statistical tests, and limited detail on how the subspace is estimated or how the adaptation transform is applied. Without those steps shown explicitly it is hard to judge whether the subspace actually captures the right variability or if the gains could be replicated.

The work stays within acoustic feature extraction and does not claim broader architectural changes. It is aimed at researchers already working on NMF-based or variability-modeling approaches to noisy ASR who need a method that does not rely on matched clean-noisy pairs. The central modeling step is coherent on its own terms and the empirical test matches the goal, so the paper is worth sending to referees even if revisions will be needed to fill in the missing equations and controls.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an algorithm combining Total Variability Modeling with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and produce utterance-specific NMF dictionary adaptations for extracting noise-robust acoustic features from noisy speech. Unlike prior methods, it requires no parallel clean-noisy training corpus. Preliminary experiments on the Aurora 4 corpus mixed with DEMAND noises report that the resulting features yield word error rates (WER) comparable to standard and convolutive-NMF baselines, and the closest match to clean-speech WER when tested on unseen noises.

Significance. If the central modeling step can be shown to work, the result would be useful because it removes the parallel-data requirement that limits many noise-robust feature extractors and replaces it with an utterance-specific transform that can in principle track noise variation within a single recording. The reported metric (WER proximity to clean) directly tests the intended robustness outcome.

major comments (2)

[Abstract] The central claim that a total-variability subspace learned without parallel clean-noisy pairs can produce a useful utterance-specific transform is stated but never accompanied by the estimation procedure, the adaptation equations, or any derivation showing how the subspace is applied to an NMF dictionary. This information is load-bearing for both the “no parallel data” advantage and the reported WER results.
[Abstract] Results are labeled “preliminary” with no mention of the number of utterances, cross-validation folds, statistical significance tests, or error bars on the WER figures. Without these, it is impossible to evaluate whether the claim that the proposed features give “the most similar word error rate to clean speech” on unseen noises is reliable.
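
One common way to attach error bars to a WER figure is a percentile bootstrap over utterances. The sketch below assumes per-utterance error counts and reference word counts are available; the function name, the resampling settings, and the toy counts are all illustrative and do not come from the paper.

```python
import numpy as np


def bootstrap_wer_ci(errors, ref_words, n_boot=1000, alpha=0.05, seed=0):
    """Point WER plus a percentile bootstrap interval over utterances."""
    errors = np.asarray(errors, dtype=float)
    ref_words = np.asarray(ref_words, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(errors)
    # Resample utterance indices with replacement, n_boot times.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Pooled WER of each resampled test set.
    resampled = errors[idx].sum(axis=1) / ref_words[idx].sum(axis=1)
    low, high = np.quantile(resampled, [alpha / 2, 1 - alpha / 2])
    point = errors.sum() / ref_words.sum()
    return float(point), float(low), float(high)


# Made-up per-utterance counts, for illustration only.
errors = [1, 0, 2, 1, 0, 3, 1, 0]
ref_words = [10, 8, 12, 9, 7, 11, 10, 8]
point, low, high = bootstrap_wer_ci(errors, ref_words)
```

Comparing two feature types on the same test set would usually resample the same utterance indices for both systems, so that the interval reflects the paired difference.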

minor comments (1)

[Abstract] The sentence “our proposed features gives the most similar word error rate” contains a subject-verb agreement error.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below.

Referee: [Abstract] The central claim that a total-variability subspace learned without parallel clean-noisy pairs can produce a useful utterance-specific transform is stated but never accompanied by the estimation procedure, the adaptation equations, or any derivation showing how the subspace is applied to an NMF dictionary. This information is load-bearing for both the “no parallel data” advantage and the reported WER results.

Authors: The abstract is a concise summary. The estimation procedure for learning the total variability subspace from non-parallel noisy data, the adaptation equations, and the full derivation of how the subspace produces an utterance-specific NMF dictionary transform are presented in detail in Sections 2 and 3 of the manuscript. These sections explicitly show the no-parallel-data training path and how the adapted dictionaries yield the reported features.

Revision: no

Referee: [Abstract] Results are labeled “preliminary” with no mention of the number of utterances, cross-validation folds, statistical significance tests, or error bars on the WER figures. Without these, it is impossible to evaluate whether the claim that the proposed features give “the most similar word error rate to clean speech” on unseen noises is reliable.

Authors: We agree that the abstract would benefit from additional experimental context. The manuscript body reports results on the standard Aurora 4 training and test sets (with the specific number of utterances and the DEMAND noise mixing procedure) using conventional train/test partitions. We will revise the abstract to reference the corpus scale and note that further statistical analysis (including error bars) can be included in a revision.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity

The provided abstract and description contain no equations, fitted parameters, or derivation steps that reduce to their own inputs by construction. The central claim is an empirical observation on Aurora 4 + DEMAND (comparable WER to baselines, closest to clean on unseen noise) obtained from an utterance-specific transform learned via total variability modeling on NMF dictionaries. No self-definitional loop, fitted-input-as-prediction, or load-bearing self-citation chain is exhibited; the method is explicitly positioned as avoiding parallel clean-noisy data, and the reported metric directly tests the modeling goal without internal reduction to the input assumptions. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method implicitly assumes that total variability subspaces exist and can be estimated from noisy data alone.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

[1]

Speech offers a natural and efficient way to interact with these devices

Introduction Automatic speech recognition (ASR) systems are being increasingly deployed on a wide range of devices for a wide range of applications. Speech offers a natural and efficient way to interact with these devices. Furthermore, speech contains paralinguistic content that devices can use to modify their outputs or behavior. For example, ASR sy...

work page
[2]

The training set does not require parallel clean and noisy utterances, and

work page
[3]

In the following sections, we provide a brief overview of NMF and total variability modeling, followed by our proposed noise-robust acoustic feature algorithm

The dictionary can be adapted for each utterance at test time, allowing for better modeling of the acoustic conditions in each utterance. In the following sections, we provide a brief overview of NMF and total variability modeling, followed by our proposed noise-robust acoustic feature algorithm. Section 4 describes our experiments and offers insight...

work page
[4]

Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

Background 2.1. Non-negative Matrix Factorization NMF decomposes a non-negative matrix $V \in \mathbb{R}_+^{d \times t}$ into the product of a non-negative dictionary $W \in \mathbb{R}_+^{d \times k}$ and non-negative activation matrix $H \in \mathbb{R}_+^{k \times t}$. Because of the non-negative constraint, the decomposition is purely additive, and...

work page internal anchor Pith review Pith/arXiv arXiv 1907
[5]

The idea is for the dictionary to capture as much of the noise in the spectrogram as possible so that the activation matrix is not affected by noise

Algorithm In this section, we describe an algorithm that uses TVM to adapt an NMF dictionary to the noise in an input spectrogram. The idea is for the dictionary to capture as much of the noise in the spectrogram as possible so that the activation matrix is not affected by noise. We will use the activation matrix as acoustic features for ASR on noisy sp...

work page
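
The excerpt above says the activation matrix serves as the acoustic features once a dictionary is available. A minimal sketch of that last step, assuming a Euclidean cost and a dictionary that is held fixed while only the activations are updated, might look as follows. How the dictionary is actually adapted is not shown in this excerpt, so `W_adapted` is just a random placeholder, and the function name is hypothetical.

```python
import numpy as np


def activations_for_dictionary(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H so that W @ H approximates V."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for H only; the dictionary W never changes.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


# Placeholder dictionary and toy spectrogram (random, for illustration only).
rng = np.random.default_rng(2)
W_adapted = rng.random((20, 4))
V_toy = W_adapted @ rng.random((4, 30))
features = activations_for_dictionary(V_toy, W_adapted)
```

Each column of `features` is a k-dimensional vector for one frame, which is the form an ASR front end would consume.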
[6]

dliving

Experiments and Results We investigated the performance of our algorithm on the clean speech in the Aurora 4 corpus [20] with added noise from the DEMAND dataset [21]. The training set consists of 7138 utterances from the Aurora 4 training set corrupted by one of six different noises (labeled in the DEMAND dataset as “dliving”, “npark”, “omeeting”, “p...

work page
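
Corrupting clean speech with recorded noise is typically done by scaling the noise to a target signal-to-noise ratio and adding it to the speech. The sketch below assumes that approach; the excerpt above does not state the SNR levels or the exact mixing procedure, so `snr_db`, the function name, and the random stand-in signals are assumptions for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it to the requested SNR in dB."""
    # Repeat or trim the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that makes speech_power / (gain**2 * noise_power) equal the target.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Random stand-ins for a clean utterance and a noise recording.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(4000)
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

In practice one would also pick a random offset into the noise recording rather than always starting at its beginning.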
[7]

The algorithm uses Total Variability Modeling to learn a total variability subspace and adapt a UBM NMF dictionary for each utterance at test time

Conclusion We proposed an algorithm to calculate noise-robust acoustic features from noisy utterances. The algorithm uses Total Variability Modeling to learn a total variability subspace and adapt a UBM NMF dictionary for each utterance at test time. We use the NMF activation matrix corresponding to the adapted dictionary as the acoustic features. Thu...

work page
[8]

Anger detection in call center dialogues,

D. Pappas, I. Androutsopoulos, and H. Papageorgiou, “Anger detection in call center dialogues,” in IEEE Int. Conf. Cognitive Infocommunications, Győr, Hungary, 2015, pp. 139–144

work page 2015
[9]

Evaluation of a noise-robust dsr front-end on aurora databases,

D. Macho, L. Mauuary, B. Noé, Y. M. Cheng, D. Ealey, D. Jouvet, H. Kelleher, D. Pearce, and F. Saadoun, “Evaluation of a noise-robust dsr front-end on aurora databases,” in Proc. Int. Conf. Spoken Lang. Process., 2002, pp. 17–20

work page 2002
[10]

Noise model transfer: novel approach to robustness against nonstationary noise,

T. Yoshioka and T. Nakatani, “Noise model transfer: novel approach to robustness against nonstationary noise,” IEEE Trans. Acoustics, Speech, and Lang. Process., vol. 21, no. 10, pp. 2182–2192, Oct. 2013

work page 2013
[11]

Evaluation of the splice algorithm on the aurora2 database,

J. Droppo, A. Acero, and L. Deng, “Evaluation of the splice algorithm on the aurora2 database,” in Proc. Eurospeech, 2001, pp. 217–220

work page 2001
[12]

Noise adaptive training for robust automatic speech recognition,

O. Kalinli, M. L. Seltzer, J. Droppo, and A. Acero, “Noise adaptive training for robust automatic speech recognition,” IEEE Trans. Acoustics, Speech, and Lang. Process., vol. 18, no. 8, pp. 1889–1901, Nov. 2010

work page 2010
[13]

Speaker and noise factorization for robust speech recognition,

Y. Wang and M. J. F. Gales, “Speaker and noise factorization for robust speech recognition,” IEEE Trans. Acoustics, Speech, and Lang. Process., vol. 20, no. 7, pp. 2149–2158, Sep. 2012

work page 2012
[14]

Suppression of acoustic noise in speech using spectral subtraction,

S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoustics, Speech, and Signal Process., vol. 20, no. 2, pp. 113–120, Apr. 1979

work page 1979
[15]

Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values,

P. Paatero and U. Tapper, “Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values,” Environmetrics, vol. 5, no. 2, pp. 111–126, 1994

work page 1994
[16]

Algorithms for non-negative matrix factorization,

D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in Adv. in Neu. Info. Proc. Sys. 13, 2001, pp. 556–562

work page 2001
[17]

An investigation of deep neural networks for noise robust speech recognition,

M. L. Seltzer, D. Yu, and Y. Wang, “An investigation of deep neural networks for noise robust speech recognition,” in IEEE Proc. Int. Conf. Acoustics, Speech, and Signal Process., 2013, pp. 7398–7402

work page 2013
[18]

Investigation of speech separation as a front-end for noise robust speech recognition,

A. Narayanan and D. L. Wang, “Investigation of speech separation as a front-end for noise robust speech recognition,” IEEE/ACM Trans. Audio, Speech, and Lang. Process., vol. 22, no. 4, pp. 826–835, 2014

work page 2014
[19]

A vector taylor series approach for environment-independent speech recognition,

P. J. Moreno, B. Raj, and R. M. Stern, “A vector taylor series approach for environment-independent speech recognition,” in IEEE Proc. Int. Conf. Acoustics, Speech, and Signal Process., 1996, pp. 733–736 vol. 2

work page 1996
[20]

High-performance robust speech recognition using stereo training data,

L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, “High-performance robust speech recognition using stereo training data,” in IEEE Proc. Int. Conf. Acoustics, Speech, and Signal Process., 2001, pp. 301–304

work page 2001
[21]

Power-normalized cepstral coefficients (pncc) for robust speech recognition,

C. Kim and R. M. Stern, “Power-normalized cepstral coefficients (pncc) for robust speech recognition,” in IEEE Proc. Int. Conf. Acoustics, Speech, and Signal Process., 2012, pp. 4101–4104

work page 2012
[22]

Cnmf-based acoustic features for noise-robust asr,

C. Vaz, D. Dimitriadis, S. Thomas, and S. Narayanan, “Cnmf-based acoustic features for noise-robust asr,” in IEEE Proc. Int. Conf. Acoustics, Speech, and Signal Process., 2016, pp. 5735–5739

work page 2016
[23]

Front-end factor analysis for speaker verification,

N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Trans. Audio, Speech, and Lang. Process., vol. 19, no. 4, pp. 788–798, 2010

work page 2010
[24]

On the use of the beta divergence for musical source separation,

D. FitzGerald, M. Cranitch, and E. Coyle, “On the use of the beta divergence for musical source separation,” in IET Irish Signals and Systems Conf., 2009

work page 2009
[25]

Sparse nmf-half-baked or well done?

J. Le Roux, F. Weninger, and J. Hershey, “Sparse nmf-half-baked or well done?” Mitsubishi Elect. Res. Lab. Cambridge, MA, USA, Tech. Rep. TR-2015-023, 2015

work page 2015
[26]

Non-negative matrix factorization with sparseness constraints,

P. O. Hoyer, “Non-negative matrix factorization with sparseness constraints,” J. Machine Learning Research, vol. 5, pp. 1457–1469, 2004

work page 2004
[27]

Analysis of the Aurora large vocabulary evaluations,

N. Parihar and J. Picone, “Analysis of the Aurora large vocabulary evaluations,” in Proc. Eurospeech, 2003, pp. 337–340

work page 2003
[28]

The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings,

J. Thiemann, N. Ito, and E. Vincent, “The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings,” Proc. Meetings on Acoustics, vol. 19, no. 1, 2013

work page 2013
[29]

Librispeech: An ASR corpus based on public domain audio books,

V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in Int. Conf. Acoustics, Speech, Signal Process., 2015

work page 2015
