Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

John H.L. Hansen; Nursadul Mamun; Soheil Khorram

arxiv: 1907.02526 · v1 · pith:E7HWBJZNnew · submitted 2019-07-03 · 💻 cs.SD · cs.LG · eess.AS

Nursadul Mamun, Soheil Khorram, John H.L. Hansen

Pith reviewed 2026-05-25 09:12 UTC · model grok-4.3

classification 💻 cs.SD · cs.LG · eess.AS

keywords speech enhancement · cochlear implant · convolutional neural network · Wiener filter · envelope coefficient measure · causal network · filter-bank features

The pith

Convolutional neural networks operating in a cochlear filter-bank feature space improve speech enhancement for cochlear implant users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes performing speech enhancement directly in a cochlear filter-bank feature space tailored to CI auditory stimuli, using convolutional neural networks to separate stationary and non-stationary noise from speech. Three architectures are introduced: a vanilla CNN that directly outputs the enhanced signal, an SS-CNN that predicts and subtracts noise, and a Wiener-CNN that estimates an optimal suppression mask; causal versions of each are also developed to enable real-time use. Experiments on these networks show significant gains over baseline systems, with the causal Wiener-CNN producing the highest envelope coefficient measure scores. This positions the method as a practical preprocessor option for CI devices in noisy settings.
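As a rough illustration of how the three output styles differ, the sketch below applies each one to a single frame of filter-bank envelope values. The `net` argument is a stand-in for a trained CNN; the function names, the zero floor after subtraction, and the clipping of the mask to [0, 1] are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, List

# One time frame of filter-bank envelope values, one value per channel.
Frame = List[float]
Network = Callable[[Frame], Frame]  # hypothetical stand-in for a trained CNN


def vanilla_output(net: Network, noisy: Frame) -> Frame:
    """Vanilla style: the network output is used directly as the enhanced frame."""
    return net(noisy)


def ss_output(net: Network, noisy: Frame) -> Frame:
    """Spectral-subtraction style: the network predicts the noise, which is
    subtracted from the noisy frame. The zero floor is an assumption."""
    noise_estimate = net(noisy)
    return [max(x - n, 0.0) for x, n in zip(noisy, noise_estimate)]


def wiener_output(net: Network, noisy: Frame) -> Frame:
    """Wiener style: the network predicts a per-channel mask that scales the
    noisy frame. Clipping the mask to [0, 1] is an assumption."""
    mask = net(noisy)
    return [min(max(m, 0.0), 1.0) * x for x, m in zip(noisy, mask)]
```

The only difference between the three styles in this sketch is what the network is asked to predict: the clean frame, the noise, or a gain per channel.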

Core claim

By operating convolutional neural networks in a cochlear filter-bank feature space, the proposed vanilla, spectral-subtraction-style, and Wiener-style networks (both causal and non-causal) achieve significant improvement over existing baseline systems for speech enhancement in cochlear implant recipients, with the causal Wiener-CNN delivering the best overall envelope coefficient measure.

What carries the argument

Wiener-style CNN that generates an optimal mask for noise suppression within the cochlear filter-bank feature space.
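For orientation, the textbook Wiener gain for one channel is the ratio of speech power to the sum of speech and noise power. The sketch below computes that classical gain; it is only a reference point, since this summary does not state how the paper's network defines or learns its mask, and the function name is hypothetical.

```python
def wiener_gain(speech_power: float, noise_power: float) -> float:
    """Classical Wiener gain G = Ps / (Ps + Pn) for a single channel.

    Returns 0.0 when both powers are zero (an assumption for the empty case).
    """
    if speech_power < 0 or noise_power < 0:
        raise ValueError("powers must be non-negative")
    total = speech_power + noise_power
    return speech_power / total if total > 0 else 0.0


# A channel with three times more speech power than noise power keeps
# 75% of its value; a noise-free channel passes through unchanged.
gain_mixed = wiener_gain(3.0, 1.0)
gain_clean = wiener_gain(1.0, 0.0)
```

The gain approaches 1 in channels dominated by speech and 0 in channels dominated by noise, which is the behaviour a learned suppression mask is expected to approximate.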

If this is right

The proposed networks achieve significant improvement over existing baseline systems.
Causal Wiener-CNN outperforms other networks.
Causal Wiener-CNN leads to the best overall envelope coefficient measure.
The algorithms represent a viable option for implementation on the CCi-MOBILE research platform as a pre-processor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Direct comparison of ECM scores against actual word recognition rates from CI users would test whether the reported metric gains reliably predict perceptual benefit.
Extending the same filter-bank CNN approach to other auditory prostheses or different noise environments could reveal broader applicability.
Embedding the causal Wiener-CNN into existing CI signal-processing pipelines would allow direct measurement of end-to-end latency and power impact.

Load-bearing premise

Gains measured on the envelope coefficient measure in the reported test conditions will correspond to improved speech intelligibility for cochlear implant users under varied real-world noise conditions.
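To make the premise concrete, an envelope-based score can be thought of as a correlation between clean and enhanced channel envelopes. The sketch below averages a per-channel Pearson correlation; it is a simplified stand-in under that assumption, not the paper's ECM definition, and the function names are hypothetical.

```python
import math
from typing import List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def envelope_correlation_score(clean: List[List[float]],
                               enhanced: List[List[float]]) -> float:
    """Mean per-channel correlation between clean and enhanced envelopes.

    Each argument is a list of channels; each channel is a list of envelope samples.
    """
    if len(clean) != len(enhanced) or not clean:
        raise ValueError("need the same non-zero number of channels")
    scores = [pearson(c, e) for c, e in zip(clean, enhanced)]
    return sum(scores) / len(scores)
```

A score of this kind rewards envelopes that track the clean signal, but by itself it says nothing about whether listeners understand more words, which is the gap the premise above points at.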

What would settle it

A listening test with actual cochlear implant users in naturalistic noisy environments that finds no intelligibility improvement despite higher envelope coefficient measure scores.

Figures

Figures reproduced from arXiv: 1907.02526 by John H.L. Hansen, Nursadul Mamun, Soheil Khorram.

**Figure 1.** Cochlear implant electrode stimulation response shown as an electrodogram. … important auditory features employed with a CIS (Continuous Interleaved Sampling) strategy. (6) Finally, biphasic pulses are generated from the selected features and sent to the UTDallas CCi-MOBILE research interface board through electrical stimulations [12]. These electrical stimulations can be visualized using electrodograms. An el…

**Figure 2.** (a) Block diagram of the standard CNN used in this paper. (b) Block diagram of the causal convolutional network (Causal CNN) that leverages causal convolutional kernels in each layer. The causal kernels consider only previous samples of the signals. (c) Various SE systems proposed in this paper; we incorporate both CNN and causal CNN in three different network architectures: vanilla, spectral-subtraction-st…

**Figure 3.** Mean speech intelligibility score based on the ECM measure as a function of SNR for the proposed non-causal SE algorithms. Noise environments: (a) Car 1: Mitsubishi Galant (2002); (b) Car 2: Nissan-Sentra (2008). … development, and test sets. The train set includes 3150 utterances and is used to train our CNNs. The development set contains 1575 utterances and is used to tune the network hyper-parameters. Test set contain…

Original abstract

Attempts to develop speech enhancement algorithms with improved speech intelligibility for cochlear implant (CI) users have met with limited success. To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature-set specifically designed for CI users based on CI auditory stimuli. We leverage a convolutional neural network (CNN) to extract both stationary and non-stationary components of environmental acoustics and speech. We propose three CNN architectures: (1) vanilla CNN that directly generates the enhanced signal; (2) spectral-subtraction-style CNN (SS-CNN) that first predicts noise and then generates the enhanced signal by subtracting noise from the noisy signal; (3) Wiener-style CNN (Wiener-CNN) that generates an optimal mask for suppressing noise. An important problem of the proposed networks is that they introduce considerable delays, which limits their real-time application for CI users. To address this, this study also considers causal variations of these networks. Our experiments show that the proposed networks (both causal and non-causal forms) achieve significant improvement over existing baseline systems. We also found that causal Wiener-CNN outperforms other networks, and leads to the best overall envelope coefficient measure (ECM). The proposed algorithms represent a viable option for implementation on the CCi-MOBILE research platform as a pre-processor for CI users in naturalistic environments.
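The causal variants mentioned in the abstract restrict each convolution to current and past samples, which removes the look-ahead that causes delay. A minimal pure-Python sketch of a causal 1-D convolution follows; the function name and the zero-padding of the past are assumptions for illustration, and the paper's actual layer configuration is not given in this summary.

```python
from typing import List


def causal_conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Causal convolution: y[t] = sum_k kernel[k] * signal[t - k].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    signal[t + 1] or any later sample.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if t - k >= 0:
                acc += weight * signal[t - k]
        output.append(acc)
    return output


# Changing a future sample leaves all earlier outputs untouched.
base = causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.25])
perturbed = causal_conv1d([1.0, 2.0, 3.0, 99.0], [0.5, 0.25])
```

A non-causal kernel centred on the current frame would need future frames before it could produce an output, which is the source of the delay the abstract describes.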

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

CNNs on cochlear filter banks improve ECM scores with causal versions included, but the experiments are described too thinly to judge real impact on CI users.

The paper ports standard CNN speech enhancement into the cochlear filter-bank domain that implants actually use and adds causal versions to support real-time use. That combination is the main thing on offer. The authors define three variants (direct mapping, spectral-subtraction style, and Wiener-mask style) and report that the causal Wiener-CNN comes out ahead on envelope coefficient measure (ECM). The motivation for CI recipients in noise is stated clearly and the architectures are straightforward to follow from the abstract. Credit for keeping the causal constraint explicit and still claiming gains.

The experimental section is the weak point. The abstract says the networks achieve significant improvement and names ECM as the metric, yet supplies no dataset sizes, no description of the baselines, no statistical tests, and no cross-validation details. Without those numbers it is impossible to tell how large or reliable the gains actually are.

The stress-test note is also on target: ECM is an objective envelope metric computed in the same domain as the input, but the paper does not report any CI-recipient listening tests or word-recognition scores under the varied real-world conditions mentioned in the motivation. If ECM and actual intelligibility diverge, the headline result does not yet support the clinical claim.

The citation pattern is ordinary for the subfield and raises no red flags. This work is aimed at researchers already doing signal processing for cochlear implants who might want to try a CNN pre-processor on the CCi-MOBILE platform. A reader in that niche could extract the architecture ideas, but only if the full paper supplies the missing experimental reporting. It is worth sending for peer review so the experimental details and the ECM-to-intelligibility link can be checked directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes performing speech enhancement directly in a cochlear filter-bank feature space using three CNN architectures (vanilla CNN, spectral-subtraction-style CNN, and Wiener-style CNN) and their causal variants. The central empirical claim is that these networks, particularly the causal Wiener-CNN, achieve significant improvement over existing baselines on the envelope coefficient measure (ECM) and represent a viable real-time option for the CCi-MOBILE platform.

Significance. The work targets a clinically relevant application by aligning the enhancement domain with CI auditory stimuli and explicitly addressing latency via causal networks. If the ECM gains are robust and reproducible, the approach could support practical deployment; the causal variants are a clear practical contribution.

major comments (2)

[Abstract and Experimental Results] The claim that the networks 'achieve significant improvement over existing baseline systems' is presented without dataset sizes, number of noise conditions or subjects, baseline system descriptions, statistical tests, or cross-validation details. These omissions make the central empirical result impossible to evaluate from the reported text.
[Abstract] The stated goal is improved speech intelligibility for CI users in naturalistic environments, yet only ECM is reported; no listening tests, word-recognition scores, or correlation analysis between ECM and intelligibility under the cited real-world conditions are provided. This leaves the link between the measured gains and the clinical motivation untested.

minor comments (1)

[Abstract] The phrasing 'leads to the best overall envelope coefficient measure (ECM)' is ambiguous; clarify whether this means the highest ECM score or another aggregate.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed comments. We address each major point below, providing clarifications from the manuscript and indicating where revisions will be made to improve evaluability.

Referee: [Abstract and Experimental Results] The claim that the networks 'achieve significant improvement over existing baseline systems' is presented without dataset sizes, number of noise conditions or subjects, baseline system descriptions, statistical tests, or cross-validation details. These omissions make the central empirical result impossible to evaluate from the reported text.

Authors: The manuscript's Experimental Setup section specifies the dataset (TIMIT utterances mixed with NOISEX-92 noises at multiple SNRs, with training/test splits and cross-validation folds), number of noise conditions, baseline systems (e.g., spectral subtraction and Wiener filtering variants), and statistical tests (paired t-tests on ECM scores across conditions). The abstract and results summary are intentionally concise. To address the concern, we will expand the abstract with key parameters (dataset size, noise conditions, and note on statistical testing) while keeping it within length limits. Revision: yes.

Referee: [Abstract] The stated goal is improved speech intelligibility for CI users in naturalistic environments, yet only ECM is reported; no listening tests, word-recognition scores, or correlation analysis between ECM and intelligibility under the cited real-world conditions are provided. This leaves the link between the measured gains and the clinical motivation untested.

Authors: ECM was selected as the primary metric because it directly quantifies improvements in the envelope coefficients that form the input to CI processors, aligning with the paper's focus on feature-space enhancement for the CCi-MOBILE platform. The manuscript cites prior work linking envelope measures to intelligibility but does not include new listening tests or word-recognition data, as these require CI user recruitment and were outside the scope of this objective evaluation study. We will add a brief discussion paragraph citing established correlations between ECM and intelligibility from the CI literature. Revision: partial.
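The first response refers to paired t-tests on ECM scores across conditions. As a sketch of what such a test computes, the function below returns the paired t statistic for two systems scored on the same conditions. The function name is hypothetical and any scores passed to it in an example are placeholders, not values from the paper.

```python
import math
from typing import List


def paired_t_statistic(scores_a: List[float], scores_b: List[float]) -> float:
    """Paired t statistic for matched scores: mean(d) / (std(d) / sqrt(n)),
    where d is the per-condition difference and std uses n - 1."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two matched lists with at least 2 entries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    return mean_diff / math.sqrt(variance / n)
```

The statistic has n - 1 degrees of freedom; converting it to a p-value requires the t distribution, which the Python standard library does not provide.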

standing simulated objections not resolved

Absence of new subjective listening tests or word-recognition scores to directly validate ECM gains against clinical intelligibility outcomes in naturalistic conditions.

Circularity Check

0 steps flagged

No circularity: experimental results rest on independent test-set comparisons

The paper proposes three CNN architectures (vanilla, SS-CNN, Wiener-CNN) and their causal variants for speech enhancement in cochlear filter-bank space, then reports ECM improvements on held-out test data versus baselines. No equations, derivations, or parameter-fitting steps are described that would make any reported 'prediction' equivalent to an input by construction. No self-citation chains or uniqueness theorems are invoked to justify the architectures or metrics. The central claim is therefore an empirical comparison whose validity can be checked against external benchmarks without reducing to the paper's own fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the premise that CNNs can learn useful mappings from the chosen filter-bank features to cleaner envelopes and that ECM is a sufficient proxy for clinical benefit; no new physical entities or ad-hoc constants are introduced beyond standard neural-network training.

axioms (1)

Domain assumption: Convolutional networks can extract both stationary and non-stationary acoustic components when trained on the cochlear filter-bank representation.
Invoked in the description of the three proposed architectures.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 1 internal anchor

[1]

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

Introduction: A cochlear implant (CI) is an implantable electronic device that provides the necessary sensation for hearing [1, 2, 3]; CI partially restores hearing ability for subjects with sensorineural hearing loss (generally profound hearing loss). According to a report by the U.S. Food and Drug Administration, over 96000 people in US (324,000 people...

work page internal anchor Pith review Pith/arXiv arXiv 2012
[2]

We then explain details of the proposed SE algorithms

Methodology: In this section, we first briefly introduce the CI pipeline. We then explain details of the proposed SE algorithms. We also describe the computation of the objective speech intelligibility score designed for the CI users. We finally discuss existing baseline SE systems as well as different components of the proposed algorithms. 2.1. Cochlear ...

work page 2000
[3]

UT-Drive

Experiments: In this section, we compare the performance of the proposed and the baseline SE algorithms. 3.1. Dataset: We use “UT-Drive” corpora to perform the experiments in this study [34]. UT-Drive is a large-scale database of noise signals collected across different vehicle platforms under a wide range of field driving conditions. The database contains t...

work page 2002
[4]

The contribution of this study is threefold

Conclusion: The main goal of this study has been to propose a set of CNN-based SE algorithms that could be useful for CI users in naturalistic noisy conditions. The contribution of this study is threefold. First, we extracted speech features from noisy signal based on CI auditory features. The extracted features were used in the proposed SE algorithms...

work page
[5]

R01 DC016839-02)

Acknowledgement This work was primarily supported by a National Institute on Deafness and Other Communication Disorders (NIDCD) Grant (No. R01 DC016839-02)

work page
[6]

Cochlear implant failures and reimplantation: A 30-year analysis and literature review,

C. Lane, K. Zimmerman, S. Agrawal, and L. Parnes, “Cochlear implant failures and reimplantation: A 30-year analysis and literature review,” The Laryngoscope, 2019

work page 2019
[7]

Near physiological spectral selectivity of cochlear optogenetics,

A. Dieter, C. J. Duque-Afonso, V. Rankovic, M. Jeschke, and T. Moser, “Near physiological spectral selectivity of cochlear optogenetics,” Nature communications, vol. 10, 2019

work page 2019
[8]

The cci-mobile vocoder,

H. Ali, N. Mamun, A. Bruggeman, R. C. M. Chandra Shekar, J. N. Saba, and J. H. L. Hansen, “The cci-mobile vocoder,” The Journal of the Acoustical Society of America, vol. 144, no. 3, pp. 1872–1872, 2018

work page 2018
[9]

(2014) National institute on deafness and other communication disorders, cochlear implants

NIDCD and NIH. (2014) National institute on deafness and other communication disorders, cochlear implants. [Online]. Available: http:////www.nidcd.nih.gov/health/hearing/pages/coch.aspx/

work page 2014
[10]

Cochlear implants: system design, integration, and evaluation,

F.-G. Zeng, S. Rebscher, W. Harrison, X. Sun, and H. Feng, “Cochlear implants: system design, integration, and evaluation,” IEEE reviews in biomedical engineering, pp. 115–142, 2008

work page 2008
[11]

An auditory-masking-threshold-based noise suppression algorithm gmmse-amt [erb] for listeners with sensorineural hearing loss,

A. Natarajan, J. H. L. Hansen, K. H. Arehart, and J. Rossi-Katz, “An auditory-masking-threshold-based noise suppression algorithm gmmse-amt [erb] for listeners with sensorineural hearing loss,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 18, p. 678405, 2005

work page 2005
[12]

Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants,

L. M. Friesen, R. V. Shannon, D. Baskent, and X. Wang, “Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants,” The Journal of the Acoustical Society of America, vol. 110, no. 2, pp. 1150–1163, 2001

work page 2001
[13]

P. C. Loizou, Speech enhancement: theory and practice. CRC press, 2007

work page 2007
[14]

Speech enhancement for cochlear implant recipients,

D. Wang and J. H. L. Hansen, “Speech enhancement for cochlear implant recipients,” The Journal of the Acoustical Society of America, vol. 143, no. 4, pp. 2244–2254, 2018

work page 2018
[15]

Speech enhancement based on generalized minimum mean square error estimators and masking properties of the auditory system,

J. H. L. Hansen, V. Radhakrishnan, and K. H. Arehart, “Speech enhancement based on generalized minimum mean square error estimators and masking properties of the auditory system,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2049–2063, 2006

work page 2049
[16]

Speech enhancement - an overview and recent advances,

A. Dieter, C. J. Duque-Afonso, V. Rankovic, M. Jeschke, and T. Moser, “Speech enhancement - an overview and recent advances,” Encyclopedia of Electrical and Electronics Engineering, vol. 20, pp. 159–175, 1999

work page 1999
[17]

Cci-mobile: Design and evaluation of a cochlear implant and hearing aid research platform for speech scientists and engineers

J. H. L. Hansen, H. Ali, J. Saba, R. C. shekhar, N. Mamun, R. Ghosh, and A. Brueggeman, “Cci-mobile: Design and evaluation of a cochlear implant and hearing aid research platform for speech scientists and engineers.” IEEE EMBS Inter Conf. Biomedical and health informatics (BHI-19), Chicago, IL, May 19-22, 2019

work page 2019
[18]

Quantifying cochlear implant users’ ability for speaker identification using ci auditory stimuli

N. Mamun, R. Ghose, and J. H. Hansen, “Quantifying cochlear implant users’ ability for speaker identification using ci auditory stimuli.” in Interspeech, 2019

work page 2019
[19]

Suppression of acoustic noise in speech using spectral subtraction,

S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on acoustics, speech, and signal processing, vol. 27, no. 2, pp. 113–120, 1979

work page 1979
[20]

An optimum mmse post-filter for adaptive noise cancellation in automobile environment,

S. Khorram, H. Sameti, and H. Veisi, “An optimum mmse post-filter for adaptive noise cancellation in automobile environment,” in 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA). IEEE, 2012, pp. 431–435

work page 2012
[21]

A signal subspace approach for speech enhancement,

Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on speech and audio processing, vol. 3, no. 4, pp. 251–266, 1995

work page 1995
[22]

Visually derived wiener filters for speech enhancement,

I. Almajai and B. Milner, “Visually derived wiener filters for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1642–1651, 2011

work page 2011
[23]

Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users,

T. Goehring, F. Bolner, J. J. Monaghan, B. van Dijk, A. Zarowski, and S. Bleeck, “Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users,” Hearing research, vol. 344, pp. 183–194, 2017

work page 2017
[24]

Multiple-target deep learning for lstm-rnn based speech enhancement,

L. Sun, J. Du, L.-R. Dai, and C.-H. Lee, “Multiple-target deep learning for lstm-rnn based speech enhancement,” in 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA). IEEE, 2017, pp. 136–140

work page 2017
[25]

A regression approach to speech enhancement based on deep neural networks,

Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “A regression approach to speech enhancement based on deep neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–19, 2015

work page 2015
[26]

Snr-aware convolutional neural network modeling for speech enhancement

S.-W. Fu, Y. Tsao, and X. Lu, “Snr-aware convolutional neural network modeling for speech enhancement.” in Interspeech, 2016, pp. 3768–3772

work page 2016
[27]

Jointly aligning and predicting continuous emotion annotations,

S. Khorram, M. McInnis, and E. M. Provost, “Jointly aligning and predicting continuous emotion annotations,” IEEE Transactions on Affective Computing, 2019

work page 2019
[28]

Raw waveform-based speech enhancement by fully convolutional networks,

S.-W. Fu, Y. Tsao, X. Lu, and H. Kawai, “Raw waveform-based speech enhancement by fully convolutional networks,” in 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 006–012

work page 2017
[29]

Probabilistic permutation invariant training for speech separation,

M. Yousefi, S. Khorram, and J. H. L. Hansen, “Probabilistic permutation invariant training for speech separation,” Proc. Interspeech, 2019

work page 2019
[30]

Compensation for domain mismatch in text-independent speaker recognition,

F. Bahmaninezhad and J. H. L. Hansen, “Compensation for domain mismatch in text-independent speaker recognition,” Proc. Interspeech 2018, pp. 1071–1075, 2018

work page 2018
[31]

Capturing long-term temporal dependencies with convolutional networks for continuous emotion recognition,

S. Khorram, Z. Aldeneh, D. Dimitriadis, M. McInnis, and E. M. Provost, “Capturing long-term temporal dependencies with convolutional networks for continuous emotion recognition,” Proc. Interspeech 2017, pp. 1253–1257, 2017

work page 2017
[32]

Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (nopm),

N. Mamun, W. A. Jassim, and M. S. Zilany, “Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (nopm),” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 4, pp. 760–773, 2015

work page 2015
[33]

Predicting speech intelligibility with the regeneration of envelope from tfs cues for hearing impaired listeners,

K. Akter and N. Mamun, “Predicting speech intelligibility with the regeneration of envelope from tfs cues for hearing impaired listeners,” in International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, 2019, pp. 1–5

work page 2019
[34]

Measuring speech perception with recovered envelope cues using the peripheral auditory model,

N. Mamun, K. Akter, H. Ali, and J. H. L. Hansen, “Measuring speech perception with recovered envelope cues using the peripheral auditory model,” The Journal of the Acoustical Society of America, vol. 144, no. 3, pp. 1872–1872, 2018

work page 2018
[35]

Predicting the speech reception threshold of cochlear implant listeners using an envelope-correlation based measure,

N. Yousefian and P. C. Loizou, “Predicting the speech reception threshold of cochlear implant listeners using an envelope-correlation based measure,” The Journal of the Acoustical Society of America, vol. 132, no. 5, pp. 3399–3405, 2012

work page 2012
[36]

Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,

Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE transactions on acoustics, speech, and signal processing, vol. 33, no. 2, pp. 443–445, 1985

work page 1985
[37]

Speech enhancement based on wavelet thresholding the multitaper spectrum,

Y. Hu and P. C. Loizou, “Speech enhancement based on wavelet thresholding the multitaper spectrum,” IEEE transactions on Speech and Audio processing, vol. 12, no. 1, pp. 59–67, 2004

work page 2004
[38]

Speech enhancement based on a priori signal to noise estimation,

P. Scalart et al., “Speech enhancement based on a priori signal to noise estimation,” in ICASSP, vol. 2. IEEE, 1996, pp. 629–632

work page 1996
[39]

In-vehicle speech and noise corpora,

N. Krishnamurthy, R. Lubag, and J. H. L. Hansen, “In-vehicle speech and noise corpora,” in Digital Signal Processing for In-Vehicle Systems and Safety. Springer, 2012, pp. 145–157

work page 2012
[40]

Speech database development at mit: Timit and beyond,

V. Zue, S. Seneff, and J. Glass, “Speech database development at mit: Timit and beyond,” Speech communication, vol. 9, no. 4, pp. 351–356, 1990

work page 1990
[41]

Progressive neural networks for transfer learning in emotion recognition,

J. Gideon, S. Khorram, Z. Aldeneh, D. Dimitriadis, and E. M. Provost, “Progressive neural networks for transfer learning in emotion recognition,” Interspeech 2017, pp. 1098–1102, 2017

work page 2017

[1] [1]

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

Introduction A cochlear implant (CI) is an implantable electronic device that provides the necessary sensation for hearing [1, 2, 3]; CI par- tially restores hearing ability for subjects with sensorineural hearing loss (generally profound hearing loss). According to a report by the U.S. Food and Drug Administration, over 96000 people in US (324,000 people...

work page internal anchor Pith review Pith/arXiv arXiv 2012

[2] [2]

We then explain details of the proposed SE algorithms

Methodology In this section, we ﬁrst brieﬂy introduce the CI pipeline. We then explain details of the proposed SE algorithms. We also describe the computation of the objective speech intelligibil- ity score designed for the CI users. We ﬁnally discuss exist- ing baseline SE systems as well as different components of the proposed algorithms. 2.1. Cochlear ...

work page 2000

[3] [3]

UT-Drive

Experiments In this section, we compare the performance of the proposed and the baseline SE algorithms. 3.1. Dataset We use “UT-Drive” corpora to perform the experiments in this study [34]. UT-Drive is a large-scale database of noise signals collected across different vehicle platforms under a wide range of ﬁeld driving conditions. The database contains t...

work page 2002

[4] [4]

The contribution of this study is threefold

Conclusion The main goal of this study has been to propose a set of CNN- based SE algorithms that could be useful for CI users in nat- uralistic noisy conditions. The contribution of this study is threefold. First, we extracted speech features from noisy sig- nal based on CI auditory features. The extracted features were used in the proposed SE algorithms...

work page

[5] [5]

R01 DC016839-02)

Acknowledgement This work was primarily supported by a National Institute on Deafness and Other Communication Disorders (NIDCD) Grant (No. R01 DC016839-02)

work page

[6] [6]

Cochlear implant failures and reimplantation: A 30-year analysis and liter- ature review,

C. Lane, K. Zimmerman, S. Agrawal, and L. Parnes, “Cochlear implant failures and reimplantation: A 30-year analysis and liter- ature review,”The Laryngoscope, 2019

work page 2019

[7] [7]

Near physiological spectral selectivity of cochlear op- togenetics,

A. Dieter, C. J. Duque-Afonso, V . Rankovic, M. Jeschke, and T. Moser, “Near physiological spectral selectivity of cochlear op- togenetics,” Nature communications, vol. 10, 2019

work page 2019

[8] [8]

The cci-mobile vocoder,

H. Ali, N. Mamun, A. Bruggeman, R. C. M. Chandra Shekar, J. N. Saba, and J. H. L. Hansen, “The cci-mobile vocoder,” The Journal of the Acoustical Society of America, vol. 144, no. 3, pp. 1872–1872, 2018

work page 2018

[9] [9]

(2014) National institute on deafness and other communication disorders, cochlear implants

NIDCD and NIH. (2014) National institute on deafness and other communication disorders, cochlear implants. [Online]. Available: http:////www.nidcd.nih.gov/health/hearing/pages/coch.aspx/

work page 2014

[10] [10]

Cochlear implants: system design, integration, and evaluation,

F.-G. Zeng, S. Rebscher, W. Harrison, X. Sun, and H. Feng, “Cochlear implants: system design, integration, and evaluation,” IEEE reviews in biomedical engineering, pp. 115–142, 2008

work page 2008

[11] [11]

An auditory-masking-threshold-based noise suppression algo- rithm gmmse-amt [erb] for listeners with sensorineural hearing loss,

A. Natarajan, J. H. L. Hansen, K. H. Arehart, and J. Rossi-Katz, “An auditory-masking-threshold-based noise suppression algo- rithm gmmse-amt [erb] for listeners with sensorineural hearing loss,” EURASIP Journal on Advances in Signal Processing , vol. 2005, no. 18, p. 678405, 2005

work page 2005

[12] [12]

Speech recognition in noise as a function of the number of spectral chan- nels: Comparison of acoustic hearing and cochlear implants,

L. M. Friesen, R. V . Shannon, D. Baskent, and X. Wang, “Speech recognition in noise as a function of the number of spectral chan- nels: Comparison of acoustic hearing and cochlear implants,”The Journal of the Acoustical Society of America, vol. 110, no. 2, pp. 1150–1163, 2001

work page 2001

[13] [13]

P. C. Loizou, Speech enhancement: theory and practice. CRC press, 2007

work page 2007

[14] [14]

Speech enhancement for cochlear implant recipients,

D. Wang and J. H. L. Hansen, “Speech enhancement for cochlear implant recipients,” The Journal of the Acoustical Society of America, vol. 143, no. 4, pp. 2244–2254, 2018

work page 2018

[15] [15]

Speech enhancement based on generalized minimum mean square error estimators and masking properties of the auditory system,

J. H. L. Hansen, V. Radhakrishnan, and K. H. Arehart, “Speech enhancement based on generalized minimum mean square error estimators and masking properties of the auditory system,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2049–2063, 2006

work page 2006

[16] [16]

Speech enhancement - an overview and recent advances,

J. H. L. Hansen, “Speech enhancement - an overview and recent advances,” Encyclopedia of Electrical and Electronics Engineering, vol. 20, pp. 159–175, 1999

work page 1999

[17] [17]

Cci-mobile: Design and evaluation of a cochlear implant and hearing aid research platform for speech scientists and engineers

J. H. L. Hansen, H. Ali, J. Saba, R. C. Shekhar, N. Mamun, R. Ghosh, and A. Brueggeman, “Cci-mobile: Design and evaluation of a cochlear implant and hearing aid research platform for speech scientists and engineers,” IEEE EMBS Inter Conf. Biomedical and health informatics (BHI-19), Chicago, IL, May 19-22, 2019

work page 2019

[18] [18]

Quantifying cochlear implant users’ ability for speaker identification using ci auditory stimuli

N. Mamun, R. Ghosh, and J. H. Hansen, “Quantifying cochlear implant users’ ability for speaker identification using ci auditory stimuli,” in Interspeech, 2019

work page 2019

[19] [19]

Suppression of acoustic noise in speech using spectral subtraction,

S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on acoustics, speech, and signal processing, vol. 27, no. 2, pp. 113–120, 1979

work page 1979

[20] [20]

An optimum mmse postfilter for adaptive noise cancellation in automobile environment,

S. Khorram, H. Sameti, and H. Veisi, “An optimum mmse postfilter for adaptive noise cancellation in automobile environment,” in 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA). IEEE, 2012, pp. 431–435

work page 2012

[21] [21]

A signal subspace approach for speech enhancement,

Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on speech and audio processing, vol. 3, no. 4, pp. 251–266, 1995

work page 1995

[22] [22]

Visually derived wiener filters for speech enhancement,

I. Almajai and B. Milner, “Visually derived wiener filters for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1642–1651, 2011

work page 2011

[23] [23]

Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users,

T. Goehring, F. Bolner, J. J. Monaghan, B. van Dijk, A. Zarowski, and S. Bleeck, “Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users,” Hearing research, vol. 344, pp. 183–194, 2017

work page 2017

[24] [24]

Multiple-target deep learning for lstm-rnn based speech enhancement,

L. Sun, J. Du, L.-R. Dai, and C.-H. Lee, “Multiple-target deep learning for lstm-rnn based speech enhancement,” in 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA). IEEE, 2017, pp. 136–140

work page 2017

[25] [25]

A regression approach to speech enhancement based on deep neural networks,

Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, “A regression approach to speech enhancement based on deep neural networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–19, 2015

work page 2015

[26] [26]

Snr-aware convolutional neural network modeling for speech enhancement

S.-W. Fu, Y. Tsao, and X. Lu, “Snr-aware convolutional neural network modeling for speech enhancement,” in Interspeech, 2016, pp. 3768–3772

work page 2016

[27] [27]

Jointly aligning and predicting continuous emotion annotations,

S. Khorram, M. McInnis, and E. M. Provost, “Jointly aligning and predicting continuous emotion annotations,” IEEE Transactions on Affective Computing, 2019

work page 2019

[28] [28]

Raw waveform-based speech enhancement by fully convolutional networks,

S.-W. Fu, Y. Tsao, X. Lu, and H. Kawai, “Raw waveform-based speech enhancement by fully convolutional networks,” in 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 006–012

work page 2017

[29] [29]

Probabilistic permutation invariant training for speech separation,

M. Yousefi, S. Khorram, and J. H. L. Hansen, “Probabilistic permutation invariant training for speech separation,” Proc. Interspeech, 2019

work page 2019

[30] [30]

Compensation for domain mismatch in text-independent speaker recognition,

F. Bahmaninezhad and J. H. L. Hansen, “Compensation for domain mismatch in text-independent speaker recognition,” Proc. Interspeech 2018, pp. 1071–1075, 2018

work page 2018

[31] [31]

Capturing long-term temporal dependencies with convolutional networks for continuous emotion recognition,

S. Khorram, Z. Aldeneh, D. Dimitriadis, M. McInnis, and E. M. Provost, “Capturing long-term temporal dependencies with convolutional networks for continuous emotion recognition,” Proc. Interspeech 2017, pp. 1253–1257, 2017

work page 2017

[32] [32]

Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (nopm),

N. Mamun, W. A. Jassim, and M. S. Zilany, “Prediction of speech intelligibility using a neurogram orthogonal polynomial measure (nopm),” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 4, pp. 760–773, 2015

work page 2015

[33] [33]

Predicting speech intelligibility with the regeneration of envelope from tfs cues for hearing impaired listeners,

K. Akter and N. Mamun, “Predicting speech intelligibility with the regeneration of envelope from tfs cues for hearing impaired listeners,” in International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE, 2019, pp. 1–5

work page 2019

[34] [34]

Measuring speech perception with recovered envelope cues using the peripheral auditory model,

N. Mamun, K. Akter, H. Ali, and J. H. L. Hansen, “Measuring speech perception with recovered envelope cues using the peripheral auditory model,” The Journal of the Acoustical Society of America, vol. 144, no. 3, pp. 1872–1872, 2018

work page 2018

[35] [35]

Predicting the speech reception threshold of cochlear implant listeners using an envelope-correlation based measure,

N. Yousefian and P. C. Loizou, “Predicting the speech reception threshold of cochlear implant listeners using an envelope-correlation based measure,” The Journal of the Acoustical Society of America, vol. 132, no. 5, pp. 3399–3405, 2012

work page 2012

[36] [36]

Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,

Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE transactions on acoustics, speech, and signal processing, vol. 33, no. 2, pp. 443–445, 1985

work page 1985

[37] [37]

Speech enhancement based on wavelet thresholding the multitaper spectrum,

Y. Hu and P. C. Loizou, “Speech enhancement based on wavelet thresholding the multitaper spectrum,” IEEE transactions on Speech and Audio processing, vol. 12, no. 1, pp. 59–67, 2004

work page 2004

[38] [38]

Speech enhancement based on a priori signal to noise estimation,

P. Scalart et al., “Speech enhancement based on a priori signal to noise estimation,” in ICASSP, vol. 2. IEEE, 1996, pp. 629–632

work page 1996

[39] [39]

In-vehicle speech and noise corpora,

N. Krishnamurthy, R. Lubag, and J. H. L. Hansen, “In-vehicle speech and noise corpora,” in Digital Signal Processing for In-Vehicle Systems and Safety. Springer, 2012, pp. 145–157

work page 2012

[40] [40]

Speech database development at mit: Timit and beyond,

V. Zue, S. Seneff, and J. Glass, “Speech database development at mit: Timit and beyond,” Speech communication, vol. 9, no. 4, pp. 351–356, 1990

work page 1990

[41] [41]

Progressive neural networks for transfer learning in emotion recognition,

J. Gideon, S. Khorram, Z. Aldeneh, D. Dimitriadis, and E. M. Provost, “Progressive neural networks for transfer learning in emotion recognition,” Interspeech 2017, pp. 1098–1102, 2017

work page 2017