Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment

Hong-Goo Kang; Min-Jae Hwang

arXiv: 1906.08407 · v1 · pith:PNPO44WCnew · submitted 2019-06-20 · 📡 eess.AS · cs.SD · eess.SP

Pith reviewed 2026-05-25 19:34 UTC · model grok-4.3

classification 📡 eess.AS · cs.SD · eess.SP

keywords MELP · speech codec · deep learning · parameter enhancement · speech enhancement · noisy communication · low complexity

The pith

Deep learning directly enhances MELP codec parameters in noise to match full enhancement performance at lower cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a deep learning model can improve the quality of speech transmitted through a MELP codec in noisy conditions by correcting the codec's internal parameters rather than the raw audio signal. This matters because conventional enhancement requires heavy time-frequency processing that adds complexity and delay. If successful, the method provides a lightweight alternative that works on either the encoder or the decoder side without extra analysis or synthesis modules. The authors report that quality stays comparable to standard T-F mask-based approaches while the system is simpler and faster.

Core claim

By enhancing the noise-corrupted codec parameters with the proposed DL framework, we achieved an enhancement system that is much simpler and faster than conventional T-F mask-based speech enhancement methods, while the quality of its performance remains similar.

What carries the argument

A small deep learning network operating on the MELP codec parameter stream for direct enhancement of noise-corrupted parameters.
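To make the idea concrete, the following is a minimal sketch of frame-by-frame enhancement in the parameter domain, not the authors' implementation. The 29-dimensional vectors, the GRU layer type, and the MSE criterion come from the paper passages quoted on this page; the hidden size, the single-layer depth, the linear read-out, the random untrained weights, and every name in the code are assumptions made for illustration.

```python
import math
import random

DIM = 29      # parameter vector size quoted from the paper
HIDDEN = 16   # assumed hidden size, chosen only for illustration


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUEnhancer:
    """One GRU layer plus a linear read-out, with untrained random weights."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Input (w) and recurrent (u) weights for update z, reset r, candidate n.
        self.w = {g: rand_matrix(HIDDEN, DIM, rng) for g in "zrn"}
        self.u = {g: rand_matrix(HIDDEN, HIDDEN, rng) for g in "zrn"}
        self.out = rand_matrix(DIM, HIDDEN, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w["z"], x), matvec(self.u["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w["r"], x), matvec(self.u["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.w["n"], x), matvec(self.u["n"], rh))]
        h_new = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return matvec(self.out, h_new), h_new

    def enhance(self, frames):
        """Map a sequence of noisy parameter frames to enhanced frames."""
        h = [0.0] * HIDDEN
        enhanced = []
        for x in frames:
            y, h = self.step(x, h)
            enhanced.append(y)
        return enhanced


def mse(pred, target):
    """Mean squared error over all frames and dimensions (the training criterion)."""
    total = sum((p - t) ** 2 for pf, tf in zip(pred, target) for p, t in zip(pf, tf))
    return total / (len(pred) * len(pred[0]))


# Random stand-ins for noisy and clean MELP parameter frames.
rng = random.Random(1)
noisy = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
clean = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
enhanced = TinyGRUEnhancer().enhance(noisy)
loss = mse(enhanced, clean)
```

Note that the network touches only 29 numbers per frame and never sees a waveform or spectrogram, which is the property the low-complexity claim depends on. Training (gradient descent on `loss`) is left out of the sketch.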

Load-bearing premise

A small network operating solely on the codec parameter stream can recover sufficient information to match the performance of full time-frequency signal enhancement without any auxiliary analysis or synthesis modules.

What would settle it

The claim of similar performance would be falsified if, on the same noisy test sets, the proposed parameter enhancement scored substantially lower than T-F mask methods on perceptual evaluation of speech quality (PESQ) or mean opinion score (MOS).
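One way to operationalize such a comparison is a paired test on per-utterance scores. The sketch below computes the mean score difference and a percentile bootstrap interval, then checks whether the interval fits inside a tolerance margin. The scores are invented placeholders, and the function names, the margin, and the choice of a bootstrap are assumptions; the paper does not describe its statistical procedure in the text available here.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Mean of (a - b) per utterance with a percentile bootstrap interval."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi


def similar_within(scores_a, scores_b, margin):
    """True when the whole interval lies inside [-margin, +margin]."""
    _, lo, hi = paired_bootstrap_ci(scores_a, scores_b)
    return -margin <= lo and hi <= margin


# Hypothetical per-utterance PESQ scores, made up for this example.
param_enh = [2.41, 2.10, 2.65, 2.30, 2.52, 2.18]
tf_mask = [2.45, 2.05, 2.70, 2.28, 2.55, 2.20]
mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(param_enh, tf_mask)
```

Pairing by utterance matters because both systems are evaluated on the same noisy inputs, so differences between utterances cancel out of the comparison.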

Figures

Figures reproduced from arXiv: 1906.08407 by Hong-Goo Kang, Min-Jae Hwang.

**Figure 1.** Various types of speech enhancement processes with MELP coder.

… and clean speech pair through MELP vocoder analysis as described in Section 2.1. Then, the DL network is trained to estimate clean MELP parameters from noisy MELP parameters by minimizing the mean squared error (MSE) criterion. To model the MELP parameters more accurately, some MELP parameters, such as gain, pitch, and Fourier magnitudes are …

Original abstract

In this paper, we propose a deep learning (DL)-based parameter enhancement method for a mixed excitation linear prediction (MELP) speech codec in noisy communication environment. Unlike conventional speech enhancement modules that are designed to obtain clean speech signal by removing noise components before speech codec processing, the proposed method directly enhances codec parameters on either the encoder or decoder side. As the proposed method has been implemented by a small network without any additional processes required in conventional enhancement systems, e.g., time-frequency (T-F) analysis/synthesis modules, its computational complexity is very low. By enhancing the noise-corrupted codec parameters with the proposed DL framework, we achieved an enhancement system that is much simpler and faster than conventional T-F mask-based speech enhancement methods, while the quality of its performance remains similar.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a deep learning framework to directly enhance the parameters (e.g., LPC coefficients, pitch, gain) of the MELP speech codec when corrupted by noise, either at the encoder or decoder side. Unlike conventional approaches that apply time-frequency masking to the waveform before or after coding, the method uses a small network without auxiliary T-F analysis/synthesis modules, claiming substantially lower complexity while achieving similar output quality.

Significance. If the performance equivalence holds under rigorous testing, the approach could reduce computational overhead in noisy communication links that employ MELP. The work does not supply machine-checked proofs, reproducible code releases, or parameter-free derivations, so its significance rests entirely on the strength of the empirical comparisons.

major comments (2)

[Abstract and experimental evaluation section] The central claim that 'the quality of its performance remains similar' to T-F mask-based methods is unsupported by any reported metrics (PESQ, STOI, MOS), error bars, or statistical tests; the assertion rests on unshown experiments.
[Method and results sections] No analysis or ablation is presented to test whether noise-corrupted MELP parameters retain sufficient information for a small parameter-only network to recover perceptually important components at a level comparable to full-waveform T-F enhancement; this information-theoretic precondition is load-bearing for the similarity claim.

minor comments (2)

Notation for the network architecture and loss function is introduced without an accompanying equation or diagram, making the implementation details difficult to reproduce.
[Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., average PESQ improvement) to support the similarity claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major point below and will revise the manuscript to provide the requested empirical support.

Referee: [Abstract and experimental evaluation section] The central claim that 'the quality of its performance remains similar' to T-F mask-based methods is unsupported by any reported metrics (PESQ, STOI, MOS), error bars, or statistical tests; the assertion rests on unshown experiments.

Authors: We agree that the similarity claim requires quantitative backing. The revised manuscript will include PESQ, STOI, and MOS results (with error bars and statistical significance tests) comparing the proposed parameter-enhancement approach against conventional T-F mask methods on the same noisy test conditions. revision: yes
Referee: [Method and results sections] No analysis or ablation is presented to test whether noise-corrupted MELP parameters retain sufficient information for a small parameter-only network to recover perceptually important components at a level comparable to full-waveform T-F enhancement; this information-theoretic precondition is load-bearing for the similarity claim.

Authors: We acknowledge the value of such an analysis. The revision will add an ablation study that quantifies how much perceptual information (e.g., via parameter reconstruction error and downstream perceptual metrics) is preserved in the noisy MELP parameters versus the full waveform, thereby testing the feasibility of the parameter-only route. revision: yes
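The parameter reconstruction error mentioned in this response could be reported per parameter group, so that it is visible which parts of the codec stream the enhancement actually repairs. The sketch below is one hypothetical way to do that. The split of the 29 dimensions into groups follows the usual MELP parameter set (line spectral frequencies, gains, pitch, bandpass voicing, Fourier magnitudes, aperiodic flag) and is an assumption, as are the ordering and all names; the text available here does not give the layout.

```python
# Assumed layout of one 29-dimensional MELP parameter frame (not from the paper).
GROUPS = {
    "lsf": range(0, 10),
    "gain": range(10, 12),
    "pitch": range(12, 13),
    "bandpass_voicing": range(13, 18),
    "fourier_magnitudes": range(18, 28),
    "aperiodic_flag": range(28, 29),
}


def group_mse(estimate, reference):
    """Mean squared error per parameter group over a list of frames."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("need two equal-length, non-empty frame lists")
    result = {}
    for name, idx in GROUPS.items():
        sq = [(e[i] - r[i]) ** 2 for e, r in zip(estimate, reference) for i in idx]
        result[name] = sum(sq) / len(sq)
    return result


def error_reduction(noisy, enhanced, clean):
    """Fraction of the noisy-vs-clean error removed by enhancement, per group."""
    before = group_mse(noisy, clean)
    after = group_mse(enhanced, clean)
    return {k: (1 - after[k] / before[k]) if before[k] > 0 else 0.0 for k in before}


# Toy frames: enhancement halves the deviation from clean in every dimension.
clean = [[0.0] * 29 for _ in range(2)]
noisy = [[1.0] * 29 for _ in range(2)]
enhanced = [[0.5] * 29 for _ in range(2)]
reduction = error_reduction(noisy, enhanced, clean)
```

In the toy data every group shows a 75% reduction, because halving the deviation quarters the squared error. A low parameter error does not by itself guarantee good perceptual quality, which is why the response pairs it with downstream perceptual metrics.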

Circularity Check

0 steps flagged

No circularity detected; empirical DL proposal with no load-bearing derivations or self-referential fits

The paper presents an empirical deep learning method for enhancing MELP codec parameters in noise, claiming simplicity and comparable performance to T-F mask methods. No equations, derivations, or first-principles results are described that reduce predictions to inputs by construction. The approach relies on training a small network on corrupted parameters, with performance evaluated externally via listening tests or metrics, not by re-deriving fitted quantities. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing elements. The central claim is an engineering outcome (simpler system with similar quality), which is falsifiable against independent benchmarks and does not collapse into self-definition or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.0 · 5663 in / 980 out tokens · 28546 ms · 2026-05-25T19:34:46.535962+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: we propose a DL-based parameter enhancement method for a mixed excitation linear prediction (MELP) speech codec... directly enhances codec parameters... small network without any additional processes... T-F analysis/synthesis modules

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear

Paper passage: 29-dimensional input and output vectors... GRU layers... MSE criterion

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 2 internal anchors

[1] Introduction. To build a comfortable voice communication system in noisy environment, it is necessary to include a speech enhancement or noise reduction techniques [1–4]. However, the core module of coding system, i.e., vocoding techniques, and the speech enhancement techniques have been developed independently to each other. Thus, the entire speech comm...

[2] MELP coder with speech enhancement. 2.1. MELP vocoder. The main characteristic of the MELP codec [5] is to model an excitation signal by mixing voiced pulse and noise components in the frequency domain, where bandpass voicing flags are used to represent the voicing information of frequency subbands. In the system, total six parameters that consist of excitat...

[3] Vocoder parameter enhancement method. In the proposed system, the noise-corrupted speech signal is first parameterized to the MELP parameters without any pre-processing. Then, noisy MELP parameters are directly enhanced to be similar to the ones obtained from a clean speech signal via a DL network. To train the network, first, both noisy and clean MELP pa...

[4] Experiments. 4.1. Database generation. In the experiments, phonetically balanced TIMIT corpus [14] and NOISEX-92 corpus [15] were used as speech and noise databases, respectively. To match the sampling rate with the 2.4 kbit/s MELP codec, all samples were down-sampled to 8-kHz. In the TIMIT database, sentences “SA1” and “SA2” commonly recorded by all speake...

[5] Conclusion. In this paper, we introduced a DL-based parameter enhancement method for a MELP speech codec in noisy communication environments. By directly enhancing the MELP parameters, the proposed algorithm was successfully combined with the MELP-based speech communication system. Experimental results showed that the proposed method had a higher sta...

[6] Acknowledgment. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2019-11-0124).
[7] W. B. Kleijn and K. K. Paliwal, Eds., Speech Coding and Synthesis. New York, NY, USA: Elsevier Science Inc., 1995.

[8] J. S. Collura, D. F. Brandt, and D. J. Rahikka, “The 1.2 kbps/2.4 kbps melp speech coding suite with integrated noise pre-processing,” in MILCOM 1999. IEEE Military Communications. Conference Proceedings (Cat. No.99CH36341), vol. 2, Oct 1999, pp. 1449–1453.

[9] T. Agarwal and P. Kabal, “Preprocessing of noisy speech for voice coders,” in Speech Coding, 2002, IEEE Workshop Proceedings, Oct 2002, pp. 169–171.

[10] R. Martin and R. V. Cox, “New speech enhancement techniques for low bit rate speech coding,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1999, pp. 614–617.

[11] A. McCree and T. P. Barnwell, “A mixed excitation lpc vocoder model for low bit rate speech coding,” IEEE Trans. Speech and Audio Processing, vol. 3, pp. 242–250, 1995.

[12] L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree, “Melp: the new federal standard at 2400 bps,” in 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, April 1997, pp. 1591–1594.

[13] A. Narayanan and D. Wang, “Ideal ratio mask estimation using deep neural networks for robust speech recognition,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013, pp. 7092–7096.

[14] H. Erdogan, J. Hershey, S. Watanabe, and J. Le Roux, “Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks,” in 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings, vol. 2015-August, 8 2015, pp. 708–712.

[15] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising autoencoder,” in INTERSPEECH. ISCA, 2013, pp. 436–440.

[16] D. Stoller, S. Ewert, and S. Dixon, “Wave-u-net: A multi-scale neural network for end-to-end audio source separation,” in Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27, 2018, 2018, pp. 334–340. [Online]. Available: http://ismir2018.ircam.fr/doc/pdfs/205_Paper.pdf
[17] S. Pascual, A. Bonafonte, and J. Serrà, “Segan: Speech enhancement generative adversarial network,” in INTERSPEECH, 2017.

[18] D. Rethage, J. Pons, and X. Serra, “A wavenet for speech denoising,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, pp. 5069–5073.

[19] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443–445, April 1985.

[20] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, “Darpa timit acoustic phonetic continuous speech corpus cdrom,” 1993.

[21] A. Varga and H. J. Steeneken, “Assessment for automatic speech recognition: Ii. noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Communication, vol. 12, no. 3, pp. 247–251, 1993.

[22] K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, ...

[23] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. AISTATS, 2010, pp. 249–256.

[24] R. J. Williams and J. Peng, “An efficient gradient-based algorithm for on-line training of recurrent network trajectories,” Neural computat., vol. 2, no. 4, pp. 490–501, 1990.

[25] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. [Online]. Available: http://arxiv.org/abs/1412.6980

[26] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “A short-time objective intelligibility measure for time-frequency weighted noisy speech,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, March 2010, pp. 4214–4217.

[27] H. Sorensen, D. Jones, M. Heideman, and C. Burrus, “Real-valued fast fourier transform algorithms,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 6, pp. 849–863, June 1987.
