arxiv: 1611.00326 · v3 · pith:24RGC4YWnew · submitted 2016-11-01 · 💻 cs.SD · cs.LG · stat.ML

Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

classification: cs.SD, cs.LG, stat.ML
keywords: speech, detection, factored, boltzmann, enhanced, frames, machines, restricted

In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning by multiplying in the dynamical state of a third unit, which modulates the visible-hidden node pairs. Instead of recursively stacking previous frames of speech as the third unit, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture the long-term features and blend in the globally stored speech structure. A factored low-rank approximation is introduced to reduce the number of parameters of the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. Validation with the area under the ROC curve (AUC) and the signal distortion ratio (SDR) shows that our approach outperforms several existing 1D and 2D (i.e., time-domain and time-frequency-domain) speech detection algorithms in various noisy environments.
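To make the factored low-rank idea concrete, the following is a minimal NumPy sketch of a generic factored three-way interaction, not the paper's implementation. It assumes the common factorization W[i, j, k] = sum_f Wv[i, f] * Wh[j, f] * Wz[k, f], with non-negativity imposed here simply by taking absolute values of random factors. All names (`Wv`, `Wh`, `Wz`, `n_f`, the function names) and the layer sizes are hypothetical, and the abstract's correlation-based frame weighting and threshold function are not modeled.

```python
import numpy as np

def factored_energy_term(v, h, z, Wv, Wh, Wz):
    """Three-way term -sum_f (v.Wv[:, f]) (h.Wh[:, f]) (z.Wz[:, f])."""
    return -np.sum((v @ Wv) * (h @ Wh) * (z @ Wz))

def full_tensor(Wv, Wh, Wz):
    """Rebuild the dense interaction tensor W[i, j, k] from its factors."""
    return np.einsum("if,jf,kf->ijk", Wv, Wh, Wz)

def hidden_probability(v, z, Wv, Wh, Wz, b_h):
    """p(h_j = 1 | v, z): the third unit z modulates the visible input per factor."""
    pre_activation = b_h + Wh @ ((v @ Wv) * (z @ Wz))
    return 1.0 / (1.0 + np.exp(-pre_activation))

rng = np.random.default_rng(0)
n_v, n_h, n_z, n_f = 6, 4, 5, 3  # hypothetical sizes: visible, hidden, third unit, factors

# Non-negative factor matrices (assumption: abs() of Gaussian draws).
Wv = np.abs(rng.normal(size=(n_v, n_f)))
Wh = np.abs(rng.normal(size=(n_h, n_f)))
Wz = np.abs(rng.normal(size=(n_z, n_f)))
b_h = np.zeros(n_h)

v = rng.normal(size=n_v)                          # current frame features
h = rng.integers(0, 2, size=n_h).astype(float)    # binary hidden state
z = rng.normal(size=n_z)                          # contextual (third-unit) features

e_factored = factored_energy_term(v, h, z, Wv, Wh, Wz)
e_dense = -np.einsum("ijk,i,j,k->", full_tensor(Wv, Wh, Wz), v, h, z)
p_h = hidden_probability(v, z, Wv, Wh, Wz, b_h)

full_params = n_v * n_h * n_z              # dense tensor entries
factored_params = (n_v + n_h + n_z) * n_f  # entries across the three factor matrices
```

With these toy sizes the dense tensor has 120 entries while the three factor matrices together have 45, and the factored and dense energy terms agree numerically. The saving grows with layer size, since the dense count is a product of the three dimensions and the factored count is their sum times the number of factors.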
