Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques
Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal that carries an enormous amount of information, so direct analysis and synthesis of the complex voice signal is difficult. Therefore, digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Models (HMM), and Artificial Neural Networks (ANN), are evaluated with a view to identifying a straightforward and effective method for voice signals. The extraction and matching steps are performed right after the signal has been pre-processed (filtered). Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method that models the human auditory perception system, are used as the feature extraction technique. Dynamic Time Warping (DTW), the non-linear sequence alignment introduced by Sakoe and Chiba, is used as the feature matching technique. Because voice signals tend to vary in temporal rate (the same word can be spoken faster or slower), alignment is important for better performance. This paper examines the viability of MFCC for extracting features and DTW for comparing the test patterns.
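The alignment argument in the abstract can be made concrete with a short sketch. The following is a minimal pure-Python DTW over sequences of feature vectors (for example MFCC frames). It assumes Euclidean frame distance, the simplest symmetric step pattern with equal weights, and an optional Sakoe-Chiba band of half-width `window`; the paper's actual step weights, normalization, and band width are not stated here, and the names `dtw_distance` and `recognize` are hypothetical. A test utterance is labelled with the template that has the smallest warped distance.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref: Sequence[Vector], test: Sequence[Vector],
                 window: Optional[int] = None) -> float:
    """Cumulative DTW cost between two sequences of feature vectors.

    window: half-width of a Sakoe-Chiba band (|i - j| <= window); None means
    unconstrained. The band is widened to |n - m| when needed, otherwise the
    end point (n, m) would be unreachable.
    """
    n, m = len(ref), len(test)
    w = max(n, m) if window is None else max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = euclidean(ref[i - 1], test[j - 1])
            # Symmetric step pattern (assumed): match, insertion, deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


def recognize(test: Sequence[Vector], templates: Dict[str, Sequence[Vector]],
              window: Optional[int] = None) -> str:
    """Return the label of the reference template closest to `test` under DTW."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test, window))


if __name__ == "__main__":
    ref = [[0.0], [1.0], [2.0], [1.0], [0.0]]
    slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
    print(dtw_distance(ref, slow))                            # 0.0
    print(recognize(slow, {"yes": ref, "no": [[5.0]] * 3}))  # yes
```

Because the warping path may repeat frames, a slower rendition of the same contour costs nothing, which is the tempo invariance the abstract motivates. The band keeps the path near the diagonal, which bounds how much warping is allowed and reduces computation.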
Forward citations
Cited by 2 Pith papers
- Jointly Aligning and Predicting Continuous Emotion Annotations
A multi-delay sinc network jointly aligns speech signals with delayed continuous emotion labels and predicts arousal/valence, claiming state-of-the-art speech-only results on RECOLA and SEWA.
- Optimising MFCC parameters for the automatic detection of respiratory diseases
Empirical tuning of MFCC parameters (roughly 30 coefficients, shorter hops, dataset-dependent frame lengths) improves SVM accuracy for respiratory disease detection by 14.9-19.6% on COVID-19 and voice-disorder datasets.
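The parameters named in this summary (number of coefficients, hop length, frame length) are the knobs of an MFCC front end like the one the abstract above uses for feature extraction. Below is a minimal pure-Python sketch of such a pipeline. It assumes common textbook choices that the abstract does not specify: pre-emphasis with coefficient 0.97, a Hamming window, 26 triangular mel filters, the mel formula 2595·log10(1 + f/700), a log floor of 1e-10, and an unnormalized DCT-II keeping 13 coefficients. The function names are hypothetical, and the naive O(N²) DFT is chosen for clarity rather than speed.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    # A common mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    low, high = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for f in range(1, n_filters + 1):
        left, center, right = bins[f - 1], bins[f], bins[f + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT|^2 / N for bins 0..N/2, using a naive DFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def mfcc(signal: Sequence[float], sr: int, frame_len: int = 200, hop: int = 80,
         n_filters: int = 26, n_coeffs: int = 13, alpha: float = 0.97) -> List[List[float]]:
    """One list of n_coeffs cepstral coefficients per frame (assumes n_coeffs <= n_filters)."""
    # 1. Pre-emphasis: y[t] = x[t] - alpha * x[t-1] boosts high frequencies.
    y = [signal[0]] + [signal[t] - alpha * signal[t - 1] for t in range(1, len(signal))]
    # Zero-pad so there is at least one full frame.
    if len(y) < frame_len:
        y += [0.0] * (frame_len - len(y))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sr)
    features = []
    for start in range(0, len(y) - frame_len + 1, hop):
        # 2. Hamming-windowed frame and its power spectrum.
        frame = [s * w for s, w in zip(y[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        # 3. Log mel energies; the floor avoids log(0) for silent or empty filters.
        log_e = [math.log(max(sum(p * h for p, h in zip(spec, row)), 1e-10)) for row in bank]
        # 4. DCT-II decorrelates the log energies; keep the first n_coeffs.
        features.append([sum(e * math.cos(math.pi * n * (k + 0.5) / n_filters)
                             for k, e in enumerate(log_e)) for n in range(n_coeffs)])
    return features
```

At 8 kHz, `frame_len=200` and `hop=80` correspond to 25 ms frames with a 10 ms hop. Exploring the kind of tuning summarized above would mean varying `frame_len`, `hop`, and `n_coeffs` (raising `n_filters` too if more coefficients than filters are wanted).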