Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques

I. Elamvazuthi; Lindasalwa Muda; Mumtaj Begam

arxiv: 1003.4083 · v1 · submitted 2010-03-22 · 💻 cs.MM

classification 💻 cs.MM

keywords: signal, voice, extraction, matching, recognition, techniques, alignment, cepstral

Abstract

Digital processing of speech signals and voice recognition algorithms are essential for fast and accurate automatic voice recognition technology. The voice is a signal of infinite information. Direct analysis and synthesis of the complex voice signal is difficult because of the amount of information the signal contains. Therefore, digital signal processing steps such as feature extraction and feature matching are introduced to represent the voice signal. Several methods, such as Linear Predictive Coding (LPC), Hidden Markov Models (HMM), and Artificial Neural Networks (ANN), are evaluated with a view to identifying a straightforward and effective method for voice signals. The extraction and matching process is implemented right after the signal has been pre-processed or filtered. Mel Frequency Cepstral Coefficients (MFCCs), a non-parametric method for modelling the human auditory perception system, are used as the feature extraction technique. Dynamic Time Warping (DTW), the nonlinear sequence alignment introduced by Sakoe and Chiba, is used as the feature matching technique. Since voice signals tend to have different temporal rates, the alignment is important for producing better performance. This paper presents the viability of MFCC for extracting features and DTW for comparing the test patterns.
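To illustrate the matching step described in the abstract, here is a minimal sketch of DTW in Python. It is not the authors' implementation: it assumes each utterance has already been converted into a sequence of feature vectors (for example MFCC frames), uses Euclidean distance between frames, and applies the basic unconstrained recurrence without the warping-window or slope constraints that are often associated with Sakoe and Chiba's formulation. The names `frame_distance` and `dtw_distance` are hypothetical.

```python
from math import inf, sqrt


def frame_distance(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: plain DTW with steps (i-1, j), (i, j-1), (i-1, j-1)
    and no warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of `a` repeated against b[j-1]
                cost[i][j - 1],      # frame of `b` repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Toy one-dimensional "features": the same pattern spoken at two rates.
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
print(dtw_distance(slow, fast))
```

In this toy case the two sequences differ only in speed, so the warped alignment has zero cost, whereas a frame-by-frame comparison would not even be defined for sequences of different length. In a recognizer, the test pattern would be compared against each stored template and the template with the lowest cost chosen.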

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Jointly Aligning and Predicting Continuous Emotion Annotations
cs.LG 2019-07 unverdicted novelty 6.0

A multi-delay sinc network jointly aligns speech signals with delayed continuous emotion labels and predicts arousal/valence, claiming state-of-the-art speech-only results on RECOLA and SEWA.

Optimising MFCC parameters for the automatic detection of respiratory diseases
cs.SD 2024-08 conditional novelty 3.0

Empirical tuning of MFCC parameters (roughly 30 coefficients, shorter hops, dataset-dependent frame lengths) improves SVM accuracy for respiratory disease detection by 14.9-19.6% on COVID-19 and voice-disorder datasets.