Tuning Dari Speech Classification Employing Deep Neural Networks


Mursal Dawodi, Jawid Ahmad Baktash, Avignon University, France


Recently, many researchers have focused on building and improving speech recognition systems to facilitate and enhance human-computer interaction. Today, Automatic Speech Recognition (ASR) systems have become important and widely used tools in applications ranging from games to translation systems and robots. However, speech recognition for low-resource languages still requires further research. This article addresses isolated-word recognition for the Dari language, using Mel-frequency cepstral coefficients (MFCCs) for feature extraction. We compare three deep neural networks, namely a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Multilayer Perceptron (MLP), together with two hybrid CNN-RNN models. We evaluate our models on an isolated-word Dari corpus that we built ourselves, consisting of 1000 utterances of 20 short Dari words. This study obtains an average accuracy of 98.365%.
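The MFCC front end mentioned above turns each recorded word into a sequence of short-time feature vectors that the CNN, RNN, and MLP models consume. The sketch below shows the standard MFCC pipeline (pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II) in plain NumPy. It is an illustrative sketch, not the authors' implementation: the abstract does not state the feature parameters, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis factor are assumed common defaults, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    # Assumed parameters: common defaults, not values taken from the paper
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    # Zero-pad so the last frame is complete
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    pad_len = (n_frames - 1) * fstep + flen
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    # Slice overlapping frames and apply a Hamming window
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)
    # Periodogram estimate of the power spectrum
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Mel filterbank energies, log-compressed (clamped to avoid log(0))
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Decorrelate with a DCT; keep the first n_ceps cepstral coefficients
    return dct_ii(log_energies, n_ceps)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic test tone
    feats = mfcc(tone, sr)
    print(feats.shape)  # (99, 13)
```

With these assumed settings, a one-second recording yields a 99 x 13 feature matrix, which can be fed to an RNN as a sequence or treated as a 2-D input by a CNN; an MLP would typically need it flattened or pooled to a fixed length. In practice, a library routine such as `librosa.feature.mfcc` would normally replace this hand-written version.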


Keywords: Dari, deep neural network, speech recognition, recurrent neural network, multilayer perceptron, convolutional neural network