Emotion Recognition in Kurdish Speech from the Sorani Dialect Corpus
DOI:
https://doi.org/10.21271/ZJPAS.36.5.10
Keywords:
Emotion recognition, Speech analysis, Sorani Kurdish
Abstract
Given the increasing need for interactive human-computer applications, the use of machine learning algorithms to recognize emotions from speech has attracted a substantial surge in interest. While emotion recognition systems have made considerable progress for languages such as German, English, Spanish, Dutch, and Danish, comprehensive datasets for the Kurdish language remain notably limited. This paper addresses this gap by focusing on emotion recognition in Sorani Kurdish speech data, which was carefully gathered from openly available videos on the YouTube platform and categorized into four distinct emotion classes: neutral, sadness, happiness, and anger. The study extracted both Mel spectrogram and Mel-Frequency Cepstral Coefficient (MFCC) spectral features, and then applied three classification models, K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM), to evaluate the results. A close comparison of the results across the feature extraction methods showed that SVM obtained a higher accuracy than the other classifiers, reaching as much as 85.57%. This is considerably higher than the accuracy of the first Kurdish emotion classification technique for recognizing emotion in spoken words.
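The classification and evaluation step described in the abstract can be illustrated with a minimal sketch of one of the three classifiers, K-Nearest Neighbor. This is not the authors' implementation: the two-dimensional feature vectors below are invented toy values standing in for MFCC or Mel spectrogram features, and the helper names `knn_predict` and `accuracy` are hypothetical. Only the four emotion labels come from the paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test vectors whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy feature vectors (assumption: invented for illustration, not corpus data).
# Each emotion occupies its own well-separated region of the feature space.
train_x = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # neutral
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),   # sadness
    (0.0, 5.0), (0.2, 5.1), (0.1, 4.8),   # happiness
    (5.0, 0.0), (5.2, 0.1), (4.9, 0.2),   # anger
]
train_y = (
    ["neutral"] * 3 + ["sadness"] * 3 + ["happiness"] * 3 + ["anger"] * 3
)

# Held-out vectors, one per emotion, used only for evaluation.
test_x = [(0.3, 0.3), (4.8, 4.8), (0.3, 4.9), (5.1, 0.3)]
test_y = ["neutral", "sadness", "happiness", "anger"]

acc = accuracy(train_x, train_y, test_x, test_y, k=3)
print(f"KNN accuracy on toy data: {acc:.2%}")
```

On real recordings the feature vectors would have many more dimensions and the classes would overlap, so the accuracy on this toy data says nothing about the figures reported in the paper. The sketch only shows how a held-out set yields an accuracy score that can be compared across classifiers.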
License
Copyright (c) 2024 Omar Nematullah, Shavan Askar, Shahab Wahhab, Bzar Khidir
This work is licensed under a Creative Commons Attribution 4.0 International License.