Emotion Recognition from Analysis of a Person’s Speech using Deep Learning

Galba, Šimon

Emotion Recognition from Analysis of a Person’s Speech using Deep Learning

Files

final-thesis.pdf(2.55 MB)

review_153400.html(12.23 KB)

Authors

Galba, Šimon

Advisor

Malik, Aamir Saeed

Referee

Kekely, Lukáš

Mark

B

Publisher

Vysoké učení technické v Brně. Fakulta informačních technologií

Abstract

Táto práca sa zaoberá analýzou a implementáciou neurónovej siete za účelom rozpoznávania emócií z reči človeka pomocou hlbokého učenia. Práca sa taktiež zaoberá ladením tejto siete za účelom dosiahnutia väčšej citlivosti voči konkrétnej emócii a skúma časové a nepriamo aj finančné nároky tohto ladenia. Inšpiráciou na vytvorenie tejto práce je stúpajúca integrácia umelej inteligencie v oblasti biológie, zdravotníctva ako aj psychológie a jedným z cieľov je aj skúmanie náročnosti vytvárať konkrétne modely neurónových sietí na účely v týchto vedách, čo by malo prispieť k lepšej dostupnosti modelov umenelej inteligencie. Práca stavia na základe implementácie modelu "AST: Audio Spectrogram Transformer" ktorá je verejne dostupná pod licenciou BSD 3-Clause License a využíva metódy ktoré boli doposiaľ využívané na klasifikáciu a rozpoznávanie obrazov vďaka premene zvukovej stopy na spektrogram. Výsledné hodnoty váženej presnosti sú následovné: 93.5% pre EMODB dataset, 92.8% pre EMOVO a 92,9% pre dataset RAVDESS.
This thesis deals with the analysis and implementation of a neural network for the purpose of recognizing emotions from human speech using deep learning. The thesis also focuses on tuning this network to achieve greater sensitivity to a specific emotion and explores the time and indirectly the financial requirements of this tuning. The inspiration for creating this work is the increasing integration of artificial intelligence in the fields of biology, healthcare, as well as psychology, and one of the goals is also to study the complexity of creating specific models of neural networks for purposes in these sciences, which should contribute to better accessibility of artificial intelligence models. The work is based on the implementation of the "AST: Audio Spectrogram Transformer" model, which is publicly available under the BSD 3-Clause License and utilizes methods that have been used so far for classification and recognition of images by converting an audio track into a spectrogram. The resulting values of weighted accuracy are as follows: 93.5% for the EMODB dataset, 92.8% for EMOVO, and 92.9% for the RAVDESS dataset.

Keywords

hluboké učení, Audio Spectrogram Transformer, rozpoznávání emocí z řeči, zpracování řečového signálu, klasifikace emocí, deep learning, Audio Spectrogram Transformer, speech emotion recognition, speech signal processing, emotion classification

Citation

GALBA, Š. Emotion Recognition from Analysis of a Person’s Speech using Deep Learning [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. 2024.

Language of document

en

Study field

Strojové učení

Comittee

prof. Dr. Ing. Jan Černocký (předseda) doc. Ing. Lukáš Burget, Ph.D. (člen) doc. Mgr. Lukáš Holík, Ph.D. (člen) doc. RNDr. Pavel Smrž, Ph.D. (člen) doc. Ing. Vítězslav Beran, Ph.D. (člen) Ing. František Grézl, Ph.D. (člen)

Date of acceptance

2024-06-17

Defence

Student nejprve prezentoval výsledky, kterých dosáhl v rámci své práce. Komise se poté seznámila s hodnocením vedoucího a posudkem oponenta práce. Student následně odpověděl na otázky oponenta a na další otázky přítomných. Komise se na základě posudku oponenta, hodnocení vedoucího, přednesené prezentace a odpovědí studenta na položené otázky rozhodla práci hodnotit stupněm B.

Result of defence

práce byla úspěšně obhájena

Document licence

Standardní licenční smlouva - přístup k plnému textu bez omezení