Emotion Recognition from Analysis of a Person’s Speech

Knutelský, Martin

but.committee	doc. Ing. Lukáš Burget, Ph.D. (chair) doc. Ing. Martin Čadík, Ph.D. (member) doc. Ing. Vladimír Janoušek, Ph.D. (member) Ing. Michal Hradiš, Ph.D. (member) Ing. Jaroslav Rozman, Ph.D. (member) Ing. Tomáš Milet, Ph.D. (member)	cs
but.defence	The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the opponent's review. The student subsequently answered questions from those present. Based on the opponent's review, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis B.	cs
but.jazyk	English
but.program	Information Technology and Artificial Intelligence	cs
but.result	the thesis was successfully defended	cs
dc.contributor.advisor	Malik, Aamir Saeed	en
dc.contributor.author	Knutelský, Martin	en
dc.contributor.referee	Shakil, Sadia	en
dc.date.created	2023	cs
dc.description.abstract	This thesis addresses emotion recognition from human speech. It aims to design and implement a system that automatically infers emotional states from speech recordings. The solution is based on the Audio Spectrogram Transformer (AST), a derivative of the Vision Transformer neural network, which accepts a mel spectrogram as input. The implementation is a two-stage pipeline: in the first stage, a mel spectrogram is extracted from the input speech recording; in the second stage, the pretrained AST model computes the probabilities of the considered emotion classes. The AST implementation was trained and evaluated on three datasets: RAVDESS, Emo-DB, and EMOVO. The resulting unweighted accuracy is 84.5 % for RAVDESS, 91.6 % for Emo-DB, and 73.8 % for EMOVO. During training, the energy consumed by the graphics processing unit was recorded to calculate the carbon footprint in terms of emitted CO2. The main contribution of this work is the use of a neural network based on the Transformer architecture, originally designed for vision tasks, to classify emotions from speech. A further contribution is tracking the carbon footprint of neural network training: expressed as the mass of emitted CO2, it amounts to 1058.37 grams.	en
dc.description.mark	B	cs
dc.identifier.citation	KNUTELSKÝ, M. Emotion Recognition from Analysis of a Person’s Speech [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. 2023.	cs
dc.identifier.other	141159	cs
dc.identifier.uri	http://hdl.handle.net/11012/210539
dc.language.iso	en	cs
dc.publisher	Vysoké učení technické v Brně. Fakulta informačních technologií	cs
dc.rights	Standard license agreement - unrestricted access to the full text	cs
dc.subject	speech emotion recognition	cs
dc.subject	speech signal processing	cs
dc.subject	classification of emotions	cs
dc.subject	machine learning	cs
dc.subject	deep learning	cs
dc.subject	Vision Transformer	cs
dc.subject	Audio Spectrogram Transformer	cs
dc.subject	carbon footprint	cs
dc.title	Emotion Recognition from Analysis of a Person’s Speech	en
dc.title.alternative	Emotion Recognition from Analysis of a Person’s Speech	cs
dc.type	Text	cs
dc.type.driver	masterThesis	en
dc.type.evskp	master's thesis	cs
dcterms.dateAccepted	2023-06-16	cs
dcterms.modified	2023-06-16-14:33:03	cs
eprints.affiliatedInstitution.faculty	Fakulta informačních technologií	cs
sync.item.dbid	141159	en
sync.item.dbtype	ZP	en
sync.item.insts	2025.03.26 15:36:16	en
sync.item.modts	2025.01.15 18:38:32	en
thesis.discipline	Machine Learning	cs
thesis.grantor	Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačových systémů	cs
thesis.level	Master's (Inženýrský)	cs
thesis.name	Ing.	cs

Files

Original bundle

Name:: final-thesis.pdf
Size:: 5.33 MB
Format:: Adobe Portable Document Format
Description:: final-thesis.pdf

Name:: appendix-1.zip
Size:: 11.08 MB
Format:: zip
Description:: appendix-1.zip

Name:: review_141159.html
Size:: 6.83 KB
Format:: Hypertext Markup Language
Description:: file review_141159.html

Collections

2023