Neural Networks With Dilated Convolutions For Sound Event Recognition

Date

Authors

Miklanek, Stepan

Advisor

Referee

Mark

Journal Title

Journal ISSN

Volume Title

Publisher

Vysoké učení technické v Brně, Fakulta elektrotechniky a komunikačních technologií

ORCID

Abstract

Convolutional neural networks, most commonly deployed in image classification tasks, typically use square-shaped convolutional kernels, which are well suited for feature extraction from two-dimensional data. This study explores the effect of utilizing spectrally aware dilated convolutions specialized for sound event recognition. By extending the base kernels in the time or the frequency dimension, the features extracted from the spectral audio representations should, in theory, better capture the temporal and timbral information of different sound events. The baseline neural network model with square kernels was compared against three models, which used an increasing dilation factor in successive convolutional layers. The three models were purposefully tuned to focus on frequency and time feature extraction. The results showed that the models with dilated convolutions performed noticeably better than the baseline model.
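
As a rough illustration of why dilation along one axis widens what a kernel sees, the following minimal sketch computes the span of a dilated kernel and the receptive field of stacked stride-1 convolutions along a single axis. The 3-tap kernel, the three-layer depth, and the dilation schedules are hypothetical values chosen for illustration; they are not taken from the paper.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel along a single axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field along one axis after stacking stride-1 convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by the reach of its dilated kernel.
        field += dilation * (kernel_size - 1)
    return field


# Hypothetical configurations (assumed, not the paper's settings):
# a square-kernel baseline and a model dilated only along frequency.
baseline = {"freq": [1, 1, 1], "time": [1, 1, 1]}
freq_dilated = {"freq": [1, 2, 4], "time": [1, 1, 1]}

for name, config in (("baseline", baseline), ("frequency-dilated", freq_dilated)):
    fields = {axis: receptive_field(3, d) for axis, d in config.items()}
    print(name, fields)
```

Under these assumed settings, the frequency-dilated stack covers a wider band of spectrogram bins than the baseline while keeping the same number of weights per kernel, which is the intuition behind extending kernels in one dimension only.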

Description

Citation

Proceedings I of the 27th Conference STUDENT EEICT 2021: General papers. pp. 581-585. ISBN 978-80-214-5942-7
https://conf.feec.vutbr.cz/eeict/index/pages/view/ke_stazeni

Document type

Peer-reviewed

Document version

Published version

Date of access to the full text

Language of document

en

Study field

Committee

Date of acceptance

Defence

Result of defence

DOI

Endorsement

Review

Supplemented By

Referenced By
