Segmentation of Speech and Humming in Vocal Input

Loading...
Thumbnail Image

Authors

Sporka, Adam J.
Polacek, Ondrej
Havlik, Jan

Advisor

Referee

Mark

Journal Title

Journal ISSN

Volume Title

Publisher

Společnost pro radioelektronické inženýrství

ORCID

Abstract

Non-verbal vocal interaction (NVVI) is an interaction method in which sounds other than speech produced by a human are used, such as humming. NVVI complements traditional speech recognition systems with continuous control. In order to combine the two approaches (e.g. "volume up, mmm") it is necessary to perform a speech/NVVI segmentation of the input sound signal. This paper presents two novel methods of speech and humming segmentation. The first method is based on classification of MFCC and RMS parameters using a neural network (MFCC method), while the other method computes volume changes in the signal (IAC method). The two methods are compared using a corpus collected from 13 speakers. The results indicate that the MFCC method outperforms IAC in terms of accuracy, precision, and recall.

Description

Citation

Radioengineering. 2012, vol. 21, č. 3, s. 923-929. ISSN 1210-2512
http://www.radioeng.cz/fulltexts/2012/12_03_0923_0929.pdf

Document type

Peer-reviewed

Document version

Published version

Date of access to the full text

Language of document

en

Study field

Comittee

Date of acceptance

Defence

Result of defence

DOI

Collections

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license

Except where otherwised noted, this item's license is described as Creative Commons Attribution 3.0 Unported License
Citace PRO