Speaker Discrimination Using Long-Term Spectrum of Speech

Sigmund, Milan

Files

21248Article Text7722511020190925.pdf (750.07 KB)

Date

2019-09-25

Authors

Sigmund, Milan

ORCID

0000-0003-3973-3626

Publisher

Kaunas University of Technology

Abstract

In this article, a specific long-term speech spectrum was investigated with respect to its use for speaker recognition. The long-term spectrum was calculated by means of second-order linear prediction using the average autocorrelation coefficients. The four subbands with the highest discriminative capability were selected for speaker recognition. Together, these subbands cover the frequencies 0-1.2 kHz. The best recognition rates, i.e. 91.7% on complete speech and 100% on voiced speech, were achieved with optimally paired subbands.
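The computation named in the abstract (second-order linear prediction driven by frame-averaged autocorrelation coefficients) might be sketched as follows. This is only an illustration under stated assumptions, not the article's implementation: the frame length, hop size, Hamming window, sampling rate, and all function names are hypothetical choices, and the abstract does not give the subband boundaries, so the band helper takes arbitrary limits.

```python
import numpy as np


def average_autocorrelation(signal, frame_len=256, hop=128, order=2):
    """Average autocorrelation lags 0..order over all frames.

    frame_len, hop and the Hamming window are assumptions; the abstract
    does not specify the framing.
    """
    window = np.hamming(frame_len)
    acc = np.zeros(order + 1)
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        for lag in range(order + 1):
            acc[lag] += np.dot(frame[:frame_len - lag], frame[lag:])
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return acc / n_frames


def lpc2_from_autocorrelation(r):
    """Solve the 2x2 normal equations for the predictor coefficients."""
    R = np.array([[r[0], r[1]],
                  [r[1], r[0]]])
    a = np.linalg.solve(R, np.array([r[1], r[2]]))
    # Power of the prediction error for the fitted predictor.
    gain = r[0] - a[0] * r[1] - a[1] * r[2]
    return a, gain


def lpc_spectrum_db(a, gain, fs, n_points=512):
    """Evaluate the all-pole model spectrum gain / |A(e^jw)|^2 in dB."""
    freqs = np.linspace(0.0, fs / 2.0, n_points)
    w = 2.0 * np.pi * freqs / fs
    A = 1.0 - a[0] * np.exp(-1j * w) - a[1] * np.exp(-2j * w)
    return freqs, 10.0 * np.log10(gain / np.abs(A) ** 2)


def band_mean_db(freqs, spectrum_db, f_lo, f_hi):
    """Mean spectral level in [f_lo, f_hi) Hz (hypothetical band feature)."""
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.mean(spectrum_db[mask]))
```

Given a speech signal `x` sampled at an assumed `fs = 8000` Hz, calling `average_autocorrelation(x)`, then `lpc2_from_autocorrelation`, then `lpc_spectrum_db` yields one smooth spectral envelope per recording, from which `band_mean_db` can read levels inside the 0-1.2 kHz range mentioned in the abstract.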

Keywords

Speech signal, long-term spectrum, speaker discrimination, efficient features

Citation

Information Technology and Control. 2019, vol. 48, issue 3, p. 446-453.
http://itc.ktu.lt/index.php/ITC/article/view/21248

Document type

Peer-reviewed

Document version

Published version

Language of document

en

Document licence

Creative Commons Attribution 4.0 International
http://creativecommons.org/licenses/by/4.0/