Speaker Recognition Regardless of Context and Language on a Fixed Set of Competitors
Article
Citation information for this article was obtained from Scopus.
The article was published in a journal from the VAK list.
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: 14 September 2017.
Abstract: The problem of speaker recognition from a given set of speakers for any language and any context
is considered. A database of Russian numerals that contains speech segments from 216 men and 177 women,
each of whom spoke from 400 to 800 words, is used for recognition. Speech was recorded with different
types of microphones in different rooms at the natural noise level. Recognition is based on solutions of the
inverse problem of finding the voice excitation pulse shape for each pitch period from the known speech
segment. The pulse shape is defined as the inverse Fourier transform of the regularized ratio of speech signal
spectra at the intervals of the open and closed glottis. Recognition is carried out by ten parameters: the pitch
period, the open glottis interval duration, times when the source amplitude is maximum, minimum, or zero,
the amplitude ratio for the minimum and maximum source pulses, three decomposition ratios of the source
function by the principal component method, and the vowel duration. With this recognition procedure,
for an uttered word that contains one vowel, the false reject rate (FRR) for men is 1.7–5.4%,
and the false acceptance rate (FAR) is 5.4–7.1%. For women FRR = 2–5.2% and FAR = 5.2–6.3%. The
recognition error decreases with an increasing number of vowels in the speech signal. At 10 vowels, for men FRR
= 0.05–0.2% and FAR = 0.07–0.8%, and for women FRR = 0.09–0.2% and FAR = 0.17–2.1%.
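
The pulse-shape definition in the abstract (an inverse Fourier transform of a regularized ratio of spectra) can be illustrated with a minimal sketch. The abstract does not state the form of the regularization, so the Tikhonov-style ratio below, the parameter `alpha`, and the function names `dft`, `estimate_pulse`, and `circular_convolve` are all assumptions made for illustration, not the authors' implementation. The input frames are synthetic and are not taken from the database described above.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def estimate_pulse(open_seg, closed_seg, alpha=1e-3):
    """Inverse DFT of a regularized spectral ratio.

    ASSUMPTION: the ratio S_open / S_closed is regularized in the Tikhonov
    style, S_open * conj(S_closed) / (|S_closed|^2 + alpha). The abstract
    does not specify which regularization the authors use.
    """
    if len(open_seg) != len(closed_seg):
        raise ValueError("segments must have equal length")
    s_open = dft(open_seg)
    s_closed = dft(closed_seg)
    ratio = [
        so * sc.conjugate() / (abs(sc) ** 2 + alpha)
        for so, sc in zip(s_open, s_closed)
    ]
    # The inputs are real, so the imaginary part is rounding noise only.
    return [v.real for v in dft(ratio, inverse=True)]


def circular_convolve(a, b):
    """Circular convolution, used only to build the synthetic test frame."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


# Synthetic frames (hypothetical values): the "open" frame is the "closed"
# frame circularly convolved with a known pulse.
closed = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
pulse = [0.0, 1.0, 0.6, -0.4, 0.0, 0.0, 0.0, 0.0]
open_frame = circular_convolve(closed, pulse)

recovered = estimate_pulse(open_frame, closed, alpha=1e-9)
```

With a very small `alpha` and a well-conditioned `closed` spectrum, `recovered` should match `pulse` closely. A larger `alpha` trades accuracy for stability at frequencies where the denominator spectrum is weak, which is the usual reason for regularizing a spectral ratio on noisy recordings.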