Title: Speaker Verification System using SVM
1. Speaker Verification System using SVM
Jun-Won Suh
Intelligent Electronic Systems, Human and Systems Engineering
Department of Electrical and Computer Engineering
2. Outline: Summary of the Ph.D. Dissertation of Vincent Wan
- Speaker verification system
- Extracting features
- Creating models of speakers
- Generative models, discriminative models
- Making generative models discriminative
- Developing speaker verification using SVMs
- My interest: improving our system
3. Speaker verification system
- Authenticate a person's claimed identity
- Text-dependent and text-independent
- The system models the sound of the client's voice (based on the physical characteristics of the client's vocal tract).
- Feature extraction
- Enrolment
  - Creates a model of the client's voice
- Pattern matching
- Decision theory
A generic speaker verification system
4. Extracting features
- Building models of speakers depends on frequency analysis of the speaker's voice.
- Linear predictive coding (LPC)
  - LPC assumes that speech can be modelled as the output of a linear filter excited by periodic pulses or random noise.
  - The LPC coefficients are obtained by minimizing the mean squared error (MSE) of the prediction.
- Perceptual linear prediction (PLP)
  - PLP combines LPC analysis with psychophysical knowledge of the human auditory system.
  - Example: the human ear has higher frequency resolution at low frequencies.
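As a rough illustration of the MSE criterion for LPC, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. It is a minimal pure-Python sketch, not the feature extractor used in the dissertation: the function names `autocorrelation` and `lpc` are chosen here for illustration, and windowing and pre-emphasis are left out.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the (unnormalised) autocorrelation of one frame."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (coeffs, err): coeffs[k-1] is a_k in the predictor
    x[t] ~ sum_k a_k * x[t-k], and err is the remaining prediction
    error energy after minimising the squared error.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)   # a[0] is unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Example frame: a decaying exponential obeys x[t] = 0.9 * x[t-1] exactly.
frame = [0.9 ** t for t in range(200)]
coeffs, residual = lpc(frame, order=2)
```

For this synthetic frame the first coefficient should come out close to 0.9 and the second close to 0, since a first-order predictor already explains the signal.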
5. Creating models of speakers
- Generative models
  - Gaussian Mixture Model (GMM), Hidden Markov Model (HMM)
  - Models are probability density estimators that attempt to capture all of the fluctuations and variations of the data.
- Discriminative models
  - Polynomial classifiers, Support Vector Machines (SVM)
  - Models are optimized to minimize the error on a set of training samples.
  - Models draw the boundary between classes and ignore the fluctuations within each class.
- Generative versus discriminative models
  - Generative models estimate the within-class probability densities and do not directly minimize a classification error.
  - Discriminative models achieve the highest performance in classification tasks.
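To make the "probability density estimator" view of a generative model concrete, the sketch below evaluates the log-likelihood of a feature sequence under a diagonal-covariance GMM. It is only an illustration: the function names are hypothetical, the parameters are assumed to be already trained, and frames are assumed to be independent given the model.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at the vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_frame_log_likelihood(x, weights, means, variances):
    """log p(x) for one frame, using log-sum-exp over the mixture components."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gmm_sequence_log_likelihood(frames, weights, means, variances):
    """Sum of frame log-likelihoods (frames treated as independent)."""
    return sum(gmm_frame_log_likelihood(x, weights, means, variances)
               for x in frames)


# Hypothetical two-component model over 2-D features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [2.0, 2.0]]
utterance = [[0.1, -0.2], [2.9, 3.3], [0.4, 0.0]]
score = gmm_sequence_log_likelihood(utterance, weights, means, variances)
```

Note that nothing in this computation looks at other speakers: the model describes the data of one class only, which is the property the discriminative models above are contrasted with.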
6. Making generative models discriminative
- GMM-LR/SVM combination
  - GMM likelihood ratio
  - Bengio and Mariéthoz argued that, because the probability estimates are not perfect, the Bayes decision rule can be improved by learning the decision function.
  - The input to the SVM is the two-dimensional vector made up of the log-likelihoods of the client and world models.
- A limitation of these approaches arises from discriminating on a frame basis.
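The difference between the plain likelihood-ratio test and a learned decision function can be sketched as follows. This is an illustration under assumptions: the learned rule is written here as a linear function of the two log-likelihoods, and the coefficients `a`, `b`, `c` are hypothetical placeholders for values that a trained linear SVM (or similar classifier) would supply.

```python
def bayes_decision(llk_client, llk_world, threshold=0.0):
    """Accept the claim if the log-likelihood ratio exceeds a threshold."""
    return (llk_client - llk_world) > threshold


def learned_linear_decision(llk_client, llk_world, a, b, c):
    """Accept the claim if a learned linear function of the 2-D input
    (llk_client, llk_world) is positive. The coefficients are assumed
    to come from a classifier trained on client and impostor scores."""
    return a * llk_client + b * llk_world + c > 0.0


# With a = 1, b = -1, c = -threshold the learned rule reduces to the
# likelihood-ratio test; other coefficients tilt or shift the boundary.
same_as_bayes = learned_linear_decision(-10.0, -12.0, a=1.0, b=-1.0, c=0.0)
tilted = learned_linear_decision(-10.0, -12.0, a=1.0, b=-0.8, c=0.0)
```

The likelihood-ratio test is therefore one point in the family of linear boundaries over the two log-likelihoods, and training lets the data choose a different point when the density estimates are imperfect.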
7. Importance of kernels
- Early SVMs using polynomial and RBF kernels
  - The optimization problems required computational resources that were unsustainable.
  - Employing clustering algorithms to reduce the amount of training data also reduces the accuracy.
  - Frame-level training inputs discard useful speaker classification information.
- SVMs using score-space kernels
  - Variable-length utterances can be classified at the sequence level.
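For reference, the two frame-level kernels named above can be written in a few lines. This is a generic sketch of the standard polynomial and RBF kernels; the parameter names `degree`, `coef0`, and `gamma` and their default values are illustrative choices, not settings taken from the dissertation.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(frames, kernel):
    """Kernel value for every pair of frames: N frames give N * N entries."""
    return [[kernel(x, y) for y in frames] for x in frames]


frames = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
gram = gram_matrix(frames, rbf_kernel)
```

Because the Gram matrix grows with the square of the number of frames, training on individual frames quickly becomes expensive, which is the resource problem noted above.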
8. Classifying sequences using score-space kernels
- The score-space kernel enables SVMs to classify whole sequences.
- A variable-length sequence of input vectors is mapped explicitly onto a single point in a space of fixed dimension.
- The score-space is derived from the likelihood score.
- The likelihood ratio score-space
9. Computing the score-space vectors
Define the global likelihood of a sequence X = {x_1, ..., x_{N_l}}
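A common way to write this global likelihood, assuming the frames are conditionally independent given the model (the usual GMM assumption), is as a product of frame likelihoods. The symbols M for the generative model and θ for its parameters are chosen here for illustration:

```latex
p(X \mid M, \theta) = \prod_{i=1}^{N_l} p(x_i \mid M, \theta),
\qquad
\log p(X \mid M, \theta) = \sum_{i=1}^{N_l} \log p(x_i \mid M, \theta)
```

The log form is the one that is convenient to differentiate with respect to the model parameters.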
10. Computing the score-space vectors (continued)
- The likelihood ratio kernel maps each utterance to a fixed-length vector, and the final kernel is evaluated between these vectors.
- The dimensionality of the score-space is equal to the total number of parameters in the generative models. Hence the SVM can classify complete utterance sequences.
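The idea of mapping a variable-length utterance to a fixed-length vector can be sketched with a deliberately simplified model. The assumptions here are mine, not the dissertation's: each of the client and world models is a single diagonal Gaussian rather than a GMM, only the mean parameters are differentiated, the score vector is taken as the gradient of the log likelihood ratio, and the kernel is a plain inner product with no normalisation. All function and variable names are hypothetical.

```python
def mean_gradient(frames, mean, var):
    """Gradient of sum_i log N(x_i; mean, var) with respect to the mean.

    For a diagonal Gaussian this is sum_i (x_i - mean) / var per dimension,
    so the result has one entry per mean parameter, whatever the number
    of frames."""
    return [sum((x[d] - mean[d]) / var[d] for x in frames)
            for d in range(len(mean))]


def lr_score_vector(frames, client, world):
    """Gradient of log p(X|client) - log p(X|world) with respect to both means.

    The world-model block carries a minus sign because the world
    log-likelihood is subtracted in the ratio."""
    g_client = mean_gradient(frames, client["mean"], client["var"])
    g_world = mean_gradient(frames, world["mean"], world["var"])
    return g_client + [-g for g in g_world]


def linear_kernel(u, v):
    """Inner product between two score-space vectors."""
    return sum(a * b for a, b in zip(u, v))


client = {"mean": [0.0, 0.0], "var": [1.0, 1.0]}
world = {"mean": [1.0, 1.0], "var": [2.0, 2.0]}

short_utt = [[1.0, 0.0]]                          # 1 frame
long_utt = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # 3 frames

psi_short = lr_score_vector(short_utt, client, world)
psi_long = lr_score_vector(long_utt, client, world)
k = linear_kernel(psi_short, psi_long)
```

Both utterances map to vectors of length 4 (two mean parameters per model), so a single kernel value can be computed between sequences of different lengths. A full implementation would also include the remaining model parameters and any normalisation of the score-space.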
11. Experiment Results on PolyVar
- The data is noisy.
- The data has many more client tests than YOHO.
12. Conclusion
- Add the GMM-LR/SVM model to our verification system
- Add a score-space kernel to the SVM
- Need to compare the computational requirements of the Fisher and LR kernels.
13. References
- V. Wan, Speaker Verification using Support Vector Machines, University of Sheffield, June 2003.
- V. Wan, Building Sequence Kernels for Speaker Verification and Speech Recognition, University of Sheffield.
- S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP, 2001.