SPEAKER RECOGNITION - PowerPoint PPT Presentation

About This Presentation
Title:

SPEAKER RECOGNITION

Description:

SPEAKER RECOGNITION A PRESENTATION BY SHAMALEE DESHPANDE INTRODUCTION Speaker Recognition * Automatically recognizing speaker * Uses individual information ... – PowerPoint PPT presentation

Slides: 20
Provided by: Sham69

Transcript and Presenter's Notes

Title: SPEAKER RECOGNITION


1
SPEAKER RECOGNITION
  • A PRESENTATION
  • BY
  • SHAMALEE DESHPANDE

2
INTRODUCTION
  • Speaker Recognition
  • Automatically recognizing the speaker
  • Uses individual information from the
    speaker's speech waves

3
INTRODUCTION
  • Two Approaches
  • Text-Dependent Recognition
  • Text-Independent Recognition

4
INTRODUCTION
  • Two Approaches
  • Text-Dependent Recognition
  • Use of keywords or sentences having the same
    text for the templates and the recognition
  • Text-Independent Recognition

5
INTRODUCTION
  • Two Approaches
  • Text-Dependent Recognition
  • Text-Independent Recognition
  • Does not rely on a specific text being spoken.

6
INTRODUCTION
  • Classes of Sound
  • Voiced, Unvoiced, Plosive
  • Production of Pitch Frequency and Formants

Glottal Waveform
7
BLOCK DIAGRAM OF A SPEAKER RECOGNITION SYSTEM
8
DESIRABLE ATTRIBUTES OF A SPEAKER RECOGNITION SYSTEM
  • Feature should occur naturally and frequently in
    speech
  • Easily measurable
  • Does not change over time and is not affected by
    the speaker's health
  • Is not affected by background noise
  • Is not subject to mimicry

9
SOURCES OF VARIABILITY IN SPEECH
  • Phonetic Identity
  • Two samples may correspond to different phonetic
    segments, e.g. a vowel and a fricative
  • Pitch
  • Pitch and other features such as breathiness and
    amplitude can be varied
  • Speaker
  • Differences due to source physiology, emotions
  • Microphone
  • Environment

10
  • Possible Acoustic Parameters
  • Formant Frequencies
  • LPC
  • Pitch
  • Nasal Coarticulation
  • Gain

11
COMMON SPEAKER RECOGNITION TECHNIQUES
  • DISCRETE FOURIER TRANSFORM
  • LINEAR PREDICTIVE CODING
  • CEPSTRAL ANALYSIS
  • DYNAMIC TIME WARPING
  • HIDDEN MARKOV MODELS

12
DISCRETE / FAST FOURIER TRANSFORM
  • Converts time-domain signals into frequency-domain
    signal representations
  • The FFT computes the DFT with reduced complexity for
    the processor: O(N log N) instead of O(N^2) operations

Read N speech samples from input
Append N-L zeroes to the input data
Calculation of DFT
Windowing
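
The steps listed above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the presenter's implementation: the function name `frame_spectrum`, the Hamming window, and the choice to window the L-sample frame before zero-padding it to N points are all assumptions made for the example.

```python
import numpy as np

def frame_spectrum(frame, n_fft):
    """Window one frame of L samples, zero-pad it to n_fft points,
    and return the magnitude of its DFT (non-negative frequencies only)."""
    frame = np.asarray(frame, dtype=float)
    L = len(frame)
    if n_fft < L:
        raise ValueError("n_fft must be at least the frame length")
    windowed = frame * np.hamming(L)                          # taper the frame edges
    padded = np.concatenate([windowed, np.zeros(n_fft - L)])  # append n_fft - L zeros
    return np.abs(np.fft.rfft(padded))                        # DFT computed with an FFT

# Usage: a 200 Hz tone sampled at 8 kHz (illustrative values).
fs = 8000
t = np.arange(400) / fs
spectrum = frame_spectrum(np.sin(2 * np.pi * 200 * t), n_fft=512)
peak_hz = np.argmax(spectrum) * fs / 512   # frequency of the strongest bin
```

The strongest bin falls within one bin width (fs / 512, about 15.6 Hz) of the 200 Hz tone; zero-padding only interpolates the spectrum and does not add true resolution.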
13
LINEAR PREDICTIVE CODING

BUZZER: Glottal excitation, characterized by intensity and pitch
TUBE: Vocal tract, characterized by formants
Figure: LPC model of the speech-producing organs of the body
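
One common way to fit the "tube" of this model is the autocorrelation method with the Levinson-Durbin recursion. The sketch below assumes that method; the slide does not say which estimator was used, and the function name `lpc` and the test signal are hypothetical.

```python
import numpy as np

def lpc(frame, order):
    """Estimate LPC coefficients with the autocorrelation method.
    Returns (a, err) with a[0] == 1, so that the prediction is
    x[n] ~ -sum(a[k] * x[n-k] for k in 1..order), and err is the
    residual (excitation) energy."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the lower-order coefficients, then append the new one
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

# Usage: impulse response of a known two-pole "tube" (illustrative values).
x = np.zeros(300)
x[0] = 1.0
for n in range(1, 300):
    x[n] = 1.3 * x[n - 1] - (0.6 * x[n - 2] if n >= 2 else 0.0)
a, err = lpc(x, order=2)   # a should be close to [1, -1.3, 0.6]
```

The recovered coefficients describe the resonances (formants) of the tube, while `err` corresponds to the energy of the buzzer-like excitation.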
14
CEPSTRAL ANALYSIS
  • Disadvantage of the DFT/FFT: formant frequencies may
    shift the apparent pitch or overlap it
  • In cepstral analysis, the formant contribution is
    separated from the pitch, so it can be removed
  • Defined as the (inverse) Fourier transform of the log
    of the power spectrum
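  • $s(n) = p(n) * v(n)$ (convolution)
  • $x(n) = w(n)\, s(n)$ (windowing)
  • $S(\omega) = P(\omega)\, V(\omega)$ (Fourier transform)
  • $\log S(\omega) = \log P(\omega) + \log V(\omega)$
  • $c(q) = \mathcal{F}^{-1}\{\log S(\omega)\} = \mathcal{F}^{-1}\{\log P(\omega)\} + \mathcal{F}^{-1}\{\log V(\omega)\}$
  • $q$: quefrency, $c(q)$: complex cepstrum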

15
CEPSTRAL ANALYSIS
Speech → Window → DFT → LOG → IDFT → Cepstrum
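
The Window, DFT, LOG, IDFT chain can be sketched as below. Note one substitution: the code takes the log of the magnitude spectrum, which gives the real cepstrum rather than the complex cepstrum named on the previous slide, because it avoids phase unwrapping. The function name `real_cepstrum`, the Hamming window, and the synthetic test signal are assumptions for illustration.

```python
import numpy as np

def real_cepstrum(frame, n_fft=1024):
    """Window -> DFT -> log magnitude -> inverse DFT."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.fft(windowed, n=n_fft)       # zero-pads to n_fft points
    log_mag = np.log(np.abs(spectrum) + 1e-12)     # small floor avoids log(0)
    return np.fft.ifft(log_mag).real

# Usage: synthetic "speech" made of a pulse train (pitch period of
# 80 samples) passed through a simple decaying filter standing in
# for the vocal tract. All values are illustrative.
period = 80
excitation = np.zeros(640)
excitation[::period] = 1.0
tract = 0.9 ** np.arange(50)
speech = np.convolve(excitation, tract)[:640]

cep = real_cepstrum(speech)
# The vocal tract occupies low quefrencies; the pitch shows up as a
# peak at higher quefrency, so search above 40 samples.
pitch_period = 40 + int(np.argmax(cep[40:200]))
```

Because the log turns the product of excitation and vocal tract spectra into a sum, the two contributions land in different quefrency regions and can be picked apart with a simple range search.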
16
DYNAMIC TIME WARPING
  • Incoming speech is usually compared frame by
    frame with a stored template
  • Achieved via a pairwise comparison of feature
    vectors from each sequence
  • Disadvantage of direct comparison: corresponding
    phonemes vary in length
  • DTW takes into account the nonlinear relation
    between the lengths of the two signals
  • Used as a matching algorithm

Example DTW grid
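
A minimal sketch of the DTW matching step follows, assuming Euclidean distance between feature vectors and the basic step pattern (horizontal, vertical, diagonal) with no slope constraints. The function name `dtw_distance` and the toy sequences are hypothetical.

```python
import numpy as np

def dtw_distance(template, test):
    """Accumulated DTW cost between two sequences of feature vectors,
    each of shape (n_frames, n_features) or (n_frames,)."""
    x = np.asarray(template, dtype=float)
    y = np.asarray(test, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # Pairwise frame distances: this matrix is the DTW grid
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1, j],      # template frame repeated
                            cost[i, j - 1],      # test frame repeated
                            cost[i - 1, j - 1])  # both advance
            cost[i, j] = local[i - 1, j - 1] + best_prev
    return cost[n, m]

# Usage: the same contour spoken at half speed still matches exactly,
# while a different contour does not.
template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
other = [3, 3, 3, 3, 3, 3, 3]
d_same = dtw_distance(template, slow)    # 0.0
d_other = dtw_distance(template, other)
```

In a recognizer, the test utterance would be compared against each enrolled speaker's template and the smallest accumulated cost would indicate the best match.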
17
HIDDEN MARKOV MODELS
  • Speech signal is identified during the search
    process rather than explicitly
  • Comprises:
  • Hidden Markov chain representing temporal
    variability
  • Observable process representing spectral
    variability
  • Portrayed as a stochastic pair (X, Y)
  • An HMM is a finite state machine in which a probability
    density function p(x|s) is associated with each
    state s
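
To make the (X, Y) pair concrete, the sketch below scores an observation sequence against an HMM with the forward algorithm, working in the log domain. It assumes scalar observations and one Gaussian density p(x|s) per state; the function names, model parameters, and observations are hypothetical values chosen for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log of a scalar Gaussian density."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_loglik(obs, log_pi, log_A, means, variances):
    """log p(obs | model) via the forward algorithm.
    log_pi: initial state log-probabilities, shape (S,)
    log_A:  transition log-probabilities, shape (S, S), rows = from-state
    means, variances: per-state Gaussian parameters for p(x|s)"""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # log_b[t, s] = log p(x_t | s)
    log_b = gaussian_logpdf(obs[:, None], means[None, :], variances[None, :])
    alpha = log_pi + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous states (log-sum-exp), then emit x_t
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

# Usage: two speaker models sharing the same hidden chain but with
# different state densities; pick the model with the higher likelihood.
log_pi = np.log(np.array([0.9, 0.1]))
log_A = np.log(np.array([[0.8, 0.2],
                         [0.1, 0.9]]))
obs = [0.1, -0.2, 2.9, 3.1]
ll_a = forward_loglik(obs, log_pi, log_A, means=[0.0, 3.0], variances=[1.0, 1.0])
ll_b = forward_loglik(obs, log_pi, log_A, means=[5.0, 8.0], variances=[1.0, 1.0])
best = "A" if ll_a > ll_b else "B"
```

The hidden chain (log_pi, log_A) models how long the signal stays in each state, while the per-state densities model what the spectrum looks like there.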

18
FUTURE RESEARCH
  • To extract and apply all levels of information
    from the speech signal conveying speaker identity
  • Acoustic: use spectral features conveying vocal
    tract information
  • Prosodic: use features derived from pitch and
    energy tracks to classify information
  • Phonetic: use phone sequences to characterize
    speaker-specific pronunciations
  • Idiolect: use words to characterize
    user-specific word patterns
  • Linguistic: use linguistic patterns to
    characterize speaker-specific
    conversation style

19
APPLICATIONS
  • Access Control: physical facilities, computer
    networks and websites
  • PC Login and Password Reset
  • Secured Transactions: remote banking and online
    credit card purchase authentication
  • Time and Attendance: workplaces
  • Law Enforcement: forensics, parole