Title: Neuro-Fuzzy and Soft Computing for Speaker Recognition
1 Neuro-Fuzzy and Soft Computing for Speaker Recognition
1999 CSIST
- J.-S. Roger Jang
- CS Dept., Tsing Hua Univ., Taiwan
- http://www.cs.nthu.edu.tw/jang
- jang_at_cs.nthu.edu.tw
2 Outline
- Introduction
- Data acquisition
- Feature extraction
- Data reduction
- Condensing, editing, fuzzy clustering
- Fuzzy classifier refinement
- Random search
- Experiments
- Conclusions and future work
3 Speaker Recognition
- Types
- Text-dependent or text-independent
- Closed-set or open-set
- Methodologies involved
- Digital signal processing
- Pattern recognition
- Clustering or vector quantization
- Nonlinear optimization
- Neuro-fuzzy techniques
4 Data Acquisition
- Recording
- Recording program of Windows 95/98
- 8 kHz sampling rate, 8-bit resolution (worse than phone quality)
- A 5-second speech signal takes about 40 KB (8,000 samples/s x 1 byte x 5 s = 40,000 bytes)
- Samples
- Speaker 1
- Speaker 2
- Speaker 3
5 Feature Extraction
- Major steps
- Overlapping frames of 256 points (32 ms)
- Hamming windowing to lessen distortion
- cepstrum(frame) = real(IFFT(log|FFT(frame)|)), as sketched in code below
- FFT: Fast Fourier Transform
- IFFT: Inverse FFT
- A feature vector consists of the first 14 cepstral coefficients of a frame.
- Optional steps
- Frequency-selective filter to reduce noise
- Mel-warped cepstral coefficients
- Feature-wise normalization
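The major steps above can be expressed as a short NumPy sketch; the frame hop size and the small epsilon inside the log are assumptions, not settings stated in the presentation.

```python
import numpy as np

def cepstral_features(signal, frame_len=256, hop=128, n_coeffs=14):
    """Compute real-cepstrum feature vectors from a 1-D speech signal.

    Pipeline from the slide: overlapping frames, Hamming window,
    cepstrum(frame) = real(IFFT(log |FFT(frame)|)), keep the first 14
    coefficients of each frame.
    """
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window   # Hamming windowing
        spectrum = np.abs(np.fft.fft(frame))                # FFT, abs
        log_spectrum = np.log(spectrum + 1e-10)             # log (epsilon avoids log 0)
        cepstrum = np.real(np.fft.ifft(log_spectrum))       # IFFT, real
        features.append(cepstrum[:n_coeffs])                # first 14 coefficients
    return np.array(features)                               # shape: (n_frames, n_coeffs)
```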
6 Physical Meanings of Cepstrum
7 Feature Extraction
- Flowchart: 2.39-sec. speech signal -> low-pass filter -> take frames (148 frames of 256 points) -> Hamming windowing -> FFT -> abs -> log -> resample -> IFFT -> real -> first 14 coefficients -> normalization -> 148 feature vectors of length 14
8 Feature Extraction
- Upper: speaker 1; lower: speaker 2
9 Pattern Recognition
- Training path: sample speech -> feature extraction -> data reduction -> sample set
- Recognition path: test speech -> feature extraction -> classifier (matched against the sample set) -> recognized speaker
10 Pattern Recognition Methods
- K-NNR: K-nearest neighbor rule
- Euclidean distance
- Mahalanobis distance
- Maximum log likelihood
- Adaptive networks
- Multilayer perceptrons
- Radial basis function networks
- Fuzzy classifiers with random search
11 K-Nearest Neighbor Rule (K-NNR)
- Steps
- 1. Find the first k nearest neighbors of a given point.
- 2. Determine the class of the given point by a voting mechanism among these k nearest neighbors (see the sketch below).
(Figure: class-A points, class-B points, and a point of unknown class plotted in the Feature 1 vs. Feature 2 plane)
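A minimal sketch of the K-NNR voting rule with Euclidean distance, assuming the sample set is a NumPy array of feature vectors with one label per row:

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k=5):
    """Classify x by majority vote among its k nearest samples (Euclidean distance)."""
    dists = np.linalg.norm(samples - x, axis=1)   # distance from x to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)   # voting mechanism
    return votes.most_common(1)[0][0]             # majority class wins
```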
12 Decision Boundary for 1-NNR
- Voronoi diagram: piecewise-linear boundary
13 Distance Metrics
- Euclidean distance
- Mahalanobis distance
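In standard notation (the slide's formulas did not survive conversion), with m a class prototype or mean and Σ the class covariance matrix:

```latex
d_{\mathrm{Euclidean}}(x, m)   = \sqrt{(x - m)^{\top}(x - m)}
d_{\mathrm{Mahalanobis}}(x, m) = \sqrt{(x - m)^{\top}\,\Sigma^{-1}\,(x - m)}
```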
14 Maximum Log Likelihood
- Multivariate normal distribution N(μ, Σ)
- Likelihood of x in class j
- Log likelihood
15 Maximum Log Likelihood
- Likelihood of X = {x1, ..., xn} in class j
- Log likelihood
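The standard forms for a d-dimensional Gaussian class model N(μ_j, Σ_j), which the slide's lost equations presumably showed:

```latex
p(x \mid j) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}}
  \exp\!\Big(-\tfrac{1}{2}(x - \mu_j)^{\top}\Sigma_j^{-1}(x - \mu_j)\Big)

\log p(X \mid j) = \sum_{i=1}^{n} \log p(x_i \mid j)
  = -\frac{n}{2}\log|\Sigma_j|
    -\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu_j)^{\top}\Sigma_j^{-1}(x_i - \mu_j)
    -\frac{nd}{2}\log 2\pi
```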
16 Data Reduction
- Purpose
- Reduce NNR computation load
- Increase data consistency
- Techniques
- To reduce data size
- Editing: to eliminate noisy (boundary) data
- Condensing: to eliminate redundant (deeply embedded) data
- Vector quantization: to find representative data
- To reduce data dimensions
- Principal component projection: to reduce the dimensions of the feature sets
- Discriminant projection: to find the set of vectors that best separates the patterns
17 Editing
- To remove noisy (boundary) data
18 Condensing
- To remove redundant (deeply embedded) data
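One classic way to implement condensing is Hart's condensed nearest neighbor rule; the presentation does not name the exact variant used, so the following is only an illustrative sketch (samples and labels are assumed to be NumPy arrays):

```python
import numpy as np

def condense(samples, labels):
    """Hart-style condensing: keep only the samples that 1-NN needs in order
    to classify the remaining samples correctly; deeply embedded points
    are never misclassified and therefore never get added."""
    keep = [0]                                       # start with one arbitrary sample
    changed = True
    while changed:
        changed = False
        for i in range(len(samples)):
            if i in keep:
                continue
            dists = np.linalg.norm(samples[keep] - samples[i], axis=1)
            nearest = keep[int(np.argmin(dists))]    # 1-NN within the kept set
            if labels[nearest] != labels[i]:         # misclassified -> must keep it
                keep.append(i)
                changed = True
    return samples[keep], labels[keep]
```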
19 VQ: Fuzzy C-Means Clustering
- A point can belong to various clusters with various degrees.
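A bare-bones fuzzy c-means sketch using the standard FCM updates; the fuzzifier m, cluster count, and iteration count are assumptions, not the presentation's settings.

```python
import numpy as np

def fuzzy_c_means(X, c=20, m=2.0, iters=100, seed=0):
    """Basic FCM: every point holds a membership degree in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                                   # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None] # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U = 1.0 / (d ** (2 / (m - 1)))               # closer centers get higher degrees
        U /= U.sum(axis=1, keepdims=True)            # renormalize memberships
    return centers, U                                # centers serve as the reduced sample set
```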
20 Fuzzy Classifier
- Rule base
- If x is close to (A1 or A2 or A3), then x belongs to class A
- If x is close to (B1 or B2 or B3), then x belongs to class B
- A fuzzy classifier is equivalent to a 1-NNR if all MFs have the same width.
(Figure: membership functions A1, A2, A3 and B1, B2, B3 plotted in the feature plane)
21 Fuzzy Classifier
- Adaptive network representation (see the sketch below)
(Figure: inputs x1 and x2 feed multidimensional MFs A1-A3 and B1-B3; one max node combines the A MFs, another the B MFs, and their difference gives the output y)
- x = (x1, x2) belongs to class A if y > 0 and to class B if y < 0
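A compact sketch of such a classifier, assuming Gaussian multidimensional MFs centered at the prototypes; the MF shape is an assumption, since the slide only shows the network structure.

```python
import numpy as np

def gaussian_mf(x, center, width):
    """Multidimensional Gaussian membership function."""
    return np.exp(-np.sum((x - center) ** 2) / (2 * width ** 2))

def fuzzy_classify(x, centers_A, centers_B, widths_A, widths_B):
    """y = max membership over class-A MFs minus max membership over class-B MFs;
    class A if y > 0, class B if y < 0.  With all widths equal this reduces to
    picking the class of the nearest prototype, i.e., 1-NNR."""
    mu_A = max(gaussian_mf(x, c, w) for c, w in zip(centers_A, widths_A))
    mu_B = max(gaussian_mf(x, c, w) for c, w in zip(centers_B, widths_B))
    y = mu_A - mu_B
    return 'A' if y > 0 else 'B'
```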
22 Refining Fuzzy Classifier
(Figure: left, MFs with the same width; right, MF widths refined via random search)
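A sketch of the derivative-free refinement: perturb the MF widths at random and keep a change only if the recognition rate improves. Here eval_rate is a hypothetical placeholder for a function that scores a width vector on the sample set; the step size and iteration count are assumptions.

```python
import numpy as np

def random_search(widths, eval_rate, iters=200, step=0.05, seed=0):
    """Derivative-free random search over MF widths."""
    rng = np.random.default_rng(seed)
    best = np.asarray(widths, dtype=float)
    best_rate = eval_rate(best)
    for _ in range(iters):
        candidate = best + step * rng.standard_normal(best.shape)  # random perturbation
        candidate = np.clip(candidate, 1e-3, None)                 # keep widths positive
        rate = eval_rate(candidate)
        if rate > best_rate:                                       # keep only improvements
            best, best_rate = candidate, rate
    return best, best_rate
```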
23 Principal Component Projection
- Eigenvalues of the covariance matrix: λ1 > λ2 > λ3 > ... > λd
- Projection onto (v1, v2) vs. projection onto (v3, v4)
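A minimal PCA projection sketch in NumPy, sorting the eigenvectors so that λ1 > λ2 > ... > λd and keeping the leading ones:

```python
import numpy as np

def pca_project(X, n_dims=2):
    """Project feature vectors onto the leading eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                          # center the data
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # sort so that l1 > l2 > ...
    V = eigvecs[:, order[:n_dims]]                   # leading principal directions
    return Xc @ V                                    # projected data
```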
24 Discriminant Projection
- Best discriminant vectors v1, v2, ..., vd
- Projection onto (v1, v2) vs. projection onto (v3, v4)
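One common formulation of the discriminant projection is Fisher's criterion, taking eigenvectors of Sw^{-1} Sb; the presentation does not specify which variant was used, so this is only a sketch (labels is assumed to be a NumPy array).

```python
import numpy as np

def discriminant_project(X, labels, n_dims=2):
    """Fisher-style discriminant projection: directions that best separate the classes."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                            # within-class scatter
    Sb = np.zeros((d, d))                            # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]           # best discriminant vectors first
    V = eigvecs[:, order[:n_dims]].real
    return X @ V
```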
25 Experiments
- Experimental data
- Sample size: 578; test size: 1063; no. of classes: 3
- Samples per speaker (sample data): 148, 280, 150
- Samples per speaker (test data): 256, 457, 350
- Experiments
- K-NNR with all sample data
- K-NNR with reduced sample data
- Fuzzy classifier refined via random search
26 Performance Using All Samples
- Sample size: 578
- Test size: 1063
Recognition rates as functions of the speech signal length
27 Performance After Editing and Condensing
- Sample size: 497 after editing, 64 after condensing
- Test size: 1063
Recognition rates as functions of the speech signal length
28 Performance After VQ (FCM)
- Sample size: 60 after FCM
- Test size: 1063
Recognition rates as functions of the speech signal length
Confusion matrix
29 Performance After VQ and Random Search
- Sample (rule) size: 60, tuned via random search
- Test size: 1063
Recognition rates as functions of the speech signal length
Confusion matrix
30 On-line Recognition Hardware Setup
31 Conclusions
- Performance after editing and condensing is unpredictable.
- Performance after VQ (FCM) is consistently better than that after editing and condensing.
- A simple derivative-free optimization method, i.e., random search, can significantly enhance the performance.
32 Future Work
- Data dimension reduction
- Other feature extraction methods (e.g., LPC)
- Scale up the problem size
- More speakers (ten or more)
- Other vocal signals (laughter, coughs, singing, etc.)
- Other biometric identification using
- Faces
- Fingerprints and palm prints
- Retina and iris scans
- Hand shapes/sizes/proportions
- Hand vein distributions