Title: Model Formation and Classification Techniques For Conversation-based Speaker Discrimination
Uchechukwu O. Ofoegbu
Advisor: Robert Yantorno, Ph.D.
Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.
2 Acknowledgements
My committee members, for your time and
commitment to my research
The Air Force Research Labs, for financially
supporting most of this research work
My family, for being there
Dr. Y, the best advisor one could hope for
Members and Friends of the Speech Lab, for your
valuable contributions
ECE faculty and staff, for your great support
The audience, for being a part of this
3 Presentation Outline
- Introduction
  - Challenges of Conversational Data
  - General Applications of Research
  - Novelty of Research
- Evaluation Databases
  - HTIMIT
  - SWITCHBOARD
  - New Conversations Database
- Modeling Speakers
  - Traditional Speaker Modeling
  - Proposed Method
  - Features Used
  - Distances Used
- Application Systems
  - Unsupervised Speaker Indexing
  - Speaker Count
  - Generalized Speaker Indexing
- Fusion of Distance Measures
  - Optimized T Distance
  - Decision-Based Combination
  - Weighted Decision-Based Combination
- Summary
- Further Research
4 Introduction Evaluation Databases Modeling
Speakers Application Systems Fusion of Distance
Measures Summary Further Research
5 Challenges of Conversational Data
- No a priori information available from participating speakers
  - Training is impossible
- No a priori knowledge of change points
  - Speakers alternate very rapidly
- Limited amounts of data for single-speaker representations
- Distortion
  - Channel noise, co-channel data
6 Proposed Solutions
- Selective creation of data models
- Distance-based model comparison
- Development of application-specific systems
7 Novelty of this Research
- Selective creation of data models
- Distance-based model comparison
- Development of application-specific systems
8 Applications
- Monitoring criminal conversations
- Forensics
- Automated Customer Services
- Storage/Search/Retrieval of Audio Data
- Military Activities
- Conference calls
9 Databases
- Standard speaker discrimination databases
  - HTIMIT
  - Switchboard
- Temple Conversations Database (TCD)
11 Traditional Speaker Modeling
- Examples
  - Gaussian Mixture Models
  - Hidden Markov Models
  - Neural Networks
  - Prosody-Based Models
- Disadvantages
  - Require large amounts of data
  - Sometimes require a training procedure
  - Relatively complex
12 Conversational Data Modeling
- Current Method
  - Equal segmentation of data
  - Indiscriminate use of data
- Problems
  - Change points unknown
  - Not all speech is useful
  - Poor performance
13 Proposed Speaker Modeling
[Figure: speech divided into Segment 1 … Segment M; feature computation on each segment yields Model 1 … Model M]
14 Proposed Speaker Modeling
- Why voiced speech only?
  - The same speech class is compared
  - Voiced speech contains the most information
- What's the appropriate number of phonemes?
  - Large enough to sufficiently represent speakers
  - Small enough to avoid speaker overlap
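A minimal sketch of the segment-to-model step, assuming each model is simply the mean vector and covariance matrix of the cepstral feature vectors (one row per voiced frame) in one segment. The function `form_models` and its `frames_per_model` parameter are hypothetical: the slides size models in phonemes (N), and the extra slides equate N = 20 with 4 seconds and N = 5 with 1 second of voiced speech, so a frame count would have to be derived from the frame rate.

```python
import numpy as np

def form_models(voiced_features, frames_per_model):
    """Split voiced-frame features (n_frames x n_coeffs) into consecutive,
    non-overlapping segments and summarize each as (mean, covariance, n).

    Hypothetical helper: voiced/unvoiced detection is assumed to be done
    already, and a trailing partial segment is discarded.
    """
    feats = np.asarray(voiced_features, dtype=float)
    n_models = feats.shape[0] // frames_per_model
    models = []
    for m in range(n_models):
        seg = feats[m * frames_per_model:(m + 1) * frames_per_model]
        models.append((seg.mean(axis=0), np.cov(seg, rowvar=False), len(seg)))
    return models
```

For example, at an assumed rate of 100 frames per second, N = 5 phonemes (about 1 second of voiced speech) would correspond to `frames_per_model=100`.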
15 Features Considered
- Linear Predictive Cepstral Coefficients (LPCCs)
  - Model the vocal tract
- Mel-Frequency Cepstral Coefficients (MFCCs)
  - Model the human auditory system
16 Distance Measurements
[Figure: distributions of different-speaker distances and same-speaker distances]
17 Distances Used
- Mahalanobis Distance
- Hotelling's T-Square Statistic
- Kullback-Leibler Distance
- Bhattacharyya Distance
- Levene's Test
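The first four measures can be sketched in closed form if each model is treated as a Gaussian summarized by its mean vector, covariance matrix, and number of frames. This is an illustration under that assumption, not the dissertation's code: the use of a pooled covariance for the Mahalanobis and T-square distances, and the symmetric (two-way) form of the Kullback-Leibler divergence, are choices made here.

```python
import numpy as np

def _pooled_cov(S1, n1, S2, n2):
    # Unbiased pooled covariance of the two classes
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

def mahalanobis(m1, S1, n1, m2, S2, n2):
    """Mahalanobis distance between the two means under the pooled covariance."""
    d = m1 - m2
    S = _pooled_cov(S1, n1, S2, n2)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

def hotelling_t2(m1, S1, n1, m2, S2, n2):
    """Two-sample Hotelling T^2: squared Mahalanobis distance scaled by the data lengths."""
    d = m1 - m2
    S = _pooled_cov(S1, n1, S2, n2)
    return float(n1 * n2 / (n1 + n2) * (d @ np.linalg.solve(S, d)))

def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance between two Gaussians."""
    d = m1 - m2
    S = (S1 + S2) / 2
    mean_term = d @ np.linalg.solve(S, d) / 8
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return float(mean_term + cov_term)

def symmetric_kl(m1, S1, m2, S2):
    """KL(1||2) + KL(2||1) between two Gaussians."""
    def kl(ma, Sa, mb, Sb):
        k = ma.shape[0]
        d = mb - ma
        _, logdet_a = np.linalg.slogdet(Sa)
        _, logdet_b = np.linalg.slogdet(Sb)
        return 0.5 * (np.trace(np.linalg.solve(Sb, Sa)) + d @ np.linalg.solve(Sb, d)
                      - k + logdet_b - logdet_a)
    return float(kl(m1, S1, m2, S2) + kl(m2, S2, m1, S1))
```

Only the T-square statistic grows with the amount of data (n1, n2), which is the sense in which it "takes into consideration the data lengths".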
18 Analysis of Cepstral Features
19 Best Number of Phonemes?
[Figure: results versus number of phonemes per model; features used: LPCC]
21 Unsupervised Speaker Indexing
- The Restrained-Relative Minimum Distance (RRMD)
Approach
[Figure: reference models and their pairwise distance matrix]
$$D = \begin{bmatrix} 0 & D_{1,2} & D_{1,3} \\ D_{2,1} & 0 & D_{2,3} \\ D_{3,1} & D_{3,2} & 0 \end{bmatrix}$$
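The reference distance matrix can be sketched for any model-to-model distance. The helper name `reference_distance_matrix` is hypothetical; all ordered pairs are computed because some measures (for example a one-directional Kullback-Leibler divergence) need not be symmetric, so D_{i,j} and D_{j,i} may differ.

```python
import numpy as np

def reference_distance_matrix(models, distance):
    """Pairwise distances D[i, j] between reference models (zero diagonal)."""
    M = len(models)
    D = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if i != j:
                D[i, j] = distance(models[i], models[j])
    return D
```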
22 Unsupervised Speaker Indexing
- The Restrained-Relative Minimum Distance (RRMD)
Approach
[Flowchart: observe the distances to Reference 1 and Reference 2 → take the minimum distance → Restraining Condition (Same Speaker?): passed → Same Speaker; failed → Relative Distance Condition: passed → Same Speaker; failed → Unusable Data]
23 RRMD Approach
- Restraining Condition
  - Distance Likelihood Ratio (DLR)
  - DLR > 1 → Same Speaker
  - DLR < 1 → Check Relative Distance Condition
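A minimal sketch of a distance likelihood ratio, DLR(d) = p(d | same speaker) / p(d | different speakers), assuming both distance distributions are Gaussian. The Gaussian form and the helper `fit_dlr` are assumptions made here; the experiments slide only states that the likelihood-ratio parameters were obtained from HTIMIT using 1000 same-speaker and 1000 different-speaker comparisons. Python's standard `statistics.NormalDist` provides the fit and the densities.

```python
from statistics import NormalDist

def fit_dlr(same_distances, diff_distances):
    """Return d -> p(d | same speaker) / p(d | different speakers).

    Both distance distributions are modeled as Gaussians (an assumption).
    """
    same = NormalDist.from_samples(same_distances)
    diff = NormalDist.from_samples(diff_distances)

    def dlr(d):
        p_diff = diff.pdf(d)
        if p_diff == 0.0:
            return float("inf")  # d is implausible for different speakers
        return same.pdf(d) / p_diff

    return dlr
```

With `dlr = fit_dlr(same, diff)`, the restraining condition is `dlr(d) > 1`.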
24 RRMD Approach
- Relative Distance Condition
  - Relative distance D_rel, computed from d_max and d_min
  - D_rel > threshold → Same Speaker
[Figure labels: d_min, d_max]
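Combining the flowchart with the two conditions, one RRMD decision for a single test model might look like the sketch below. It assumes D_rel = d_max / d_min, which is a guess because the slide's operator did not survive; `dlr` is any callable that returns the distance likelihood ratio, and `rel_threshold` is a hypothetical tuning parameter.

```python
def rrmd_decide(dists_to_refs, dlr, rel_threshold):
    """Label one test model using the RRMD flow (sketch).

    dists_to_refs: distances from the test model to each reference model.
    dlr: callable mapping a distance to its distance likelihood ratio.
    Returns the index of the matched reference, or None for unusable data.
    """
    dists = list(dists_to_refs)
    d_min, d_max = min(dists), max(dists)
    nearest = dists.index(d_min)
    # Restraining condition: DLR > 1 -> same speaker as the nearest reference
    if dlr(d_min) > 1:
        return nearest
    # Relative distance condition (ASSUMED form: D_rel = d_max / d_min)
    if d_min > 0 and d_max / d_min > rel_threshold:
        return nearest
    return None  # both conditions failed: unusable data
```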
25 Experiments and Results
- Experiments
  - HTIMIT used for obtaining the likelihood-ratio parameters
  - 1000 same-speaker and 1000 different-speaker utterance distances computed
  - 100 conversations from the Switchboard database used for evaluation
26 Indexing Results: Mahalanobis
[Figure panels: MFCC, LPCC]
27 Indexing Results: T-Square
[Figure panels: MFCC, LPCC]
28 Indexing Results: Bhattacharyya
[Figure panels: MFCC, LPCC]
29 Indexing Results: Summary
- Mahalanobis distance yielded the best results
- LPCCs outperformed MFCCs
30 Speaker Count System
- The Residual Ratio Algorithm (RRA)
  - The process is repeated K-1 times to count up to K speakers
[Flowchart note: too little data → model removed; select another model]
[Flowchart stages: DLR-based model comparison, repeated for each elimination stage]
31 Speaker Count
- Added Residual Ratio
  - Is the sum of the residual ratios in all elimination stages
  - Should be higher for a greater number of speakers
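A minimal sketch of the added residual ratio under stated assumptions. The slides do not define the per-stage residual ratio, so here it is taken (hypothetically) as the fraction of models left unmatched after one elimination stage; `same_speaker` stands in for the DLR-based model comparison, and `min_cluster` is a hypothetical reading of the "too little data" branch (a reference that matches too few models is discarded and another model is selected).

```python
def added_residual_ratio(models, same_speaker, max_speakers, min_cluster=1):
    """Sum of per-stage residual ratios over up to max_speakers - 1 stages."""
    remaining = list(models)
    total = 0.0
    for _ in range(max_speakers - 1):
        kept = None
        while remaining:
            ref, rest = remaining[0], remaining[1:]
            # Eliminate every model that matches the selected reference
            kept = [m for m in rest if not same_speaker(ref, m)]
            cluster_size = 1 + len(rest) - len(kept)
            if cluster_size >= min_cluster:
                break
            remaining = rest  # too little data: drop ref, select another model
            kept = None
        if kept is None:  # no usable model left
            break
        total += len(kept) / len(remaining)  # residual ratio for this stage
        remaining = kept
    return total
```

With speaker labels standing in for models and equality as the comparison, one, two, and three equally talkative speakers give added residual ratios of 0, 0.5, and about 1.17 (with K = 3), which rises with the number of speakers as the slide expects.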
32 Experiments and Results
- Experiments
  - 4000 conversations generated from HTIMIT
  - All 40 conversations from the new database used
33 Speaker Count Results: HTIMIT
[Figure panels: MFCC, LPCC]
34 Speaker Count Results: HTIMIT
[Figure panels: MFCC, LPCC]
35 Speaker Count Results: TCD
[Figure panels: MFCC, LPCC]
36 Speaker Count Results: TCD
[Figure panels: MFCC, LPCC]
37 Cross Evaluation
[Figure panels: HTIMIT, LPCCs with the WDBC (weighted decision-based combination); TCD, MFCCs with the T-Square distance]
38 Speaker Counting-Indexing
- The Residual Ratio speaker count algorithm is applied
- Test models are associated with their matching reference models
- Unmatched models are assigned to the reference from which they have the minimum distance
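The assignment rule can be sketched as follows. The helper `index_models` is hypothetical, `same_speaker` stands in for the DLR-based match test, and when several references match, choosing the closest of them is an assumption made here; unmatched models fall back to the nearest reference, as the slide states.

```python
def index_models(test_models, references, same_speaker, distance):
    """Return, for each test model, the index of its assigned reference model."""
    labels = []
    for t in test_models:
        dists = [distance(t, r) for r in references]
        matched = [i for i, r in enumerate(references) if same_speaker(t, r)]
        # Matched models go to their (closest) matching reference;
        # unmatched models go to the reference at minimum distance.
        candidates = matched if matched else range(len(references))
        labels.append(min(candidates, key=lambda i: dists[i]))
    return labels
```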
39 Speaker Counting/Indexing Results
[Figure legend: solid = HTIMIT; patterned = TCD]
40 Fusion of Distance Measures
41 Correlation Analysis
[Figure: Draftsman's display (scatter-plot matrix), LPCC]
42 Best Distance
- Optimal criteria for fusion of distances
  - Maximize inter-speaker variation
  - Minimize intra-speaker variation
  - Maximize the t-test value between the intra-speaker and inter-speaker distance distributions
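The t-test criterion can be sketched as ranking each distance measure by a two-sample t statistic between its different-speaker and same-speaker distances. The helpers `t_value` and `best_distance` are hypothetical; equal sample sizes are assumed, matching the later statement that intra- and inter-speaker distance lengths are equal, so the pooled-variance t statistic reduces to the form below.

```python
from math import sqrt
from statistics import mean, variance

def t_value(same_d, diff_d):
    """Two-sample t statistic between different- and same-speaker distances."""
    n = len(same_d)
    if len(diff_d) != n:
        raise ValueError("expected equal numbers of same- and different-speaker distances")
    return (mean(diff_d) - mean(same_d)) / sqrt((variance(same_d) + variance(diff_d)) / n)

def best_distance(measures):
    """measures: {name: (same_distances, diff_distances)} -> (best name, all t-values)."""
    scores = {name: t_value(s, d) for name, (s, d) in measures.items()}
    return max(scores, key=scores.get), scores
```

A larger t-value means the two distance distributions are further apart relative to their spread, which covers both "maximize inter-speaker variation" and "minimize intra-speaker variation" in one number.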
43 Decision Level Fusion
[Figure: D1 → match; D2 → no match; D3 → match; D4 → match. Votes: Match 3/4, No Match 1/4. Final decision: Match]
44 Weighted Decision Level Fusion
T_i: t-value corresponding to each distance measure i
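Plain and weighted decision-level fusion can both be sketched with one hypothetical helper, `fuse_decisions`. Without weights it reproduces the majority vote of the decision-level example (three of four measures say "match", so the final decision is "match"). The weighted rule used here, votes weighted by each measure's t-value T_i and compared with half the total weight, is an assumption, since the slide's formula did not survive extraction.

```python
def fuse_decisions(decisions, weights=None):
    """Decision-level fusion of per-distance match decisions (True = match).

    weights=None gives a plain majority vote; otherwise each vote counts with
    its weight (e.g. the t-value T_i of its distance measure). Ties -> no match.
    """
    if weights is None:
        weights = [1.0] * len(decisions)
    match_score = sum(w for d, w in zip(decisions, weights) if d)
    return match_score > 0.5 * sum(weights)
```

Weighting matters when the votes split evenly: a 2-2 split is "no match" under the plain vote, but becomes "match" if the two agreeing measures have much larger t-values.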
46 Research Goal
- To differentiate between speakers in a conversation
- To determine the number of speakers present
- To determine who is speaking when
- To overcome the following challenges
  - No a priori information
  - Limited data size
  - No knowledge of change points
  - Co-channel speech
47 Summary of Accomplishments
- Novel model formation technique
- Three novel approaches for conversation-based speaker differentiation
- Distance combination techniques to enhance performance
48 Observations
- Mahalanobis distance with LPCCs: optimal for the standard databases
- T-Square distance with MFCCs: optimal for the new database
- Best fusion technique: the weighted voting combination technique was the most efficient
49 Conclusion
- The developed system yields an EER of about 6%, whereas state-of-the-art speaker indexing systems yield an error rate of about 10%.
- Methods for discrimination between speakers (speaker count or indexing) in CONVERSATIONS with more than two speakers have been introduced.
51 Further Research
- Investigation of prosodic speaker discrimination features
- Improving the model formation technique by determining speaker change points a priori
- Exploring the use of individual phonemes to form models
52 Further Research (cont'd)
- Investigating the cautious use of unvoiced speech in the formation of models
- Speech enhancement techniques to handle distorted data
- Implementation of other fusion techniques, such as the KL measure of divergence
53 Publications
- U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, "Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances," IEEE Aerospace Conference, March 2007.
- U. Ofoegbu, A. Iyer and R. Yantorno, "Detection of a Third Speaker in Telephone Conversations," ICSLP (INTERSPEECH), 2006.
- U. Ofoegbu, A. Iyer and R. Yantorno, "A Simple Approach to Unsupervised Speaker Indexing," IEEE ISPACS, 2006.
- U. Ofoegbu, A. Iyer and R. Yantorno, "A Speaker Count System for Telephone Conversations," IEEE ISPACS, 2006.
54 ACKNOWLEDGEMENT
To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.
To my best friend and the love of my life, Dr. Jude C. Abanulo.
To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula.
To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.
To my friend, Ananth Iyer.
To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini, and to the members of the Speech Processing Lab and the faculty of the electrical engineering department.
To engineering administrators Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, daytime janitress for the engineering building.
To the Temple students who volunteered as participants in the New Conversations Database.
To Temple.
To the Air Force Research Labs at Rome, financial supporters of most of the research.
To my parents, Ugo Joseph Ofoegbu; my siblings, Amaka Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji.
To God.
Thank you.
55 Extra Slides
Advisor: Robert Yantorno, Ph.D.
Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.; Brett Smolenski, Ph.D.
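56 Cepstral Analysis
[Figure: frequency analysis of speech. The STFT of speech is the product (X) of an excitation component (fast-varying harmonics) and a vocal tract component (slowly varying formants). Taking the log turns this product into a sum: log of STFT = log of excitation + log of vocal tract component. The IDFT of the log of the STFT then places the vocal tract contribution at low quefrencies and the excitation at high quefrencies, so the two can be separated.]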
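The cepstral analysis above can be sketched with NumPy: window a frame, take the log magnitude spectrum, and apply the inverse DFT to obtain the real cepstrum; a low-quefrency lifter then keeps the vocal tract part and leaves the excitation part. The Hamming window, the 512-point FFT, and the cutoff `n_low=20` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of a windowed frame.
    The frame should be no longer than n_fft samples."""
    spectrum = np.fft.rfft(np.asarray(frame, float) * np.hamming(len(frame)), n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.fft.irfft(log_mag, n_fft)

def lifter(cep, n_low=20):
    """Split a real cepstrum into a low-quefrency (vocal tract) part and the
    remaining high-quefrency (excitation) part; n_low is an assumed cutoff."""
    keep = np.zeros_like(cep)
    keep[:n_low] = 1.0
    keep[len(cep) - n_low + 1:] = 1.0  # mirror-image half of the real cepstrum
    vocal_tract = cep * keep
    return vocal_tract, cep - vocal_tract
```

Because the log turns the product of the two components into a sum, the liftered parts add back exactly to the full cepstrum.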
57 Cepstral Features
- Linear Predictive Cepstral Coefficients (LPCCs)
  - Obtained recursively from the LPC coefficients
Let the LPC vector be $\mathbf{a} = [a_0, a_1, a_2, \ldots, a_p]$ and the LPCC vector be $\mathbf{c} = [c_0, c_1, c_2, \ldots, c_p, \ldots, c_{n-1}]$.
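The recursion itself is not legible on the slide. A commonly used form, assuming the all-pole model $H(z) = G / (1 - \sum_{k=1}^{p} a_k z^{-k})$ (sign conventions differ between texts), is

$$c_0 = \ln G, \qquad c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k} \;\; (1 \le n \le p), \qquad c_n = \sum_{k=n-p}^{n-1} \frac{k}{n}\, c_k\, a_{n-k} \;\; (n > p).$$

A sketch of that recursion follows; `lpc_to_lpcc` is a hypothetical name, and it takes the predictor coefficients $a_1, \ldots, a_p$ without the leading $a_0$.

```python
import numpy as np

def lpc_to_lpcc(a, n_ceps, gain=1.0):
    """LPC predictor coefficients [a_1 .. a_p] -> LPCCs [c_0 .. c_{n_ceps-1}].

    Assumes x[n] is predicted as sum_k a_k x[n-k], i.e. H(z) = G / (1 - sum_k a_k z^-k).
    """
    p = len(a)
    c = np.zeros(n_ceps)
    c[0] = np.log(gain)
    for n in range(1, n_ceps):
        acc = a[n - 1] if n <= p else 0.0        # a_n term exists only for n <= p
        for k in range(max(1, n - p), n):         # keeps a_{n-k} within a_1 .. a_p
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c
```

As a check, a single pole with $a_1 = r$ gives $c_n = r^n / n$, the known cepstrum of $1/(1 - r z^{-1})$.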
58 Conversational Data Modeling
- Current Method
  - Equal segmentation of data
  - Indiscriminate use of data
- Problems
  - Change points unknown
  - Not all speech is useful
59 Best Distance
- Intra-speaker and inter-speaker distance lengths are always equal, therefore
  - P: sum of the covariance matrices of the two classes
  - λ1: maximum eigenvalue obtained by solving the generalized eigenvalue problem
  - Q: square of the distance between the mean vectors of the two classes
60 Best Distance
[Figure axes: Distance Measure 1, Distance Measure 2]
61 RRMD Approach
- Relative Distance Condition
62 Modeling Analysis
N = 20 (4 seconds of voiced speech)
63 Modeling Analysis
N = 5 (1 second of voiced speech)
64 Distance Measures
- Mahalanobis Distance
  - Measures the separation between the means of both classes
- Hotelling's T-Square Statistic
  - Measures the separation between the means of both classes and takes the data lengths into consideration
- Kullback-Leibler Distance
  - Measures the separation between the distributions of both classes
- Bhattacharyya Distance
  - Derived from the classification error between both classes
- Levene's Test
  - Measures the absolute deviation from the center of the class distribution
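Levene's test compares the spread of two samples through the absolute deviations of each sample from its center. The sketch below computes the Levene W statistic for two 1-D samples using the median as the center (the Brown-Forsythe variant). Turning it into a model distance by averaging W over cepstral dimensions (`levene_distance`) is a hypothetical choice for illustration, not necessarily how the dissertation applies the test.

```python
import numpy as np

def levene_w(x, y):
    """Levene's W statistic for two 1-D samples (absolute deviations from the median)."""
    groups = [np.asarray(x, float), np.asarray(y, float)]
    z = [np.abs(g - np.median(g)) for g in groups]
    n_total, k = sum(len(g) for g in groups), len(groups)
    z_means = [zi.mean() for zi in z]
    z_grand = np.concatenate(z).mean()
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum(((zi - m) ** 2).sum() for zi, m in zip(z, z_means))
    return float((n_total - k) / (k - 1) * between / within)

def levene_distance(feats_a, feats_b):
    """Hypothetical model distance: mean Levene W over cepstral dimensions."""
    a, b = np.asarray(feats_a, float), np.asarray(feats_b, float)
    return float(np.mean([levene_w(a[:, j], b[:, j]) for j in range(a.shape[1])]))
```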
65 Speaker Recognition
- Speaker Identification
  - Who is this speaker?
- Speaker Verification
  - Is he who he claims to be?
[Figure label: System Output]
66 Speaker Segmentation
- Broadcast News/Conference Data
- Conversational Data
67 Procedural Set-up
- Intra-speaker distance computations
  - 384-speaker database used
  - Average utterance length: 5 seconds
- Inter-speaker distance computations
68 Best N Estimation
- 245 conversations from SWITCHBOARD used
- Results shown for the T-Square distance
Addressing the Challenges Applications Methods
Modeling Speakers Speaker Indexing Speaker
Count Speaker Count-Indexing Fusion of
Distances Evaluation Summary and Further
Research
[Figure annotation: N = 5]
69 RRA Examples: 2 Speakers
70 RRA Examples: 3 Speakers
71 Comparison
[Figure panels: two-speaker residual; three-speaker residual]
[Panel labels: residual ratio after the 2nd round of RRA (both panels); Speaker 2]
72 Effects of Fusion: LPCCs
73 Effects of Fusion: LPCCs
74 Effects of Fusion: MFCCs
75 Effects of Fusion: MFCCs
76 Best Feature Size
77 Best Feature Size
78 Correlation Analysis
[Figure: Draftsman's display (scatter-plot matrix), MFCC]