1
EE 516 Lecture 1

Geoffrey Zweig
Microsoft Research
4/2/2009

2
Our Topics
From JHU 2002 SuperSID Final Presentation, Reynolds et al.
3
Topic Coverage By Day

Data Representations and Models (4/23)
Vector Quantization
Gaussian Mixtures
The EM Algorithm
Speaker Identification (5/7)
Language Identification (5/7)
Hidden Markov Models (5/14)
Dynamic Programming
Building a Speech Recognizer (5/14)

4
Language Identification: Why Do It?

Multi-lingual society
Applications should be able to deal with anyone
Businesses
Automated help systems
Reservations, account access, etc.
Travel
Airport Kiosks
Train stations
Government
Funds research to identify languages
Runs evaluations in it

5
How Do You Do It?
6
How Do You Do It? (2)
p ih n s: probably English
k r p s t: probably Czech
After Zissman 1996
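
The idea that a phone sequence hints at its language can be sketched with a phone-bigram model per language. This is a minimal illustration, not the method from the slides: the training strings, the language labels, and the function names `train_bigram_model`, `score`, and `identify` are all assumptions made up for the example, and the phone strings are assumed to come from a phone recognizer that is not shown.

```python
import math
from collections import Counter

def train_bigram_model(phone_strings, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed log P(cur | prev)."""
    pair_counts = Counter()
    context_counts = Counter()
    for phones in phone_strings:
        seq = ["<s>"] + phones.split()
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * v
        return math.log(num / den)

    return logprob

def score(logprob, phones):
    """Log-likelihood of a space-separated phone string under one model."""
    seq = ["<s>"] + phones.split()
    return sum(logprob(prev, cur) for prev, cur in zip(seq, seq[1:]))

def identify(models, phones):
    """Pick the language whose bigram model scores the phones highest."""
    return max(models, key=lambda lang: score(models[lang], phones))

# Toy, hypothetical training data (not real corpus statistics).
training = {
    "english": ["p ih n s", "s p ih n", "t ih n", "p ih t"],
    "czech": ["k r p s t", "s t r k", "p r s t", "k r k"],
}
vocab = {ph for strings in training.values() for s in strings for ph in s.split()}
models = {lang: train_bigram_model(strings, vocab) for lang, strings in training.items()}

print(identify(models, "p ih n s"))
print(identify(models, "k r p s t"))
```

With real data, each model would be trained on many hours of recognized phones, and the smoothing would need more care than add-alpha.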
7
How Do You Do It? (3)
Same methods multiple times
After Zissman 1996
8
How Do You Do It? (4)
Run a complete speech recognizer in each language
And we will see several other ways, and combinations!
After Zissman 1996
9
Gauging Progress: The NIST Evaluations

National Institute of Standards and Technology
Has sponsored benchmark tests in multiple language processing areas for over a decade
Topic Detection and Tracking
Content Extraction
Video Analysis
Speech Recognition
Language Identification
Speaker Identification
Machine Translation
http://www.itl.nist.gov/iad/mig/tests/
Coordination with site funding by the Defense Advanced Research Projects Agency (DARPA)
Along with business interest, the driving force in advancing the state of the art

10
For Example, Progress in Speech Recognition
11
Language Identification: How Well Can It Be Done? Who Salutes?
Who Salutes?
Organization | Location
Beijing Naphoo Technology Company | China
Brno University of Technology | Czech Republic
Georgia Institute of Technology | USA
Groupe des Ecoles des Telecommunication, Ecole Nationale Superieure des Telecommunications | France
IBM | USA
IKERLAN Technological Research Center | Spain
Institut de Recherche en Informatique de Toulouse | France
Institute for Infocomm Research | Singapore
Institute of Acoustics, Chinese Academy of Sciences | China
Institut National de Recherche sur les Transports et Leur Securite | France
International Computer Science Institute (USA) | USA
Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur | France
MIT Lincoln Laboratory | USA
Nanyang Technological University | Singapore
Politecnico di Torino | Italy
Spescom Datavoice | South Africa
Telefonica I+D | Spain
TNO Human Factors | The Netherlands
Tsinghua University | China
Universidad Autónoma de Madrid | Spain
University of the Basque Country | Spain
University of Stellenbosch | South Africa
University of Science and Technology of China | China
From NIST 2007 LRE Website
12
How Well Can It Be Done? What Languages?
From NIST 2007 LRE Website
13
How Well Can It Be Done? Testing Conditions

26 languages and dialects
Telephone speech
Multiple duration conditions
3, 10, 30 seconds
Detection Error Tradeoff (DET) curves used to measure performance
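
A DET curve plots miss rate against false alarm rate as the decision threshold sweeps over the detector's scores, usually on normal-deviate axes. The sketch below only computes the operating points and an approximate equal error rate. The scores are made-up toy values, and the names `det_points` and `equal_error_rate` are assumptions for this example, not part of any NIST tooling.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) triples.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        misses = sum(1 for s in target_scores if s < t)
        false_alarms = sum(1 for s in nontarget_scores if s >= t)
        points.append((t,
                       misses / len(target_scores),
                       false_alarms / len(nontarget_scores)))
    return points

def equal_error_rate(points):
    """Approximate EER: the operating point where the two rates are closest."""
    _, p_miss, p_fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2

# Toy, hypothetical detector scores (higher means "more likely the target language").
targets = [2.0, 1.5, 0.9, 0.3]
nontargets = [1.0, 0.4, 0.1, -0.5]

points = det_points(targets, nontargets)
for threshold, p_miss, p_fa in points:
    print(f"threshold={threshold:5.2f}  miss={p_miss:.2f}  false_alarm={p_fa:.2f}")
print("approx. EER:", equal_error_rate(points))
```

Raising the threshold trades false alarms for misses, which is the tradeoff the curve makes visible across the 3, 10, and 30 second conditions.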

14
How Well Can It Be Done? Some Numbers
From NIST 2007 LRE Website
15
Language Identification Project

Build a language ID system with the CallFriend data set
Implement several of the main techniques
Set up a demo on your laptop that will recognize someone's language

16
Flavors of Speaker Recognition
Our Focus!
From JHU 2002 SuperSID Final Presentation, Reynolds et al.
17
Speaker Recognition: Why Do It?

Personal Applications
Voice-print passwords
Voicemail transcription: who left that message?
Business Applications
Calling your bank
Government
Is that Osama calling from Pakistan?
Prison call monitoring
Automated parolee calling: is he where you think?

18
How Do You Do It?
More recently: support vector machines operating on GMMs (!)
19
How Do You Do It? (2)
Also use high-level information!
From JHU 2002 SuperSID Final Presentation, Reynolds et al.
20
How Well Can It Be Done? Who Salutes?
From NIST 2008 SRE Presentation, Martin and Greenberg
21
More Salutes
From NIST 2008 SRE Presentation, Martin and Greenberg
22
From Europe
From NIST 2008 SRE Presentation, Martin and Greenberg
23
More From Europe
From NIST 2008 SRE Presentation, Martin and Greenberg
24
U.S. Entries
From NIST 2008 SRE Presentation, Martin and Greenberg
25
How Well Can It Be Done? Testing Conditions