EE 516 Lecture 1 - PowerPoint PPT Presentation

1
EE 516 Lecture 1
  • Geoffrey Zweig
  • Microsoft Research
  • 4/2/2009

2
Our Topics
From JHU 2002 SuperSID Final Presentation
Reynolds et al.
3
Topic Coverage By Day
  • Data Representations and Models (4/23)
    • Vector Quantization
    • Gaussian Mixtures
    • The EM Algorithm
  • Speaker Identification (5/7)
  • Language Identification (5/7)
  • Hidden Markov Models (5/14)
    • Dynamic Programming
  • Building a Speech Recognizer (5/14)

4
Language Identification: Why Do It?
  • Multi-lingual society
    • Applications should be able to deal with anyone
  • Businesses
    • Automated help systems
      • Reservations, account access, etc.
  • Travel
    • Airport Kiosks
    • Train stations
  • Government
    • Funds research to identify languages
    • Runs evaluations in it

5
How Do You Do It?
6
How Do You Do It? (2)
p ih n s: probably English
k r p s t: probably Czech
After Zissman 1996
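The intuition on this slide (some phone sequences are typical of one language and unlikely in another) can be sketched with one phone bigram model per language. This is only an illustration under stated assumptions, not Zissman's system: the toy training sequences, the add-one smoothing, and the names `train_bigram`, `log_likelihood` and `identify` are all invented for the example.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count phone bigrams (with start/end markers) for one language."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        phones = ["<s>"] + seq.split() + ["</s>"]
        vocab.update(phones)
        for a, b in zip(phones, phones[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, vocab

def log_likelihood(model, seq, vocab_size):
    """Add-one smoothed log-probability of a phone sequence under one model."""
    bigrams, contexts, _ = model
    phones = ["<s>"] + seq.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(phones, phones[1:])
    )

def identify(models, seq):
    """Pick the language whose bigram model gives the sequence the highest score."""
    vocab_size = len(set().union(*(m[2] for m in models.values())))
    return max(models, key=lambda lang: log_likelihood(models[lang], seq, vocab_size))

# Toy, invented phone strings (assumption: not real training data).
TRAIN = {
    "English": ["p ih n s", "s p ih n", "t ih n s", "p ih t s"],
    "Czech": ["k r p s t", "s t r k", "p r s t", "k r k"],
}
MODELS = {lang: train_bigram(seqs) for lang, seqs in TRAIN.items()}

print(identify(MODELS, "p ih n s"))   # language with the higher bigram score
print(identify(MODELS, "k r p s t"))
```

In a real system the phone strings would come from a phone recognizer run on the audio, and the models would be trained on far more data.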
7
How Do You Do It? (3)
Same methods multiple times
After Zissman 1996
8
How Do You Do It? (4)
Run a complete speech recognizer in each language
And we will see several other ways, and
combinations!
After Zissman 1996
9
Gauging Progress: The NIST Evaluations
  • National Institute of Standards and Technology
  • Has sponsored benchmark tests in multiple
    language processing areas for over a decade
  • Topic Detection and Tracking
  • Content Extraction
  • Video Analysis
  • Speech Recognition
  • Language Identification
  • Speaker Identification
  • Machine Translation
  • http://www.itl.nist.gov/iad/mig/tests/
  • Coordination with site funding by Defense
    Advanced Research Projects Agency (DARPA)
  • Along with business interest, the driving force
    in advancing the State-of-the-Art

10
For Example, Progress in Speech Recognition
11
Language Identification: How Well Can It Be Done?
Who Salutes?
Organization | Location
Beijing Naphoo Technology Company | China
Brno University of Technology | Czech Republic
Georgia Institute of Technology | USA
Groupe des Ecoles des Telecommunication, Ecole Nationale Superieure des Telecommunications | France
IBM | USA
IKERLAN Technological Research Center | Spain
Institut de Recherche en Informatique de Toulouse | France
Institute for Infocomm Research | Singapore
Institute of Acoustics, Chinese Academy of Sciences | China
Institut National de Recherche sur les Transports et Leur Securite | France
International Computer Science Institute (USA) | USA
Laboratoire d'Informatique pour la Mecanique et les Sciences de l'Ingenieur | France
MIT Lincoln Laboratory | USA
Nanyang Technological University | Singapore
Politecnico di Torino | Italy
Spescom Datavoice | South Africa
Telefonica I+D | Spain
TNO Human Factors | The Netherlands
Tsinghua University | China
Universidad Autónoma de Madrid | Spain
University of the Basque Country | Spain
University of Stellenbosch | South Africa
University of Science and Technology of China | China
From NIST 2007 LRE Website
12
How Well Can It Be Done? What Languages?
From NIST 2007 LRE Website
13
How Well Can It Be Done? Testing Conditions
  • 26 languages and dialects
  • Telephone speech
  • Multiple duration conditions
    • 3, 10, 30 seconds
  • Detection Error Tradeoff (DET) Curves used to
    measure performance
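
A DET curve plots miss rate against false-alarm rate as the decision threshold sweeps over the scores. The sketch below computes those operating points and an approximate equal error rate from two lists of scores. The score values and function names are invented for illustration, and a real DET plot also warps both axes to a normal-deviate scale, which is omitted here.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for th in thresholds:
        miss = sum(s < th for s in target_scores) / len(target_scores)
        fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        points.append((th, miss, fa))
    return points

def equal_error_rate(points):
    """Approximate EER: the point where miss and false-alarm rates are closest."""
    _, miss, fa = min(points, key=lambda p: abs(p[1] - p[2]))
    return (miss + fa) / 2

# Invented detector scores (assumption: higher means "more likely the target language").
TARGETS = [2.0, 1.5, 0.9, 0.2]
NONTARGETS = [1.0, 0.4, 0.1, -0.5]

for th, miss, fa in det_points(TARGETS, NONTARGETS):
    print(f"threshold={th:5.2f}  miss={miss:.2f}  false_alarm={fa:.2f}")
print("approx. EER:", equal_error_rate(det_points(TARGETS, NONTARGETS)))
```

Raising the threshold trades false alarms for misses, which is the tradeoff the curve makes visible.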

14
How Well Can It Be Done? Some Numbers
From NIST 2007 LRE Website
15
Language Identification Project
  • Build a language ID system with the CallFriend
    data set
  • Implement several of the main techniques
  • Set up a demo on your laptop that will recognize
    someone's language

16
Flavors of Speaker Recognition
Our Focus!
From JHU 2002 SuperSID Final Presentation
Reynolds et al.
17
Speaker Recognition: Why Do It?
  • Personal Applications
    • Voice-print passwords
    • Voicemail transcription: who left that message?
  • Business Applications
    • Calling your bank
  • Government
    • Is that Osama calling from Pakistan?
    • Prison call monitoring
    • Automated parolee calling: is he where you think?

18
How Do You Do It?
More recently: support vector machines operating
on GMMs (!)
19
How Do You Do It? (2)
Also use high-level information!
From JHU 2002 SuperSID Final Presentation
Reynolds et al.
20
How Well Can It Be Done? Who Salutes?
From NIST 2008 SRE Presentation, Martin & Greenberg
21
More Salutes
From NIST 2008 SRE Presentation, Martin & Greenberg
22
From Europe
From NIST 2008 SRE Presentation, Martin & Greenberg
23
More From Europe
From NIST 2008 SRE Presentation, Martin & Greenberg
24
U.S. Entries
From NIST 2008 SRE Presentation, Martin & Greenberg
25
How Well Can It Be Done? Testing Conditions
  • Conditions for different amounts of data
    • 10 sec.
    • 3-5 minutes
    • 8 minutes
  • Separate channel and summed channel conditions
  • English-speakers, non-English speakers,
    multilingual speakers

26
How Well Can It Be Done?
27
Speaker Verification Project
  • Implement a Speaker-ID system
    • Template based
    • GMM based
    • SVM based
    • Vector space model
  • Demonstrate it
    • NIST data, e.g. 2001 Evaluation
    • Your own voice: implement on laptop
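
For the GMM-based option, one common way to make the accept/reject decision is to compare the likelihood of the test frames under the claimed speaker's GMM against a background model. The slide does not specify a scoring rule, so the likelihood-ratio test below is an assumption, as are the scalar (1-D) features, the hand-set model parameters, and the zero threshold. It is a sketch of the scoring step only; training the models (e.g. with EM) is not shown.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of a scalar feature x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def average_llr(frames, speaker_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: speaker model vs. background model."""
    total = sum(
        gmm_log_likelihood(x, *speaker_gmm) - gmm_log_likelihood(x, *background_gmm)
        for x in frames
    )
    return total / len(frames)

# Hypothetical hand-set models: (weights, means, variances).
SPEAKER = ([0.5, 0.5], [1.0, 3.0], [0.25, 0.25])
BACKGROUND = ([0.5, 0.5], [0.0, 4.0], [4.0, 4.0])

def accept(frames, threshold=0.0):
    """Accept the identity claim when the average log-likelihood ratio exceeds the threshold."""
    return average_llr(frames, SPEAKER, BACKGROUND) > threshold

print(accept([1.0, 1.1, 2.9, 3.0]))  # frames close to the speaker model's means
print(accept([-3.0, 7.0, 6.5]))      # frames far from the speaker model's means
```

Real systems use multi-dimensional cepstral features and many more mixture components, but the decision rule has the same shape.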

28
Speech Recognition Project
  • Implement an HMM based recognition system
  • Use, e.g., the Phonebook isolated-word data set
    or the Aurora digit set
  • Write features with existing front-end
  • Build your own HMM trainer/decoder
  • Set it up on your laptop for online word
    recognition (?!)
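
The decoder half of an HMM trainer/decoder is typically a Viterbi search, which is the dynamic-programming step listed under Hidden Markov Models in the topic coverage. The sketch below decodes a tiny discrete-observation HMM in the log domain. The two-state "sil"/"speech" model and its probabilities are invented for illustration; a project decoder would use Gaussian emissions over acoustic features and word-level models.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    # best[s] = log-prob of the best path ending in state s at the current time
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Dynamic-programming step: choose the best predecessor of s
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][obs]
            pointers[s] = p
        backpointers.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return best[last], path[::-1]

def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}

# Invented toy model (assumption: not from the lecture).
STATES = ["sil", "speech"]
LOG_START = logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {"sil": logs({"sil": 0.7, "speech": 0.3}),
             "speech": logs({"sil": 0.2, "speech": 0.8})}
LOG_EMIT = {"sil": logs({"low": 0.9, "high": 0.1}),
            "speech": logs({"low": 0.2, "high": 0.8})}

log_prob, path = viterbi(["low", "high", "high"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path, math.exp(log_prob))
```

Working in log-probabilities avoids underflow on long utterances, where products of many small probabilities would otherwise round to zero.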

29
Highlights of Syllabus
  • Required Texts
    • Huang, Acero, Hon: Spoken Language Processing
    • Deng and O'Shaughnessy: Speech Processing
    • EE516 Reader, at Professional Copy 'n' Print,
      4200 University Way
  • Grading
    • Projects: 50%
    • Final Exam: 30%
    • Homework: 20%
  • Projects
    • Small team or individual
    • Teams are self-forming
    • Presentation times TBD
    • Read ahead: pick an area!!!
    • Talk to relevant instructor
    • Suggest deciding no later than 4/30
  • Office Hours at end of class and by appointment
  • Please sign in on email list!