Audiovisual Speech Recognition

1
Audio-visual Speech Recognition
  • ECE 285 Class Presentation - 2
  • Shankar Shivappa
  • 2/14/2005

2
Overview
  • Visual Feature Selection for AVSR
  • Model Based Approach
  • Appearance Based Approach
  • PCA / LDA in Pattern Classification
  • Support Vector Machines - An Overview

3
Audio Visual Speech Recognition

4
Visual Features for AVSR
  • AVSR is essentially a pattern classification problem
  • Pattern classification involves:
    • Feature Selection - Quantify Observations
    • Training of Classifiers - Learning the pattern
    • Testing

5
Pattern Classification
  • Example: Two-Class Case
  • Goal - Detect boundary between classes
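The two-class case can be illustrated with a minimal sketch. The slides do not name a specific classifier at this point, so the perceptron below is an assumption chosen for brevity, and the data points are hypothetical.

```python
# Toy two-class data (hypothetical values, not from the slides).
points = [(2.0, 2.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
labels = [+1, +1, -1, -1]


def train_perceptron(points, labels, max_epochs=100):
    """Learn w and b so that sign(w.x + b) matches the labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            score = w[0] * x1 + w[1] * x2 + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # the boundary now separates all training points
            break
    return w, b


def predict(w, b, point):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1


w, b = train_perceptron(points, labels)
predictions = [predict(w, b, p) for p in points]
```

The learned line w.x + b = 0 is the boundary between the classes. The training and testing steps from the previous slide correspond to `train_perceptron` and `predict`.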

6
Model Based Approach
  • Develop a model for lip shape
  • Represent it by the parameters
  • For each frame, get the best-fit parameters
  • Advantages:
    • Robust
    • Small number of parameters
    • Physical interpretation of parameters

7
Geometric Lip Model

8
Geometric Lip Model
  • S - introduces skewness
  • ? controls deviation from parabolic shape

9
Probability Map generation
  • For each pixel (x,y),
  • P1(x,y) = prob((x,y) ∈ lip region)
  • P2(x,y) = prob((x,y) ∈ non-lip region)
  • Using clustering on:
    • Luminance
    • Color
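One way such a probability map could be produced is sketched below. The slides only say that clustering is used, so everything else is an assumption: a single hypothetical "redness" value per pixel, a two-cluster k-means, a Gaussian soft assignment with an arbitrary `sigma`, and the convention that the cluster with the higher center is the lip region.

```python
import math


def two_means_1d(values, iterations=20):
    """Cluster scalars into two groups; returns (low_center, high_center)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if low_group:
            low = sum(low_group) / len(low_group)
        if high_group:
            high = sum(high_group) / len(high_group)
    return low, high


def lip_probability(value, non_lip_center, lip_center, sigma=0.2):
    """Soft assignment: returns (P1, P2) with P1 + P2 = 1."""
    lip_score = math.exp(-((value - lip_center) ** 2) / (2 * sigma ** 2))
    non_lip_score = math.exp(-((value - non_lip_center) ** 2) / (2 * sigma ** 2))
    p1 = lip_score / (lip_score + non_lip_score)
    return p1, 1.0 - p1


# Hypothetical per-pixel redness values (assumption: lips are redder).
redness = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
non_lip_center, lip_center = two_means_1d(redness)
prob_map = [lip_probability(v, non_lip_center, lip_center) for v in redness]
```

Each entry of `prob_map` holds the pair (P1, P2) for one pixel. A real system would cluster on luminance and color together rather than on one scalar.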

10
Best fit
  • Optimum parameters to match template and actual
    image
  • Gradient search for finding the optimum
  • Reduced parameter set is used!
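A gradient search over a reduced parameter set might look like the sketch below. The parabolic template, the squared-mismatch cost, the step size, and the choice to hold `half_width` fixed are all assumptions made for illustration; the slides do not give the actual cost function or lip model equations.

```python
def template(x, height, offset, half_width=1.0):
    """Parabolic contour: offset + height * (1 - (x / half_width)^2)."""
    return offset + height * (1.0 - (x / half_width) ** 2)


def mismatch(params, xs, ys):
    """Sum of squared differences between the template and observed points."""
    height, offset = params
    return sum((template(x, height, offset) - y) ** 2 for x, y in zip(xs, ys))


def gradient_search(xs, ys, start, rate=0.05, steps=500, eps=1e-6):
    """Plain gradient descent using central finite differences."""
    params = list(start)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up = list(params)
            up[i] += eps
            down = list(params)
            down[i] -= eps
            grad.append((mismatch(up, xs, ys) - mismatch(down, xs, ys)) / (2 * eps))
        params = [p - rate * g for p, g in zip(params, grad)]
    return params


# Synthetic "observed" contour generated from known parameters.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [template(x, 2.0, 0.5) for x in xs]
height, offset = gradient_search(xs, ys, start=[1.0, 0.0])
```

Only `height` and `offset` are searched here, which mirrors the idea of optimizing a reduced parameter set while the remaining parameters stay fixed.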

11
Appearance Based Approach

12
Mouth Detection and Tracking
  • Face Detection
  • Multi Scale Search - Done by re-sampling
  • Histogram Equalization
  • SVM classifier (with/without beard)

13
Visual Feature Selection
  • 64 × 64 pixels around the mouth center
  • 32 dimensions extracted by PCA
  • Up-sampled to the audio feature rate
  • N observations are concatenated
  • Projected to a 13-class LDA space
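The up-sampling and concatenation steps can be sketched as follows. Linear interpolation is an assumption (the slides do not say how the up-sampling is done), the frame rates and feature values are hypothetical, and the PCA and LDA projections are left out.

```python
def upsample_linear(frames, src_rate, dst_rate):
    """Linearly interpolate feature vectors from src_rate to dst_rate.

    Requires at least two frames.
    """
    n_out = int((len(frames) - 1) * dst_rate / src_rate) + 1
    out = []
    for k in range(n_out):
        t = k * src_rate / dst_rate          # position in source-frame units
        i = min(int(t), len(frames) - 2)     # left neighbor index
        a = t - i                            # interpolation weight
        out.append([(1 - a) * u + a * v for u, v in zip(frames[i], frames[i + 1])])
    return out


def stack_context(frames, n):
    """Concatenate every run of n consecutive frames into one vector."""
    stacked = []
    for i in range(len(frames) - n + 1):
        vector = []
        for frame in frames[i:i + n]:
            vector.extend(frame)
        stacked.append(vector)
    return stacked


# Hypothetical 1-dimensional visual features at 1 Hz, up-sampled to 2 Hz.
visual = [[0.0], [1.0], [2.0]]
upsampled = upsample_linear(visual, src_rate=1.0, dst_rate=2.0)
stacked = stack_context(upsampled, n=3)
```

With 32-dimensional PCA features and N concatenated observations, each stacked vector would have 32 × N entries before the LDA projection.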

14
PCA / LDA
  • Dimensionality reduction
  • PCA chooses directions of Max. Variation in
    feature space
  • LDA chooses directions of Max. Separation between
    Classes
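The difference between the two criteria shows up on a small 2-D example. The data below is hypothetical and built so that most of the variation lies along x while the classes differ along y. The closed-form 2 × 2 calculations are a sketch, not a general implementation.

```python
import math

# Hypothetical data: wide spread in x, classes separated in y.
class_a = [(-3.0, -0.6), (-1.0, -0.4), (1.0, -0.6), (3.0, -0.4)]
class_b = [(-3.0, 0.4), (-1.0, 0.6), (1.0, 0.4), (3.0, 0.6)]


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points):
    """Return (sxx, sxy, syy), summed around the mean of the points."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


def unit(v):
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


# PCA: major axis of the total scatter matrix; class labels are ignored.
sxx, sxy, syy = scatter(class_a + class_b)
angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
pca_direction = (math.cos(angle), math.sin(angle))

# LDA (two classes): w = Sw^-1 (mean_b - mean_a), Sw = within-class scatter.
axx, axy, ayy = scatter(class_a)
bxx, bxy, byy = scatter(class_b)
wxx, wxy, wyy = axx + bxx, axy + bxy, ayy + byy
det = wxx * wyy - wxy ** 2
mean_a, mean_b = mean(class_a), mean(class_b)
dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
lda_direction = unit(((wyy * dx - wxy * dy) / det, (-wxy * dx + wxx * dy) / det))
```

Here `pca_direction` points almost entirely along x (the direction of maximum variation), while `lda_direction` points almost entirely along y (the direction that separates the classes).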

15
Support Vector Machines

16
SVM
  • Apply a transformation to higher dimensional
    space - Kernel
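A small example of why the transformation helps is sketched below. The slides do not say which kernel is used, so the quadratic feature map, the data, and the threshold are assumptions picked for illustration.

```python
def feature_map(x):
    """Map a scalar to a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, z):
    """Inner product in the mapped space, computed without mapping."""
    return x * z + (x * x) * (z * z)


# Hypothetical 1-D data: no single threshold on x separates the classes.
inner = [-0.5, 0.0, 0.5]          # class -1
outer = [-2.0, -1.5, 1.5, 2.0]    # class +1


def classify(x, threshold=1.0):
    """Linear rule in the mapped space: compare the x^2 coordinate."""
    return 1 if feature_map(x)[1] > threshold else -1


inner_labels = [classify(x) for x in inner]
outer_labels = [classify(x) for x in outer]
```

In the original 1-D space the outer class sits on both sides of the inner class. After the mapping, a straight line at x² = 1 separates them. The `kernel` function returns the same value as the dot product of the two mapped vectors, which is what lets an SVM work in the higher-dimensional space implicitly.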

17
SVM
  • Goal - To maximize the margin between classes
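The margin of a linear boundary can be computed directly, as in the sketch below. It compares two hand-picked separators on hypothetical data and does not solve the SVM optimization problem itself.

```python
import math

# Hypothetical linearly separable data.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the boundary w.x + b = 0.

    A positive value means every point is on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x1 + w[1] * x2 + b) / norm
        for (x1, x2), y in zip(points, labels)
    )


# Two candidate boundaries that both separate the data.
margin_diagonal = margin((1.0, 1.0), -2.0, points, labels)
margin_vertical = margin((1.0, 0.0), -1.0, points, labels)
```

Both boundaries classify every point correctly, but the diagonal one leaves more room between the classes, so a maximum-margin classifier would prefer it over the vertical one.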