Audiovisual Speech Recognition - PowerPoint PPT Presentation

About This Presentation

Slides: 18

Provided by: shank5

Transcript and Presenter's Notes

1
Audio-visual Speech Recognition

ECE 285 Class Presentation - 2
Shankar Shivappa
2/14/2005

2
Overview

Visual Feature Selection for AVSR
Model Based Approach
Appearance Based Approach
PCA / LDA in Pattern Classification
Support Vector Machines - An Overview

3
Audio Visual Speech Recognition
4
Visual Features for AVSR

AVSR is basically Pattern Classification
Pattern Classification involves
Feature Selection - Quantify Observations
Training of Classifiers - Learning the pattern
Testing

5
Pattern Classification

Example: Two-Class Case
Goal - Detect boundary between classes

6
Model Based Approach

Develop a model for lip shape
Represent it by the parameters
For each frame, get the best-fit parameters
Advantage
Robust
Small number of parameters
Physical interpretation of parameters

7
Geometric Lip Model
8
Geometric Lip Model

S - introduces skewness
? controls deviation from parabolic shape

9
Probability Map generation

For each pixel (x,y),
P1(x,y) = prob((x,y) ∈ lip region)
P2(x,y) = prob((x,y) ∈ non-lip region)
Using Clustering
Luminance
Color
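
A minimal sketch of how clustering could produce the two probability maps. The slide does not say which clustering method or colour feature was used, so everything here is an assumption: a single hypothetical per-pixel "lip colour" score in [0, 1], a two-cluster k-means, and a logistic squashing of the distance to each cluster centre (the `sharpness` parameter is illustrative only).

```python
import math

def two_means(values, iters=20):
    """Plain 1-D k-means with k=2; returns (low_centre, high_centre)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if near_lo:
            lo = sum(near_lo) / len(near_lo)
        if near_hi:
            hi = sum(near_hi) / len(near_hi)
    return lo, hi

def probability_maps(feature, sharpness=10.0):
    """feature: 2-D list of per-pixel scores (assumed: higher = more lip-like).
    Returns (P1, P2) with P1[y][x] + P2[y][x] == 1 for every pixel."""
    flat = [v for row in feature for v in row]
    non_lip_c, lip_c = two_means(flat)
    p1 = []
    for row in feature:
        out = []
        for v in row:
            d_lip = (v - lip_c) ** 2
            d_non = (v - non_lip_c) ** 2
            # Closer to the lip centre -> probability nearer 1.
            out.append(1.0 / (1.0 + math.exp(sharpness * (d_lip - d_non))))
        p1.append(out)
    p2 = [[1.0 - p for p in row] for row in p1]
    return p1, p2

# Toy 3x3 "image" with two lip-like pixels in the middle row.
feature = [[0.1, 0.1, 0.2],
           [0.1, 0.9, 0.8],
           [0.2, 0.1, 0.1]]
P1, P2 = probability_maps(feature)
```

In this toy image, P1 is high at the two bright pixels and low elsewhere. A real system would cluster on luminance and colour jointly rather than on one scalar.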

10
Best fit

Optimum parameters to match template and actual image
Gradient search for finding the optimum
Reduced parameter set is used!
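
The gradient search can be illustrated with a toy fit. This is only a sketch under stated assumptions: the template is a hypothetical two-parameter parabola (centre height `h`, curvature `k`), not the lip model from the slides, and the matching cost is a mean squared error against observed contour points rather than the probability-map score the presentation refers to.

```python
def template(x, params):
    """Toy parabolic contour: height h at the centre, curvature k."""
    h, k = params
    return h - k * x * x

def cost(params, points):
    """Mean squared mismatch between template and observed contour."""
    return sum((template(x, params) - y) ** 2 for x, y in points) / len(points)

def gradient_search(points, start, lr=0.5, steps=500, eps=1e-6):
    """Gradient descent using central finite differences for the gradient."""
    params = list(start)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up = params[:]
            up[i] += eps
            down = params[:]
            down[i] -= eps
            grad.append((cost(up, points) - cost(down, points)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# Synthetic "observed" contour generated with h = 0.8, k = 0.5,
# sampled at x positions normalised to [-1, 1].
xs = [i / 3 for i in range(-3, 4)]
observed = [(x, 0.8 - 0.5 * x * x) for x in xs]
fitted = gradient_search(observed, start=[0.0, 0.0])
```

Starting from `[0.0, 0.0]`, the search recovers values close to the generating parameters. Keeping the parameter set small, as the slide notes, keeps each gradient step cheap and the search better conditioned.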

11
Appearance Based Approach
12
Mouth Detection and Tracking

Face Detection
Multi Scale Search - Done by re-sampling
Histogram Equalization
SVM classifier (with/without beard)

13
Visual Feature Selection

64 x 64 pixel region around the mouth center
32 dimensions extracted by PCA
Up-sampled to Audio Feature rate
N observations are concatenated
Projected to 13 class LDA space
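
The rate-matching and concatenation steps can be sketched as follows. The slide does not give the video or audio feature rates, the interpolation method, or the value of N, so the linear interpolation, the `factor` argument and the window length `n` below are assumptions chosen for illustration. Short 2-dimensional vectors stand in for the 32-dimensional PCA features.

```python
def upsample(frames, factor):
    """Linearly interpolate `factor` steps between consecutive feature vectors."""
    out = []
    for a, b in zip(frames, frames[1:]):
        for s in range(factor):
            t = s / factor
            out.append([(1 - t) * u + t * v for u, v in zip(a, b)])
    out.append(list(frames[-1]))
    return out

def stack(frames, n):
    """Concatenate each run of n consecutive vectors into one long vector."""
    return [sum(frames[i:i + n], []) for i in range(len(frames) - n + 1)]

# Three video-rate feature vectors (2-D here for brevity).
video = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
audio_rate = upsample(video, factor=2)   # 5 vectors after interpolation
stacked = stack(audio_rate, n=3)         # 3 vectors, each of length 6
```

Each stacked vector carries a short window of lip motion; these long vectors are what would then be projected into the LDA space.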

14
PCA / LDA

Dimensionality reduction
PCA chooses directions of maximum variation in feature space
LDA chooses directions of maximum separation between classes
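
The contrast between the two criteria shows up clearly on a small 2-D example. The sketch below uses made-up data: two classes that are spread widely along x but separated along y. PCA is computed from the closed-form principal-axis angle of the 2x2 scatter matrix, and LDA as the two-class Fisher direction w = Sw^-1 (mean_a - mean_b). Function names are illustrative.

```python
import math

def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, centre):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix about `centre`."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    return sxx, sxy, syy

def pca_direction(points):
    """Unit vector along the direction of maximum variance."""
    sxx, sxy, syy = scatter(points, mean(points))
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return [math.cos(angle), math.sin(angle)]

def lda_direction(class_a, class_b):
    """Unit Fisher direction: within-class scatter inverse times mean difference."""
    ma, mb = mean(class_a), mean(class_b)
    a, b = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = a[0] + b[0], a[1] + b[1], a[2] + b[2]
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]

class_a = [(-4, 1.0), (-2, 1.2), (0, 0.8), (2, 1.1), (4, 0.9)]
class_b = [(-4, -1.0), (-2, -0.8), (0, -1.2), (2, -0.9), (4, -1.1)]
pca = pca_direction(class_a + class_b)   # nearly along the x axis
lda = lda_direction(class_a, class_b)    # nearly along the y axis
```

Projecting onto the PCA direction would mix the two classes completely, while projecting onto the LDA direction keeps them apart.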

15
Support Vector Machines
16
SVM

Apply a transformation to a higher-dimensional space - Kernel
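
A small illustration of the idea, using made-up 1-D data. Points with |x| < 1 form one class and points with |x| > 1 the other, so no single threshold on x separates them. Mapping each point to (x, x^2) makes them separable by a threshold on the second coordinate, and the kernel function returns the inner product in that mapped space without building the mapped vectors. The particular map `phi` is an assumption chosen for simplicity, not one taken from the slides.

```python
def phi(x):
    """Explicit map from 1-D input to a 2-D feature space."""
    return (x, x * x)

def kernel(x, z):
    """Inner product of phi(x) and phi(z), computed from x and z directly."""
    return x * z + (x * x) * (z * z)

inner = [-0.5, 0.0, 0.5]          # class -1: not separable from `outer` on the line
outer = [-2.0, -1.5, 1.5, 2.0]    # class +1

# In the mapped space the second coordinate alone separates the classes.
separable = max(phi(x)[1] for x in inner) < min(phi(x)[1] for x in outer)

# The kernel agrees with the explicit inner product.
x, z = 1.5, -0.5
explicit = phi(x)[0] * phi(z)[0] + phi(x)[1] * phi(z)[1]
```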

17
SVM

Goal - To maximize the margin between classes
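
To make the margin concrete, the sketch below measures it for two candidate linear boundaries on a made-up two-class data set. The margin of a boundary w·x + b = 0 is taken as the smallest signed distance label * (w·x + b) / ||w|| over all samples. No SVM training is performed here; the two boundaries are picked by hand.

```python
import math

def margin(w, b, samples):
    """Smallest signed distance from any sample to the boundary w.x + b = 0."""
    norm = math.hypot(w[0], w[1])
    return min(label * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, label in samples)

samples = [((2.0, 0.0), +1), ((3.0, 1.0), +1),
           ((0.0, 0.0), -1), ((-1.0, 1.0), -1)]

margin_a = margin((1.0, 0.0), -1.0, samples)   # boundary x = 1
margin_b = margin((1.0, 0.0), -1.5, samples)   # boundary x = 1.5
```

Both boundaries classify every sample correctly, but the first has a margin of 1.0 and the second only 0.5, so a maximum-margin classifier would prefer the first.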
