Third Ear - A Voice Recognition System Platform

Slides: 35

Provided by: dav87

1
Third Ear - A Voice Recognition System Platform

Sameh Ebeid
Kiet Do
Yi Yang
Igor Likhotkin
Gerardo Quiñones
Andrew Davis

2
Outline

Re-introduce Customer - Davis
Current State of the Art - Davis
System Overview - Quiñones
Discrete Subsystems
Audio Capture/DSP - Quiñones
Speech Recognizer - Davis
Evaluator - Likhotkin
Simulator - Yang
User Interface - Ebeid
Demonstration - Do, Likhotkin

3
The Customer - Joan Bolker

Practicing clinical psychologist
communicate with patients
conferences
Able to read lips
needs augmentation
needs to see whole face

4
Ideal System Performance

Real-time
100% accuracy
Unaffected by ambient noise
Multiple speakers
Portable
Unobtrusive
Highly readable

5
Current State of the Art

Two extremes
one speaker
large vocabulary
must be trained
many speakers
small vocabulary
Software
commercial - Dragon NaturallySpeaking
open-source - CMU Sphinx
Customer requires large vocabulary for many speakers

6
Our Proposal

Modular test platform
optimize personnel allocation
minimal acceptable functionality
upgradeable
multi-year
TCP/IP socket communication
commercial hardware
open-source software
modified as needed
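
As an illustration of how modules on such a platform could talk over TCP/IP sockets, here is a minimal sketch. The wire format (one UTF-8 word per line) and the helper names `send_words` and `recv_words` are assumptions made for this example; the slides do not specify the protocol the team used.

```python
import socket


def send_words(sock, words):
    """Send each word as one UTF-8 line (assumed wire format)."""
    for word in words:
        sock.sendall(word.encode("utf-8") + b"\n")


def recv_words(sock):
    """Yield words from a connected socket until the peer closes it."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:          # peer closed the connection
            break
        buffer += chunk
        # Emit every complete line; keep any partial line for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")


if __name__ == "__main__":
    # Local demonstration with a connected socket pair; between real modules
    # the sender would use socket.create_connection((host, port)) instead.
    left, right = socket.socketpair()
    send_words(left, ["hello", "world"])
    left.close()
    print(list(recv_words(right)))
    right.close()
```

Because each module only sees a stream of lines, a simulator and a real recognizer could feed the same user interface without the interface knowing which one is on the other end.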

7
Architectural Considerations

Testing the Boundaries
Significant Experimentation
New Market Niche
High Risk

8
Project Strategy

Mock-ups
Rapid Prototyping
Strong Functional Decomposition
Iterative Development
Aggressive Scheduling

9
System Overview
10
Mock-ups

Establish good customer dialog
Define external interfaces
No functionality
Help customer visualize product

11
Rapid Prototyping

Deliver functionality early
Use off-the-shelf components
Identify weakest link in chain
Redirect resources
Invent only what is needed

12
Strong Functional Decomposition

Break product into small pieces
Clearly define interfaces
Use standard protocols
Decouple technical choices
Work in parallel
Allow component upgrades

13
Iterative Development

The most common cause of project failure is lack of calendar time
Deliver components early to customer
Become aware of deficiencies while there is still time to react
Be ready to stop project any time

14
Audio Capture Subsystem

Captures raw analog audio
Eliminates noise
Cancels echo
Delivers digital audio to speech recognizer

15
Audio Capture Problem Statement

Supplements lip reading
Can't obscure speakers' faces
No headsets
No wireless microphones
Must be unobtrusive

16
Audio Capture Technical Background

DSP advancing rapidly
Beam forming technology
Beam steering technology
Solutions are proprietary and expensive
Moore's law to the rescue
Deliver modest solution today

17
Audio Capture Proposed Solution

Single directional microphone
Stereo mike - stretch goal
Uncompressed audio stream
TCP/IP socket client
Stereo mike requires funding

18
Speech Recognizer

Input from audio capture
digitized, monaural, may have DSP
Segment input file
Compare to acoustic model
identify phonemes
Reassemble with language model
reassemble phonemes into most likely words
Output to user interface
text file

19
CMU Sphinx

Several versions
different languages
speed/accuracy tradeoffs
Start with Version 2
written in C
works in real-time, not most accurate
Need customer feedback
Modify language model
May change version

20
WER Evaluator - Intro

What is the WER Evaluator?
computes the Word Error Rate introduced by the speech recognizer or simulator (percent of interpreted words that were incorrect)
Speech Recognition Errors
deletions (omitted words)
insertions (extra words that were not spoken)
substitutions (misinterpreted words, e.g., pirate instead of pilot)

21
WER Evaluator - Operation

How does the evaluator compute errors?
feed input file to simulator
collect output into another file
compare the two files using a modified Levenshtein algorithm
fewest operations needed to change one string into another
For actual recognizer
take input file used by the reader/tester
collect recognizer output into second file for same type of comparison
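
The comparison described above can be sketched as a word-level Levenshtein alignment. This is only an illustration: the slides do not say how the team modified the algorithm, and the function name `word_error_rate`, the lowercasing, and the normalization by the number of reference words (the conventional WER definition) are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str):
    """Return (wer, counts) from a word-level Levenshtein alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = (cost, substitutions, deletions, insertions)
    # for turning ref[:i] into hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # every reference word omitted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)          # every hypothesis word is extra
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (c, s, d, n)              # words match, no cost
            else:
                diagonal = (c + 1, s + 1, d, n)      # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    cost, subs, dels, ins = dp[len(ref)][len(hyp)]
    # Assumption: normalize by reference length; guard against an empty file.
    wer = cost / max(len(ref), 1)
    return wer, {"substitutions": subs, "deletions": dels, "insertions": ins}


if __name__ == "__main__":
    print(word_error_rate("the pilot landed", "the pirate landed"))
```

When several alignments have the same minimum cost, the split between substitutions, deletions and insertions depends on tie-breaking, so only the total error count is unambiguous in general.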

22
WER Evaluator - Importance

Why is evaluator needed?
collects data on performance of simulator or recognizer
precise WER measurements needed to prevent simulator/recognizer performance degradation

23
The Simulator

Serve as feature demonstration for customer
Gain feedback from customer about GUI module
Determine acceptable error rate to set benchmark for integrated system

24
The Simulation System
Diagram components: Simulator, GUI, Customer, Reader
25
The Simulator Solution

Read .txt file
Create errors -- WER
deletion
insertion
substitution, such as using homophones
Output word stream to TCP/IP socket
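
The error-creation step listed above might look roughly like the following sketch. The function name `inject_errors`, the filler word, the homophone table, and the equal weighting of the three error types are all hypothetical choices for illustration; the slides do not describe how the simulator picks its errors.

```python
import random


def inject_errors(words, error_rate, homophones=None, filler="uh", seed=None):
    """Return a copy of `words` with simulated recognition errors.

    Each word is corrupted with probability `error_rate`; the kind of
    error (deletion, insertion, substitution) is chosen uniformly.
    """
    rng = random.Random(seed)            # seeded for repeatable test runs
    homophones = homophones or {}
    output = []
    for word in words:
        if rng.random() >= error_rate:
            output.append(word)          # word passes through unchanged
            continue
        kind = rng.choice(["deletion", "insertion", "substitution"])
        if kind == "deletion":
            continue                     # omit the word
        if kind == "insertion":
            output.extend([word, filler])  # keep the word, add an extra one
        else:
            # Substitute a homophone if one is known, else the filler word.
            output.append(homophones.get(word.lower(), filler))
    return output


if __name__ == "__main__":
    text = "the pilot landed the plane".split()
    print(inject_errors(text, 0.3, {"pilot": "pirate", "plane": "plain"}, seed=1))
```

Varying `error_rate` while the customer reads the output is one way the acceptable-error benchmark mentioned earlier could be explored.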

26
The Simulator GUI
27
Simulation Limitations

Errors created by the simulator are not realistic
actual speech recognition engine may create different types of errors
Unrealistic delay time
impossible to predict real delay time right now

28
User Interface - Purpose

Display output of speech recognizer or simulator
Allow user to control entire voice recognition system

34
Links

Wiki page
http://voicerecognition.wetpaint.com
Current state of the art (Microsoft Vista)
http://voicerecognition.wetpaint.com/page/Speech%20Recognition%20State%20of%20the%20Art
