Third Ear: A Voice Recognition System Platform - PowerPoint PPT Presentation

About This Presentation
Slides: 35
Provided by: dav87

Transcript and Presenter's Notes

1
Third Ear: A Voice Recognition System Platform
  • Sameh Ebeid
  • Kiet Do
  • Yi Yang
  • Igor Likhotkin
  • Gerardo Quiñones
  • Andrew Davis

2
Outline
  • Re-introduce Customer - Davis
  • Current State of the Art - Davis
  • System Overview - Quiñones
  • Discrete Subsystems
    • Audio Capture/DSP - Quiñones
    • Speech Recognizer - Davis
    • Evaluator - Likhotkin
    • Simulator - Yang
    • User Interface - Ebeid
  • Demonstration - Do, Likhotkin

3
The Customer: Joan Bolker
  • Practicing clinical psychologist
    • communicate with patients
    • conferences
  • Able to read lips
    • needs augmentation
    • needs to see whole face

4
Ideal System Performance
  • Real-time
  • 100% accuracy
  • Unaffected by ambient noise
  • Multiple speakers
  • Portable
  • Unobtrusive
  • Highly readable

5
Current State of the Art
  • Two extremes
    • one speaker
      • large vocabulary
      • must be trained
    • many speakers
      • small vocabulary
  • Software
    • commercial: Dragon NaturallySpeaking
    • open-source: CMU Sphinx
  • Customer requires a large vocabulary for many
    speakers

6
Our Proposal
  • Modular test platform
    • optimize personnel allocation
    • minimal acceptable functionality
    • upgradeable
    • multi-year
    • TCP/IP socket communication
    • commercial hardware
    • open-source software
      • modified as needed

7
Architectural Considerations
  • Testing the Boundaries
  • Significant Experimentation
  • New Market Niche
  • High Risk

8
Project Strategy
  • Mock-ups
  • Rapid Prototyping
  • Strong Functional Decomposition
  • Iterative Development
  • Aggressive Scheduling

9
System Overview

10
Mock-ups
  • Establish good customer dialog
  • Define external interfaces
  • No functionality
  • Help customer visualize product

11
Rapid Prototyping
  • Deliver functionality early
  • Use off-the-shelf components
  • Identify weakest link in chain
  • Redirect resources
  • Invent only what is needed

12
Strong Functional Decomposition
  • Break product into small pieces
  • Clearly define interfaces
  • Use standard protocols
  • Decouple technical choices
  • Work in parallel
  • Allow component upgrades

13
Iterative Development
  • The most common cause of project failure is lack
    of calendar time
  • Deliver components early to customer
  • Become aware of deficiencies while there is still
    time to react
  • Be ready to stop the project at any time

14
Audio Capture Subsystem
  • Captures raw analog audio
  • Eliminates noise
  • Cancels echo
  • Delivers digital audio to speech recognizer

15
Audio Capture Problem Statement
  • Supplements lip reading
  • Can't obscure speakers' faces
  • No headsets
  • No wireless microphones
  • Must be unobtrusive

16
Audio Capture Technical Background
  • DSP advancing rapidly
  • Beam forming technology
  • Beam steering technology
  • Solutions are proprietary and expensive
  • Moore's law to the rescue
  • Deliver modest solution today
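Beam forming is named on this slide only as background. To illustrate the basic idea, here is a minimal delay-and-sum sketch, assuming a microphone array whose per-channel arrival delays (in whole samples) are already known. The function name and parameters are hypothetical, and commercial beam-forming and beam-steering products use far more elaborate, proprietary processing.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Textbook delay-and-sum beamformer (illustrative sketch only).

    channels: one sample sequence per microphone, all the same length.
    delays:   non-negative delay, in samples, applied to each channel so
              that sound from the steered direction lines up in time.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for i in range(n):
            j = i - d  # output sample i takes input sample i - d (shift later)
            if 0 <= j < len(samples):
                out[i] += samples[j]
    # Average so an aligned source keeps its original amplitude.
    return [v / len(channels) for v in out]
```

Delaying the channel that hears the talker first lines the copies up, so speech adds coherently while sound arriving from other directions stays misaligned and is partly averaged out.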

17
Audio Capture Proposed Solution
  • Single directional microphone
  • Stereo mike as a stretch goal
  • Uncompressed audio stream
  • TCP/IP socket client
  • Stereo mike requires funding
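The slide specifies only an uncompressed audio stream sent by a TCP/IP socket client. A minimal sketch of the sending side, assuming 16-bit signed little-endian mono samples; the sample format, chunk size and the `stream_pcm` name are assumptions, not the team's design.

```python
import socket
import struct
from typing import Sequence


def stream_pcm(sock: socket.socket, samples: Sequence[int],
               chunk_samples: int = 1024) -> int:
    """Send samples as raw (uncompressed) 16-bit little-endian PCM.

    Hypothetical helper: each sample must fit in a signed 16-bit integer.
    Returns the total number of bytes written to the socket.
    """
    sent = 0
    for start in range(0, len(samples), chunk_samples):
        chunk = samples[start:start + chunk_samples]
        # "<" = little-endian, "h" = signed 16-bit integer
        payload = struct.pack("<%dh" % len(chunk), *chunk)
        sock.sendall(payload)  # blocks until the whole chunk is handed off
        sent += len(payload)
    return sent
```

In use, the socket would come from something like `socket.create_connection((host, port))`, with the host and port of the speech recognizer (hypothetical here). Raw PCM carries no header, so both ends must agree on sample rate, sample width and byte order in advance.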

18
Speech Recognizer
  • Input from audio capture
    • digitized, monaural, possibly DSP-processed
  • Segment input file
  • Compare to acoustic model
    • identify phonemes
  • Reassemble with language model
    • reassemble phonemes into most likely words
  • Output to user interface
    • text file
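The slide does not say how the input is segmented. Speech front ends commonly cut the signal into short, overlapping frames before scoring them against an acoustic model. The sketch below assumes 16 kHz audio, 25 ms windows and a 10 ms hop, which are typical values but not necessarily the ones CMU Sphinx uses; `frame_signal` is a hypothetical name.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int = 16000,
                 window_ms: float = 25.0,
                 hop_ms: float = 10.0) -> List[List[float]]:
    """Split audio into overlapping fixed-length frames (assumed parameters)."""
    win = int(sample_rate * window_ms / 1000)  # samples per frame
    hop = int(sample_rate * hop_ms / 1000)     # samples between frame starts
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - win + 1, hop):
        frames.append(list(samples[start:start + win]))
    return frames
```

Under these assumptions one second of audio yields 98 frames of 400 samples each; each frame would then be turned into features and matched against the acoustic model.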

19
CMU Sphinx
  • Several versions
    • different languages
    • speed/accuracy tradeoffs
  • Start with Version 2
    • written in C
    • works in real time, but is not the most accurate
  • Need customer feedback
  • Modify language model
  • May change version

20
WER Evaluator - Intro
  • What is the WER Evaluator?
    • computes the Word Error Rate (WER) introduced by
      the speech recognizer or simulator (the number of
      errors as a percentage of the words actually spoken)
  • Speech Recognition Errors
    • deletions (omitted words)
    • insertions (extra words that were not spoken)
    • substitutions (misinterpreted words, e.g., "pirate"
      instead of "pilot")
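The usual definition behind these three error types can be written as a formula, where S, D and I are the substitution, deletion and insertion counts and N is the number of words in the reference (what was actually spoken). The team's modified evaluator may differ in details.

```latex
\mathrm{WER} = \frac{S + D + I}{N} \times 100\%
```

For example, if the speaker says 5 words and the output contains one substitution and one deletion, WER = (1 + 1 + 0)/5 = 40%. Because insertions are counted against the reference length, WER can exceed 100%.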

21
WER Evaluator - Operation
  • How does the evaluator compute errors?
    • feed input file to simulator
    • collect output into another file
  • Compare the two files using a modified Levenshtein
    algorithm
    • fewest operations needed to change one string
      into another
  • For the actual recognizer
    • take the input file used by the reader/tester
    • collect recognizer output into a second file for
      the same type of comparison
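The comparison step can be sketched as a word-level Levenshtein alignment that also tallies substitutions, deletions and insertions. The slide mentions a modified Levenshtein algorithm without saying what the modification is, so this is plain word-level edit distance with a backtrace; the function names and the lowercasing step are assumptions.

```python
from typing import Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimal alignment."""
    ref = reference.lower().split()   # assumed normalization: case-insensitive
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)
    # d[i][j] = fewest edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        d[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    # Walk back from the corner to classify each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the number of reference words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return sum(word_errors(reference, hypothesis)) / n
```

`wer(reference_text, output_text) * 100` gives a percentage, where `reference_text` would be the input file read by the tester and `output_text` the collected simulator or recognizer output.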

22
WER Evaluator - Importance
  • Why is the evaluator needed?
    • collects data on performance of the simulator or
      recognizer
    • precise WER measurements are needed to detect
      simulator/recognizer performance degradation

23
The Simulator
  • Serve as feature demonstration for customer
  • Gain feedback from customer about GUI module
  • Determine acceptable error rate to set benchmark
    for integrated system

24
The Simulation System
[Diagram: Simulator, GUI, Customer, Reader]

25
The Simulator Solution
  • Read .txt file
  • Create errors (WER)
    • deletion
    • insertion
    • substitution, such as using homophones
  • Output word stream to TCP/IP socket
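The simulator steps listed above can be sketched as follows, assuming a plain-text script, a configurable per-word error probability, and a word stream sent one word per line over TCP. The homophone table, filler words, function names and line-based framing are all hypothetical choices, not the team's implementation.

```python
import random
import socket
import time
from typing import Dict, List, Optional

# Hypothetical homophone table; a usable simulator would need a much larger one.
HOMOPHONES: Dict[str, List[str]] = {
    "there": ["their", "they're"],
    "to": ["two", "too"],
    "right": ["write"],
    "here": ["hear"],
}


def load_words(path: str) -> List[str]:
    """Read a plain-text (.txt) script and split it into words."""
    with open(path, encoding="utf-8") as f:
        return f.read().split()


def inject_errors(words: List[str], rate: float,
                  rng: Optional[random.Random] = None,
                  filler: Optional[List[str]] = None) -> List[str]:
    """Corrupt each word with probability `rate` by one deletion,
    insertion or substitution, chosen uniformly at random."""
    rng = rng or random.Random()
    filler = filler or ["uh", "the", "and"]
    out: List[str] = []
    for word in words:
        if rng.random() >= rate:
            out.append(word)  # word passes through unchanged
            continue
        kind = rng.choice(("deletion", "insertion", "substitution"))
        if kind == "deletion":
            continue  # drop the spoken word
        if kind == "insertion":
            out.extend([word, rng.choice(filler)])  # add an unspoken word
        else:
            # Prefer a homophone; otherwise fall back to a different filler word.
            options = (HOMOPHONES.get(word.lower())
                       or [f for f in filler if f != word.lower()]
                       or ["<unk>"])
            out.append(rng.choice(options))
    return out


def send_words(sock: socket.socket, words: List[str],
               delay_s: float = 0.0) -> None:
    """Write one UTF-8 word per line, optionally pausing between words."""
    for word in words:
        sock.sendall((word + "\n").encode("utf-8"))
        if delay_s:
            time.sleep(delay_s)
```

Since each corrupted word contributes one deletion, insertion or substitution, the resulting WER should land roughly near `rate`, though an evaluator's alignment may occasionally count an edit differently. A fixed `delay_s` per word is exactly the kind of unrealistic timing a simulator cannot avoid before a real recognizer is available.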

26
The Simulator GUI

27
Simulation Limitations
  • The errors created by the simulator are not realistic
    • an actual speech recognition engine may create
      different types of errors
  • Unrealistic delay time
    • impossible to predict the real delay time right now

28
User Interface - Purpose
  • Display output of speech recognizer or simulator
  • Allow user to control entire voice recognition
    system
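On the display side, a minimal sketch, assuming the simulator or recognizer sends one UTF-8 word per line over a TCP connection (an assumed framing; the slides only say TCP/IP socket) and that the running transcript is shown as short wrapped lines for readability. `receive_words` and `caption_lines` are hypothetical names, and the system-control features are not covered.

```python
import socket
import textwrap
from typing import Iterator, List


def receive_words(sock: socket.socket) -> Iterator[str]:
    """Yield words from a newline-delimited UTF-8 stream until the peer closes."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            word = line.strip()
            if word:  # ignore blank lines
                yield word


def caption_lines(words: List[str], width: int = 32) -> List[str]:
    """Wrap the transcript into short lines suited to a large-font display."""
    return textwrap.wrap(" ".join(words), width=width)
```

A GUI loop would call `receive_words` on the connection returned by `accept()` on a listening server socket, appending each word and redrawing the last few lines from `caption_lines`.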

29
(No Transcript)

30
(No Transcript)

31
(No Transcript)

32
(No Transcript)

33
(No Transcript)

34
Links
  • Wiki page
    • http://voicerecognition.wetpaint.com
  • Current state of the art (Microsoft Vista)
    • http://voicerecognition.wetpaint.com/page/Speech%20Recognition%20State%20of%20the%20Art