Title: Technologies for speech applications
Chapter 3
- Technologies for speech applications
Figure 3.1. Speech technologies
3.2 Touchtone recognition
- The caller responds to hierarchies of voice menus by pressing buttons on the telephone keypad
- Potential problems
- Lost in space (callers lose track of where they are in the menu hierarchy)
- Time-consuming menus
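A touchtone menu hierarchy can be modeled as a tree in which each key press moves the caller one level deeper. The following minimal sketch assumes a hypothetical two-level menu; `Menu`, `navigate`, and the action names are illustrative only and do not come from any particular platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Menu:
    """One level of a touchtone menu: the prompt played and what each key leads to."""
    prompt: str
    # key -> submenu, or an action name (str) when the key ends navigation
    options: Dict[str, Union["Menu", str]] = field(default_factory=dict)


def navigate(root: Menu, keys: str) -> str:
    """Follow the caller's key presses and return the action reached, or the prompt of the current menu."""
    node: Union[Menu, str] = root
    for key in keys:
        if isinstance(node, str):
            break                      # an action was reached, so remaining keys are ignored
        if key not in node.options:
            return f"invalid:{key}"    # a real system would replay the menu here
        node = node.options[key]
    return node if isinstance(node, str) else node.prompt


# Hypothetical two-level menu
MAIN = Menu("For billing, press 1. For repairs, press 2.", {
    "1": Menu("For your balance, press 1. To make a payment, press 2.",
              {"1": "play_balance", "2": "take_payment"}),
    "2": "transfer_to_repair",
})
```

For example, `navigate(MAIN, "11")` reaches `"play_balance"`. Each additional level adds another prompt the caller has to sit through, which is where both the "lost in space" problem and the time-consuming menus come from.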
Speech recognition
- Potential problems
- Understandability
- Time-consuming dialogs
- Users may interrupt prompts by barge-in
Table 3.1 Speech recognition engines
Speech recognition engine features needed for voice portals
- Continuous speech
- Speaker independent
- Switch vocabularies
- Spontaneous speech
- Multi-threaded
Figure 3.4 Speech recognition
Phoneme identification
- Use an acoustic model to transform extracted features into sequences of phonemes
- Approaches
- Neural networks
- Hidden Markov models
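With hidden Markov models, the most likely phoneme sequence for a series of acoustic observations is usually found with the Viterbi algorithm. The minimal sketch below rests on toy assumptions: a hypothetical three-state, left-to-right model for the word "cat" (one state per phoneme), with invented probabilities and with observations already quantized into two labels. Real acoustic models use continuous feature vectors, several states per phoneme, and trained emission densities.

```python
import math
from typing import Dict, List, Tuple


def log(p: float) -> float:
    """Log-probability, with log(0) mapped to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely state (phoneme) sequence and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(((r, best[t - 1][r] + log(trans_p[r].get(s, 0.0))) for r in states),
                              key=lambda pair: pair[1])
            best[t][s] = score + log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, best[-1][last]


# Hypothetical left-to-right model for "cat": one state per phoneme
STATES = ["k", "ae", "t"]
START = {"k": 1.0, "ae": 0.0, "t": 0.0}
TRANS = {"k": {"k": 0.5, "ae": 0.5}, "ae": {"ae": 0.6, "t": 0.4}, "t": {"t": 1.0}}
EMIT = {"k": {"burst": 0.8, "vowel": 0.2},
        "ae": {"burst": 0.1, "vowel": 0.9},
        "t": {"burst": 0.7, "vowel": 0.3}}
```

Decoding `["burst", "vowel", "vowel", "burst"]` with this model yields the path `["k", "ae", "ae", "t"]`. Working in log-probabilities avoids numeric underflow on long utterances.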
Word identification
- Use the words in a language model to convert sequences of phonemes into words
- Two approaches
- Grammars
- N-grams
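The N-gram approach can be illustrated with bigrams (N = 2). The minimal sketch below segments a phoneme string into lexicon words and keeps the segmentation that the bigram model scores highest. It uses the classic homophone pair "ice cream" / "I scream". The lexicon, the training sentences, the floor probability for unseen pairs, and the names `LEXICON`, `train_bigrams`, and `decode` are all hypothetical.

```python
import math
from functools import lru_cache
from typing import Dict, List, Tuple

Bigrams = Dict[Tuple[str, str], float]

# Hypothetical lexicon: word -> pronunciation as a phoneme sequence
LEXICON: Dict[str, Tuple[str, ...]] = {
    "i": ("ay",),
    "ice": ("ay", "s"),
    "scream": ("s", "k", "r", "iy", "m"),
    "cream": ("k", "r", "iy", "m"),
}


def train_bigrams(sentences: List[str]) -> Bigrams:
    """Estimate P(word | previous word) by relative frequency; <s> marks the sentence start."""
    pair_counts: Dict[Tuple[str, str], int] = {}
    prev_counts: Dict[str, int] = {}
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            pair_counts[(prev, word)] = pair_counts.get((prev, word), 0) + 1
            prev_counts[prev] = prev_counts.get(prev, 0) + 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


def decode(phonemes: Tuple[str, ...], bigrams: Bigrams, floor: float = 1e-4) -> List[str]:
    """Split phonemes into lexicon words, keeping the split the bigram model scores highest."""
    @lru_cache(maxsize=None)
    def best(pos: int, prev: str) -> Tuple[float, Tuple[str, ...]]:
        if pos == len(phonemes):
            return 0.0, ()
        result: Tuple[float, Tuple[str, ...]] = (float("-inf"), ())
        for word, pron in LEXICON.items():
            if phonemes[pos:pos + len(pron)] == pron:
                rest_score, rest = best(pos + len(pron), word)
                # unseen word pairs get a small floor probability instead of zero
                score = math.log(bigrams.get((prev, word), floor)) + rest_score
                if score > result[0]:
                    result = (score, (word,) + rest)
        return result
    return list(best(0, "<s>")[1])


CORPUS = ["i like ice cream", "ice cream is cold", "we ate ice cream"]
```

Both "ice cream" and "i scream" match the phonemes `ay s k r iy m`. With `CORPUS` as training data, the language model prefers `["ice", "cream"]`. A corpus dominated by "i scream ..." would flip the choice. A grammar-based recognizer would instead accept only the word sequences its grammar allows.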
Figure 3.6 Language model creation
Developer's responsibilities
- Acoustic model
- Lexicon
- Language model
3.4 Voice identification
- General techniques for identifying people
- Something you know
- Something you have
- Something about you
Figure 3.2 Speaker registration, identification, and authentication
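The three operations in the figure can be sketched with a very simple voiceprint, assumed here to be the average of a few feature vectors. Registration stores the template. Identification (one-to-many) picks the closest registered speaker. Verification (one-to-one) checks a claimed identity against its template. Cosine similarity, the threshold of 0.95, the class `VoiceprintStore`, and the feature values are all illustrative assumptions; production systems use much richer statistical speaker models.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class VoiceprintStore:
    """Toy speaker registration, identification, and verification."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.prints: Dict[str, Vector] = {}

    def register(self, speaker: str, samples: List[Vector]) -> None:
        # Average several enrollment utterances into one template (the "voiceprint")
        n = len(samples)
        self.prints[speaker] = [sum(col) / n for col in zip(*samples)]

    def identify(self, sample: Vector) -> Optional[str]:
        # Identification: which registered speaker is closest, if any is close enough?
        if not self.prints:
            return None
        best = max(self.prints, key=lambda s: cosine(self.prints[s], sample))
        return best if cosine(self.prints[best], sample) >= self.threshold else None

    def verify(self, claimed: str, sample: Vector) -> bool:
        # Verification: is the caller who he or she claims to be?
        return claimed in self.prints and cosine(self.prints[claimed], sample) >= self.threshold
```

Usage: register "alice" with `[[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]]` and "bob" with `[[0.1, 0.9, 0.8], [0.2, 1.0, 0.7]]`. A new sample `[1.0, 0.25, 0.1]` is then identified as "alice" and fails verification as "bob".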
Voice ID technologies are appealing
- Are unobtrusive
- Are location independent
- Require no special equipment
- Can replace passwords
Why voice identification technologies fail
- Siblings with similar voice profiles
- Voice changes in teenage boys
- Colds, sore throats, sore lips, etc.
- Variety of microphones
- Tape recordings
Measuring the accuracy of speech ID systems
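One common way to measure the accuracy of a verification system is to report two error rates at a given decision threshold. The false acceptance rate (FAR) is the fraction of impostor attempts that are accepted. The false rejection rate (FRR) is the fraction of genuine attempts that are rejected. Systems are often summarized by the point where the two rates are equal. In the minimal sketch below, the scores are invented and the function names are hypothetical.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) when scores >= threshold are accepted."""
    far = sum(score >= threshold for score in impostor) / len(impostor)
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_threshold(genuine: List[float], impostor: List[float]) -> float:
    """Among the observed scores, pick the threshold where FAR and FRR are closest (approximate EER point)."""
    def gap(t: float) -> float:
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)
    return min(sorted(set(genuine) | set(impostor)), key=gap)


# Hypothetical verification scores from trials with true speakers and with impostors
GENUINE = [0.9, 0.85, 0.8, 0.6]
IMPOSTOR = [0.2, 0.4, 0.7, 0.3]
```

With these scores, a threshold of 0.75 gives FAR 0.0 and FRR 0.25, while 0.5 gives FAR 0.25 and FRR 0.0. Raising the threshold trades false acceptances for false rejections, so the right setting depends on whether security or caller convenience matters more.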
3.5 Language identification
- Explicit selection: the caller speaks the name of his or her preferred national language
  <prompt>
    <speak xml:lang="en-us">For service in English, say English</speak>
    <speak xml:lang="fr">Pour le service en français, dites français</speak>
  </prompt>
- Implicit selection: a default language is used unless the caller overrides the default with an explicit selection
  <prompt>
    <speak xml:lang="fr">Bienvenue au portail français</speak>
    <speak xml:lang="en-us">For service in English, say English</speak>
  </prompt>
- Calling area
- Caller profile
- One number per language
- Language recognition technology
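The selection strategies above can be combined into one decision. The minimal sketch below makes several assumptions. The precedence order is one plausible choice: an explicit spoken choice first, then a language-specific dialed number, then the caller profile, then the calling area. All lookup tables, numbers, and function names are hypothetical, and automatic language recognition is not modeled.

```python
from typing import Optional

# Hypothetical lookup tables; a real portal would consult its own telephony and profile data
AREA_DEFAULTS = {"33": "fr", "1": "en-us"}      # calling area (country code) -> default language
NUMBER_LANGUAGE = {"800-555-0101": "fr"}        # dialed number -> language (one number per language)
SPOKEN_NAMES = {"english": "en-us", "français": "fr", "francais": "fr"}


def choose_language(dialed: str, caller_area: str, profile_lang: Optional[str],
                    spoken_choice: Optional[str], fallback: str = "en-us") -> str:
    """Pick the prompt language: an explicit spoken choice overrides every implicit default."""
    if spoken_choice and spoken_choice.lower() in SPOKEN_NAMES:
        return SPOKEN_NAMES[spoken_choice.lower()]   # explicit selection
    if dialed in NUMBER_LANGUAGE:
        return NUMBER_LANGUAGE[dialed]               # one number per language
    if profile_lang:
        return profile_lang                          # caller profile
    return AREA_DEFAULTS.get(caller_area, fallback)  # calling area, else a fixed default
```

For example, a caller from country code 33 with no profile hears French prompts by default, but saying "English" switches the dialog to `en-us`.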
3.6 Word spotting
- Attention word
- The user signals that he or she is about to speak
- Switching contexts
- Suspend the current activity and begin a new one
- Extract critical words
- Security
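The attention-word and context-switching uses of word spotting can be illustrated on recognized text. Real word spotters work on the audio itself, typically by matching keyword models against a model of everything else. This sketch instead scans a list of recognized words, which is a simplification. The attention word "computer", the context words, and the function name `spot` are hypothetical.

```python
from typing import List, Optional, Tuple

ATTENTION_WORD = "computer"                     # hypothetical attention word
CONTEXT_WORDS = {"weather", "stocks", "email"}  # hypothetical words that switch context


def spot(words: List[str]) -> Tuple[bool, Optional[str]]:
    """Report whether the attention word was spoken and which context word, if any, followed it."""
    lowered = [w.lower().strip(".,?!") for w in words]
    if ATTENTION_WORD not in lowered:
        return False, None                      # ignore speech not addressed to the system
    after = lowered[lowered.index(ATTENTION_WORD) + 1:]
    for w in after:
        if w in CONTEXT_WORDS:
            return True, w                      # suspend the current activity, start this one
    return True, None
```

For example, `spot("um computer check the weather please".split())` returns `(True, "weather")`, and the surrounding filler words are ignored.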
3.7 Language understanding
- Knowledge representation techniques
- Parse trees
- Form templates
- Semantic net
- Approaches for creating knowledge representations
- Parser
- Semantic attachments
- General natural language understanding algorithm
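A form template is the simplest of the knowledge representations listed above: a set of named slots that the understanding step fills from the caller's utterance. In the minimal sketch below, keyword patterns stand in for the parser with semantic attachments. This is a deliberately simpler technique, and the slots, city and day vocabularies, and names are hypothetical.

```python
import re
from typing import Dict, Optional

CITIES = r"(boston|denver|chicago)"   # hypothetical vocabulary
DAYS = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"

# Hypothetical form template for a flight request: slot name -> pattern that fills it
FLIGHT_TEMPLATE = {
    "origin": re.compile(r"\bfrom " + CITIES + r"\b"),
    "destination": re.compile(r"\bto " + CITIES + r"\b"),
    "day": re.compile(r"\b" + DAYS + r"\b"),
}


def fill_template(utterance: str) -> Dict[str, Optional[str]]:
    """Fill each slot of the form template from the utterance; unfilled slots stay None."""
    text = utterance.lower()
    form: Dict[str, Optional[str]] = {}
    for slot, pattern in FLIGHT_TEMPLATE.items():
        match = pattern.search(text)
        form[slot] = match.group(1) if match else None
    return form
```

`fill_template("I want to fly from Boston to Denver on Friday")` fills all three slots. The "to fly" phrase is skipped because "fly" is not in the city vocabulary. A parser with semantic attachments would derive the same slots from the grammatical structure instead of from surface patterns.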
3.8 Classification
- Uses
- Classify documents into categories and topics
- Categorize graphical objects
- Replace menu hierarchies in voice applications
- Navigating in large web sites
- Locating a chapter or section of a large document
- Locating a Web site about a specific topic
- Locating descriptions of goods or services in a large online database
- Example
- "How May I Help You?" from AT&T
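Replacing a menu hierarchy with classification means routing an open-ended answer to "How may I help you?" directly to a category. The minimal sketch below uses a multinomial naive Bayes classifier with add-one smoothing. This is a standard text classifier chosen for illustration, not a claim about how the AT&T system works. The training utterances, categories, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(examples: List[Tuple[str, str]]):
    """Count words per category from (utterance, category) training pairs."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    cat_counts: Counter = Counter()
    for text, cat in examples:
        cat_counts[cat] += 1
        word_counts[cat].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, cat_counts, vocab


def classify(text: str, model) -> str:
    """Multinomial naive Bayes with add-one smoothing: return the most probable category."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())

    def score(cat: str) -> float:
        n = sum(word_counts[cat].values())
        s = math.log(cat_counts[cat] / total)              # prior P(category)
        for w in text.lower().split():
            s += math.log((word_counts[cat][w] + 1) / (n + len(vocab)))
        return s

    return max(cat_counts, key=score)


# Hypothetical call-routing training data
TRAINING = [
    ("i have a question about my bill", "billing"),
    ("my bill is wrong", "billing"),
    ("i want to pay my bill", "billing"),
    ("my phone has no dial tone", "repair"),
    ("the line is dead", "repair"),
    ("static on my phone line", "repair"),
]
MODEL = train(TRAINING)
```

With this model, "my bill is too high" routes to `billing` and "my phone line is dead" routes to `repair`, without the caller traversing any menu.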
3.9 Dialog management
- Human-driven conversational dialogs: the person repeatedly asks a question or speaks a command, and the computer responds.
- Application-driven conversational dialogs: the application repeatedly asks questions to solicit answers and instructions from the caller.
- Mixed-initiative dialogs: human-driven and computer-driven dialogs are combined. The caller and the computer take turns driving the conversation.
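A mixed-initiative dialog can be sketched as form filling. The application prompts for the first empty slot, but the caller may answer with more (or different) information than was asked for. The minimal sketch below assumes that each caller turn has already been turned into a slot dictionary by language understanding. The slots, prompts, and `run_dialog` are hypothetical.

```python
from typing import Dict, List, Optional

SLOTS = ["origin", "destination", "day"]   # hypothetical form fields
PROMPTS = {"origin": "Where are you leaving from?",
           "destination": "Where are you going?",
           "day": "What day do you want to travel?"}


def run_dialog(caller_turns: List[Dict[str, str]]) -> List[str]:
    """Mixed-initiative form filling; returns the prompts the application played.

    The application asks for the first unfilled slot (application-driven), but each
    caller turn may fill any slots, including ones not asked for (human-driven).
    """
    form: Dict[str, Optional[str]] = {slot: None for slot in SLOTS}
    played: List[str] = []
    turns = iter(caller_turns)
    while any(v is None for v in form.values()):
        slot = next(s for s in SLOTS if form[s] is None)
        played.append(PROMPTS[slot])
        answer = next(turns, None)
        if answer is None:
            break                          # caller hung up or ran out of input
        for key, value in answer.items():
            if key in form:
                form[key] = value
    if all(v is not None for v in form.values()):
        played.append(f"Booking {form['origin']} to {form['destination']} on {form['day']}.")
    return played
```

If the first answer is `{"origin": "boston", "destination": "denver"}`, the destination prompt is never played, and the dialog takes two turns instead of three. A strictly application-driven caller who answers one slot per turn still gets all three prompts.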
Figure 3.9 Voice interpreter and voice browser
3.10 NL processing
- Machine translation
- Word replacement
- Phrase translation
- Full natural language translation
- Query generation
- Generate an SQL query from the knowledge representation
- Summarization
- Generate an English summary from the knowledge representation
- Generation
- Prerecorded sentences
- Templates
- Reversible parse tree
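Query generation and template-based generation fit together naturally. A filled form template becomes an SQL query, and the query result is poured into a fixed English sentence. The minimal sketch below assumes a hypothetical `flights` table, a slot-to-column mapping, and made-up rows, and it uses Python's built-in `sqlite3` module.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

Form = Dict[str, Optional[str]]

# Hypothetical schema: slot name -> column name (a fixed whitelist, never taken from the caller)
COLUMNS = {"origin": "origin", "destination": "destination", "day": "day"}


def to_sql(form: Form) -> Tuple[str, List[str]]:
    """Query generation: turn the filled slots of a form template into a parameterized SELECT."""
    conditions, params = [], []
    for slot, value in form.items():
        if value is not None and slot in COLUMNS:
            conditions.append(f"{COLUMNS[slot]} = ?")
            params.append(value)
    sql = "SELECT flight_no FROM flights"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY flight_no", params


def answer(form: Form, conn: sqlite3.Connection) -> str:
    """Template-based generation: fill a fixed English sentence with the query results."""
    sql, params = to_sql(form)
    flights = [row[0] for row in conn.execute(sql, params).fetchall()]
    if not flights:
        return "Sorry, I found no matching flights."
    where = " ".join(f"{word} {form[slot]}"
                     for slot, word in (("origin", "from"), ("destination", "to"), ("day", "on"))
                     if form.get(slot))
    plural = "s" if len(flights) != 1 else ""
    return f"I found {len(flights)} flight{plural} {where}: {', '.join(flights)}."


def demo_db() -> sqlite3.Connection:
    """In-memory table with made-up flights, for trying the functions above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (flight_no TEXT, origin TEXT, destination TEXT, day TEXT)")
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", [
        ("UA1", "boston", "denver", "friday"),
        ("UA2", "boston", "denver", "monday"),
        ("AA3", "boston", "chicago", "friday"),
    ])
    return conn
```

Column names come only from the fixed `COLUMNS` mapping, and slot values are passed as query parameters. As a result, nothing the caller says is pasted into the SQL text.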
Figure 3.7 Speech synthesis
Figure 3.8 Concatenative speech
Table 3.2 Concatenative vs. parameter-based speech synthesis
3.12 Music synthesis
- Uses
- Branding
- Set the mood
- Signal the caller
- Fundamental part of the dialog
- Approaches
- Prerecord
- Synthesize on the fly (MIDI)
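Synthesizing music on the fly with MIDI means sending compact note events to a synthesizer rather than storing recorded audio. The minimal sketch below builds a format-0 Standard MIDI File for a short jingle, using note-on/note-off messages and variable-length delta times. The three-note tune, velocity 100, the resolution of 480 ticks per beat, and the function names are assumptions for illustration.

```python
import struct
from typing import List, Tuple


def vlq(n: int) -> bytes:
    """Encode n as a MIDI variable-length quantity (7 bits per byte, high bit means 'more follows')."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.insert(0, (n & 0x7F) | 0x80)
        n >>= 7
    return bytes(out)


def jingle_midi(notes: List[Tuple[int, int]], ticks_per_beat: int = 480, channel: int = 0) -> bytes:
    """Build a format-0 Standard MIDI File that plays (note_number, duration_ticks) pairs in sequence."""
    track = bytearray()
    for note, duration in notes:
        track += vlq(0) + bytes([0x90 | channel, note, 100])       # note on, velocity 100
        track += vlq(duration) + bytes([0x80 | channel, note, 0])  # note off after `duration` ticks
    track += vlq(0) + b"\xff\x2f\x00"                               # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


def note_frequency(note: int) -> float:
    """Equal-tempered pitch of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

For example, `jingle_midi([(60, 480), (64, 480), (67, 960)])` produces a file of a few dozen bytes for a C-E-G signature tune, which can be written to disk and played by any MIDI synthesizer.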
3.13 Tools
- VoiceXML interpreters
- Specification tools
- Call flow tools
- Menu generators
- Form and field generators
- Rehearsal tools
- Logging tools
- Performance measurement summary tools
- System performance tools
3.14 Related technologies
- Distributed speech recognition
- Noise mitigation
- Noise reduction and cancellation algorithms
- Feature extraction: performed on the client
- Signal processing algorithms that extract essential features from acoustic data
- Multimodal user interfaces
- WML and WAP
- Conversey tags
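In distributed speech recognition, the client computes acoustic features and sends those to the server instead of the raw audio. Real front ends compute mel-cepstral coefficients. This minimal sketch computes only two simpler per-frame features, log energy and zero-crossing rate, to show the framing step. It assumes 8 kHz audio, 20 ms frames (160 samples) with a 10 ms hop (80 samples), and a hypothetical function name.

```python
import math
from typing import List, Tuple


def frame_features(samples: List[float], frame_len: int = 160, hop: int = 80) -> List[Tuple[float, float]]:
    """Split audio into overlapping frames and return (log energy, zero-crossing rate) per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        log_energy = math.log(energy + 1e-10)        # small offset avoids log(0) on silence
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((log_energy, crossings / (frame_len - 1)))
    return feats
```

For a 1 kHz tone sampled at 8 kHz, the zero-crossing rate comes out near 0.25 (two crossings every eight samples). Silence yields a very low log energy. Sending a handful of numbers per 10 ms instead of every audio sample is what makes client-side feature extraction attractive over low-bandwidth wireless links.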
3.15 Key concepts
- Many technologies may be useful
- Voice identification
- Speaker registration
- Speaker identification
- Speaker verification
- Speech recognition
- Requires acoustic models, lexicons, language models, and grammars
- Speech synthesis
- Synthesized (during development)
- Prerecorded (for production)
- Dialog management
- VoiceXML browser