Creating User Interfaces - PowerPoint PPT Presentation

Slides: 30

Provided by: Jeani174

Learn more at: http://faculty.purchase.edu

1
Creating User Interfaces

[Continue presentations as needed] Speech recognition. Speech synthesis.
Homework: Report on current products. Register on Tellme Studio. Study VoiceXML.

2
Speech recognition

User speaks. System 'understands', at least enough to perform some action.
Related to (but not the same as):
  Natural language understanding
  Voice print identification
  Record information to be re-played to human in compressed form for later interaction
  Speech synthesis (other direction): words to speech
?

3
Natural language understanding

Skip speech altogether, but type in statements or phrases in normal language
What is normal? We tend not to speak that grammatically.
Many 'natural language systems' actually use keywords
History: Moon rocks example
Combine speech recognition with natural language understanding

4
Continuous versus discrete

Speaker speaks 'naturally' versus
Speaker separates words

5
Examples

Dictation: no understanding as such, produce words/sentences in a program
(Telephone) Help desk / Information: generally restricted or directed speech, choosing from alternatives (may or may not be given). Advances the process.
Restricted commands: actually carrying out operations
  Factory example: start and stop
  Car: radio, heat/AC
  Phone: call specific number

6
Training

Dictation application: user takes time to read a specific text to train the system
Note: some systems also adapt with use. When the user corrects the results, the system may do better next time.
Phone lookup: user records names. No 'understanding', just record for matching.

7
Audience / content

Some systems may allow adapting to audiences, for example, male versus female
Some systems have restrictions on types of content
Historical note: an IBM system in the 1980s and 1990s was restricted to male, American-born speakers (no speech impediments) and legal text.

8
Speech recognition concepts

Air pressure → diaphragm in phone → electrical signal → (Fourier Transform) → wave pattern, matched against sets of canonical patterns (native speaker of English, perhaps male/female, young/old alternatives) generated for the specified grammar (using a segmentation: dividing up of the parts)
Note: interplay of grammar and statistics distinguishes different approaches

9
Fourier Transform (Discrete Fourier Transform, computed with the FFT)

Takes data representing a signal
And produces numbers representing the combination
of sine and cosine waves that make up the signal
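
For reference, the standard definition of the Discrete Fourier Transform is given below. The symbols are notation introduced here, not taken from the slides: x_n are the N samples of the signal, and X_k is the (complex) number describing how much of the k-th sine/cosine pair is present.

```latex
X_k \;=\; \sum_{n=0}^{N-1} x_n \, e^{-2\pi i\, k n / N}
    \;=\; \sum_{n=0}^{N-1} x_n \left[ \cos\!\left(\frac{2\pi k n}{N}\right) - i \,\sin\!\left(\frac{2\pi k n}{N}\right) \right],
\qquad k = 0, 1, \dots, N-1
```

The FFT is a fast algorithm for computing these same X_k values; it does not change the result.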

10
Speech recognition

Works on the product of the FFT
Uses (in most cases):
  Segmentation: attempt to break up into pieces, perhaps syllables or words
  Grammar: definition of what is to be expected
  Probabilities: if first part matched X, then greater probability that the next would match Y
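
One common way to put a number on the "if X, then probably Y" idea is a conditional probability estimated from counts. This is only a sketch: the slides do not say which statistical model any particular system uses, and the counts below are made up for illustration.

```latex
P(Y \mid X) \;\approx\; \frac{\operatorname{count}(X \text{ followed by } Y)}{\operatorname{count}(X)}
\qquad\text{e.g.}\qquad
\frac{50}{200} = 0.25
```

If Y on its own turns up far less often than 25% of the time, then having just matched X is strong evidence for Y, and the recognizer can prefer Y over an acoustically similar alternative.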

11
Current State of the Art

General speech recognition with no restrictions, good enough to act on the speech? Always about to happen?
Dictation / substitute for keyboard: exists and satisfies many users
Is this the most important application for most users?
May not be the killer app, but may be good for motivating research
Homework: prepare a brief report on a current product or application. It can be one you use yourself.

12
Speech synthesis

aka TTS (text to speech)
Application determines that the computer needs to say certain words
lexical units (syllables of words) → phonemes → pre-recorded (wav) files of phonemes

13
Speech synthesis

This is again a segmentation process: need to divide up the words and then put them together so the speech sounds 'natural'.
A particular phoneme may need to sound different in different contexts.
Also need to deal with abbreviations, local accents
Place names (important in travel, weather applications)
  Special case: detect and use a wav file for each name.
Older methods were all synthesized
  Similar to the distinction in music between all-synthesized sound and samples

14
Speech synthesis

is essentially the computer reading out loud.
Easy to do most things
Increasingly difficult to do the complete job
Different languages may be easier than English.
People who are not monolingual: please comment!

15
Restricted / directed speech applications

We will use the Tellme Studio engine to create directed speech applications.
These make use of:
  Grammars
  Options to use numbers (buttons)
  Recorded (.wav) sounds
  Text to speech

16
studio.tellme.com

Company that provides engine for applications
Provides development environment
We are doing the Tellme version of VoiceXML, but it appears to be standard.
Register as a developer
  Provide your own id; you are assigned a PIN
Put VoiceXML in the ScratchPad place (no audio files)
Call 1-800-555-VXML (8965)
  SAY id and then PIN, or can give phone number.
Tellme runs either
  the program in ScratchPad OR
  the program at the Application URL, for projects with multiple files
To look at someone else's project, you change your Application URL. This is called pointing your account to a new source.

17
XML

Generalization of HTML
XML documents have markup.
An element is an opening tag indicating the type of element (possibly with attributes), then content, then a closing tag.
Document must be well-formed.
Developers decide on element types.
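
A minimal sketch of a well-formed XML document follows. The element types (course, title, student) and the attribute names are hypothetical, invented here to show that the developer picks them; every opening tag has a matching closer, elements nest properly, and attribute values are quoted.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element types chosen by the developer, not from any standard -->
<course code="example-101">
  <title>Creating User Interfaces</title>
  <student id="s1">First Student</student>
  <student id="s2">Second Student</student>
</course>
```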

18
VoiceXML

XML document (VXML header)
  This means proper nesting of elements, quotation marks on attributes
VoiceXML has tags for flow-of-control and calculations.
  Also can use <script> for JavaScript
Grammars come in different varieties. We will use the Tellme way.
  Grammars are included in CDATA sections to prevent XML interpretation.
Many grammars constructed for you.
  <field name="answer" type="boolean"> will listen for yes or no.
  <field name="price" type="currency"> will listen for currency.
  <menu> with <choice> elements for a list
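
To make the built-in boolean grammar concrete, here is a small sketch written against standard VoiceXML 2.0; the Tellme platform may differ in details. The form id, field name, and prompt wording are made up for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <!-- type="boolean" selects the built-in yes/no grammar, so no <grammar> is needed -->
    <field name="answer" type="boolean">
      <prompt>Do you want to continue?</prompt>
      <filled>
        <!-- the field variable holds true for yes and false for no -->
        <if cond="answer">
          <prompt>Continuing.</prompt>
        <else/>
          <prompt>Stopping.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```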

19
Very brief overview

<vxml> document contains <form> and/or <menu> elements.
<form> can contain <block>, <field>
<block> can contain <audio> or do its own audio
<field> can contain <prompt>, <grammar>, <noinput>, etc.
  NOTE: certain types of <field> elements use built-in grammars, for example, boolean
  Can have a child node <filled> that indicates what to do if there is a match
<menu> is a compressed way to use a simple grammar
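
A sketch of a menu, again assuming standard VoiceXML 2.0 rather than anything Tellme-specific. The ids and choice words are invented for the example. Setting dtmf="true" lets the caller press 1 or 2 instead of speaking, which is one way to offer the number (button) option.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- The text of each <choice> is what the caller can say; no separate grammar is written -->
  <menu id="main" dtmf="true">
    <prompt>Please choose one of: <enumerate/></prompt>
    <choice next="#weather">weather</choice>
    <choice next="#sports">sports</choice>
    <noinput><prompt>I did not hear you.</prompt> <reprompt/></noinput>
    <nomatch><prompt>That is not on the list.</prompt> <reprompt/></nomatch>
  </menu>
  <form id="weather">
    <block><prompt>You chose weather.</prompt></block>
  </form>
  <form id="sports">
    <block><prompt>You chose sports.</prompt></block>
  </form>
</vxml>
```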

20
Very brief, cont.

Logic can be done using a <script> element that contains a variant of JavaScript and/or
vxml logic elements, including
  <var>
  <if>, <else>, <elseif>
  other
These may be part of a <filled> element
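
The following sketch combines a <script> function with <var> and <if> inside a <filled> element. It assumes standard VoiceXML 2.0; the names (doubled, limit, count) are hypothetical. Note that a less-than or greater-than sign inside a cond attribute is written as an entity (&lt; or &gt;).

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // hypothetical helper, callable from any expr or cond attribute
      function doubled(n) { return 2 * n; }
    ]]>
  </script>
  <form id="logic">
    <var name="limit" expr="10"/>
    <field name="count" type="number">
      <prompt>Say a number.</prompt>
      <filled>
        <if cond="count &gt; limit">
          <prompt>That is more than <value expr="limit"/>.</prompt>
        <elseif cond="count == limit"/>
          <prompt>That is exactly the limit.</prompt>
        <else/>
          <prompt>Twice your number is <value expr="doubled(count)"/>.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```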

21
Audio

Tellme Studio provides a way to record your speech as a wav file to upload to a website.
  Sends it to your email address
You upload your VoiceXML file plus any wav files (and anything else).
  <audio src="mygreeting.wav">Welcome to my site</audio>
  If Tellme can't find the mygreeting.wav file, it uses its Text to Speech on the string "Welcome to my site".
  Note: you also can use a full URL: http://....
You put the URL for the VoiceXML file into your Tellme Studio account. This is called pointing to the URL.
TEST

22
VoiceXML basics, continued

<form> element can contain
  <block> elements, which can contain <audio>, <goto>, other
  <field> which can contain
    <prompt>
    <grammar> (if not one of built-in grammars)
    <filled>
<var> tags can be at different levels (for example, document, block, or higher levels)
<if> <elseif> <else> tags
<script> elements for JavaScript (which can also appear in expressions)

23
VoiceXML basics typical case

a form element
  <field>
    <prompt>, made up of <audio>, with reference to recorded wav file and backup text
    <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.
    <filled> with (follow-on) code using field
    <catch> for nomatch, noinput cases
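
The typical case above can be sketched as follows, assuming standard VoiceXML 2.0 and a built-in grammar (digits), so no <grammar> element appears. The file name enterdigits.wav, the ids, and the prompt wording are hypothetical. <noinput> and <nomatch> are shorthand for <catch event="noinput"> and <catch event="nomatch">; here they re-ask the question instead of giving up at the first problem.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pin">
    <field name="code" type="digits">
      <prompt>
        <!-- hypothetical recording; the text inside is the TTS backup -->
        <audio src="enterdigits.wav">Please say or key in your code.</audio>
      </prompt>
      <noinput>
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <nomatch count="1">
        <prompt>Sorry, I did not get that.</prompt>
        <reprompt/>
      </nomatch>
      <!-- after the third failed match, stop instead of looping forever -->
      <nomatch count="3">
        <prompt>I am still having trouble. Good bye.</prompt>
        <exit/>
      </nomatch>
      <filled>
        <prompt>You entered <value expr="code"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```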

24
Caution

A form contains various elements, including a field.
If a field has a grammar and the grammar is satisfied, control goes to a filled tag.

25
obligatory

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <audio src="prompt1.wav">Hello, world</audio>
    </block>
  </form>
</vxml>

prompt1.wav: recorded using Tellme Studio
Hello, world: backup using TTS, just in case the src file is missing

26
example

Asks for number of credits and calculates when you/caller can register
Uses built-in grammar for number
No error recovery. You need to do better than this in your project.
Unfortunate situation: there is an element type filled and an element type field.
The < symbols are represented using &lt;

<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="credit">
    <var name="rest" expr="1000"/>
    <field name="bcount" type="number">
      <prompt>
        <audio src="howmanycredits.wav">Hello there. How many credits have you earned?</audio>
      </prompt>
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          NATURAL_NUMBER_THRU_999
        ]]>
      </grammar>
      <catch event="noinput nomatch">
        <audio src="sorry.wav">Sorry. I didn't get that.</audio>
        <exit/>
      </catch>

      <filled>
        <assign name="rest" expr="bcount"/>
        <audio><value expr="rest"/></audio>
        <if cond="rest &lt; 30">
          <audio src="homestretch.wav">You can register on the third day</audio>
        <elseif cond="rest &lt; 60"/>
          <audio src="morethanhalf.wav">You can register on the second day</audio>
        <elseif cond="rest &lt; 90"/>
          <audio src="goodstart.wav">You can register on the first day</audio>
        <else/>
          <audio>You can register on the fourth day</audio>
        </if>
        <audio src="goodbye.wav">Good bye.</audio>
      </filled>
    </field>
  </form>
</vxml>

29
Homework

Do research / think about your own experiences
and come prepared to report on a speech
recognition / speech synthesis application
Start learning VoiceXML
