Title: VOICE XML
1VOICE XML
by Jan Bechstein
2What is Voice XML?
- VoiceXML is the HTML of the voice web, the
open standard markup language for voice
applications.
- VoiceXML 1.0 was published by the VoiceXML
Forum, a consortium of over 500 companies, in
March 2000. (Main supporting developers: AT&T, IBM,
Motorola)
- The Forum then gave control of the standard to
the World Wide Web Consortium (W3C).
3What is Voice XML?
- VoiceXML 2.0 is already widely used
- While HTML assumes a graphical web browser with
display, keyboard, and mouse, VoiceXML assumes a
voice browser with audio output, audio input, and
keypad input. Audio input is handled by the voice
browser's speech recognizer. Audio output
consists of both recordings and speech
synthesized by the voice browser's text-to-speech
system.
4What is Voice XML?
- A voice browser typically runs on a specialized
voice gateway node that is connected both to the
Internet and to the public switched telephone
network.
6Why is VoiceXML so new?
VoiceXML takes advantage of several trends:
- The growth of the World-Wide Web and of its
capabilities.
- Improvements in computer-based speech
recognition and text-to-speech synthesis.
- The spread of the WWW beyond the desktop
computer.
7The WWW
- Web servers once delivered only static content,
but now generate it dynamically using scripts,
server pages, servlets and other technologies.
They also provide access to databases and legacy
systems. VoiceXML takes advantage of all these
generation technologies.
- The Internet is improving in performance,
bandwidth, and quality of service. These
improvements lead to new types of web
applications and services, which in turn spur
more improvements. VoiceXML strongly benefits
from the ability to move audio data efficiently
across the web.
8The speech technology
- Over the phone, and with no speaker training, a
speech recognition system needs to be given a set
of speech grammars that tell it what words and
phrases it should expect.
- Advances are also being made in speech synthesis,
or text-to-speech (TTS). Synthesized voices no
longer sound like drunken robots.
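To make "speech grammar" concrete, here is a minimal sketch of a grammar file. It assumes the W3C SRGS XML format; the file name, the rule name "answer" and the word list are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical grammar file, e.g. yesno.grxml -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="answer">
  <!-- The recognizer only listens for the phrases listed here -->
  <rule id="answer" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
      <item>maybe</item>
    </one-of>
  </rule>
</grammar>
```

Keeping the list of expected phrases short is what lets recognition work over the phone without training for each speaker.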
9The speech technology
- Automated speech recognition (ASR) systems have
greatly improved in recent years as better
algorithms and acoustic models are developed, and
as more computer power can be brought to bear on
the task.
- An ASR system running on an inexpensive home or
office computer with a good microphone can take
free-form dictation, as long as it has been
pre-trained for the speaker's voice.
10What is so good about Voice XML?
- VoiceXML devices are smaller (no mouse and
keyboard)
- There are more phones (1.5 billion) than
computers connected to the WWW
- Visual browsing and voice browsing are easy
to combine
- Cheaper and easier to use
11So what is it good for?
- Information retrieval
- Directory assistance (AT&T saved 20 million last
year with it)
- E-commerce
- Telephone services
- E-mail over phone
- Payments and scheduling orders
12Let's code some...
- VoiceXML 2.0 is an Extensible Markup Language
(XML) vocabulary for the creation of automated
speech recognition (ASR) and interactive voice
response (IVR) applications. Based on the XML
tag/attribute format, the VoiceXML syntax
involves enclosing instructions (items) within a
tag structure in the following manner:
- <element_name attribute_name="attribute_value">
- ......contained items......
- </element_name>
13Let's code some...
- A VoiceXML application consists of one or more
text files called documents. These document files
are denoted by a ".vxml" file extension.
- The first line is the XML declaration
<?xml version="1.0"?>
- followed by the root tag <vxml version="2.0">
14Let's code some...
- Inside of the <vxml> tag, a document is broken
up into discrete dialog elements called
forms. Each form has an ID.
- Like this:
- <form id="welcome">
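Putting the pieces from the last slides together, a complete minimal document might look like the sketch below. The greeting text is invented, and the `<block>` and `<prompt>` elements are used here just to make the form say something. The `xmlns` attribute is not shown on the slides; the VoiceXML 2.0 specification lists it as required, though some platforms are lenient about it.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- One form with the ID "welcome" -->
  <form id="welcome">
    <block>
      <prompt>Welcome to the voice web.</prompt>
    </block>
  </form>
</vxml>
```

Saved as something like welcome.vxml, this should simply speak the greeting and then end the call.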
15Let's code some...
- Each form has items which control the session
and interact with the user
- fields
- <field> - gathers input from the user via speech
or DTMF recognition as defined by a grammar
- <record> - records an audio clip from the user
- <transfer> - transfers the user to another phone
number
- DTMF: Dual-Tone Multi-Frequency
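As a rough sketch of `<record>` and `<transfer>` in use, the form below first records a message and then bridges the caller to another number. The form ID, item names, prompts and the phone number are all hypothetical, and the available transfer options can depend on the platform.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leaveMessage">
    <!-- Record up to 30 seconds of audio after a beep -->
    <record name="message" beep="true" maxtime="30s">
      <prompt>Please leave a message after the beep.</prompt>
    </record>
    <!-- Then connect the caller to a (made-up) phone number -->
    <transfer name="call" dest="tel:+15555550100" bridge="true">
      <prompt>Connecting you now.</prompt>
    </transfer>
  </form>
</vxml>
```

The form items are visited in document order, so the recording happens before the transfer.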
16Let's code some...
- fields
- <object> - invokes a platform-specific object
that may gather user input, returning the result
as an ECMAScript object
- <subdialog> - performs a call to another dialog
or document (similar to a function call),
returning the result as an ECMAScript object
- ECMA: a forum for the standardization of
information and communication systems
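The "function call" idea behind `<subdialog>` can be sketched like this: one form calls a second form and reads the returned value as a property of the result object. The form IDs and variable names are assumptions for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- The "caller" side: invokes the other form like a function -->
  <form id="order">
    <subdialog name="caller" src="#getPhoneNumber">
      <filled>
        <!-- The result is an ECMAScript object with one property -->
        <prompt>Thanks. Your number is
          <value expr="caller.PhoneNumber"/>.</prompt>
      </filled>
    </subdialog>
  </form>
  <!-- The "function": collects a number and returns it -->
  <form id="getPhoneNumber">
    <field name="PhoneNumber" type="phone">
      <prompt>What's your phone number?</prompt>
      <filled>
        <return namelist="PhoneNumber"/>
      </filled>
    </field>
  </form>
</vxml>
```

The `<return>` element hands the listed variables back to whoever made the call, which is what makes the second form reusable.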
17Let's code some...
- fields
- <block> - encloses a sequence of
statements for prompting and computation
- <initial> - controls mixed-initiative
interactions within a form
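A small sketch of "prompting and computation" inside a `<block>`: it declares a variable, changes it, and speaks the result. The variable name and numbers are invented; `<initial>` is not shown here.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <!-- Computation: declare a variable, then update it -->
      <var name="slices" expr="8"/>
      <assign name="slices" expr="slices - 2"/>
      <!-- Prompting: speak the computed value -->
      <prompt>There are <value expr="slices"/> slices left.</prompt>
    </block>
  </form>
</vxml>
```

The expressions in `expr` are ordinary ECMAScript, so anything that evaluates to a value can go there.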
18Here's some code
<?xml version="1.0"?>
<vxml version="2.0">
  <form id="getPhoneNumber">
    <field name="PhoneNumber" type="phone">
      <grammar src="../grammars/phone.gram"
               type="application/srgs+xml"/>
      <prompt>What's your phone number?</prompt>
      <help>Please say your ten digit phone number.</help>
    </field>
  </form>
</vxml>
19Let's make it a bit more complicated...
- There can be different forms in one document.
- Let's take an example: a pizza place
- The pizza ordering application is going to need
to do more than just get a phone number from
the caller. It should probably also have the
ability to find out the type of pizza that
the caller wants and the address for delivery.
20Let's make it a bit more complicated...
- We need three forms:
- Asking for the telephone number of the customer
(the info can be used to check the customer
database)
- Asking what kind of pizza the customer wants
- Checking the address of the customer (just as a
security check and in case he/she is not at the
address on record)
21Let's make it a bit more complicated...
- To transition between forms, one typically uses
the <goto> tag. Execution begins in the next
portion of the dialog (contained in another form)
as dictated by the logic of the application.
22Let's make it a bit more complicated...
<form id="getPhoneNumber">
  <field name="PhoneNumber" type="phone">
    <grammar src="../grammars/phone.gram"
             type="application/srgs+xml"/>
    <prompt>What's your phone number?</prompt>
    <help>Please say your ten digit phone number.</help>
  </field>
  <block>
    <!-- Once the number is collected, move on to the pizza form -->
    <goto next="#pizzaType"/>
  </block>
</form>
23Let's make it a bit more complicated...
<form id="pizzaType">
  <field name="pizzaTopping">
    <prompt>What type of pizza do you want?</prompt>
    <grammar src="../grammars/pizzas.gram"
             type="application/srgs+xml"/>
  </field>
</form>
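The slides show the first two of the three forms. A possible sketch of the third one, the address check, is given below. It is not from the original deck: the form ID, field name and wording are assumptions, and it uses the built-in "boolean" field type (a yes/no answer) so that no extra grammar file is needed.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical third form; it would sit next to the
       getPhoneNumber and pizzaType forms -->
  <form id="checkAddress">
    <field name="addressOk" type="boolean">
      <prompt>Should we deliver to the address we have
        on record for you?</prompt>
      <help>Please say yes or no.</help>
      <filled>
        <prompt>Thank you. Goodbye.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

In a real application the pizzaType form would need its own `<goto>` pointing at this form, and a "no" answer would have to be handled somehow.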
24Let's make it a bit more complicated...
- Transitioning to a form item within a form
- <goto nextitem="some_form_items_var_name" />
- Transitioning to another form in the current
document
- <goto next="#some_form_id" />
- Transitioning to another document
- <goto next="http://www.some_url.com/some_doc.vxml" />
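A common use of the nextitem variant is asking the same question again. The sketch below assumes a made-up topping list (written with `<option>` elements in place of a grammar file) and uses an `<if>` check, which is covered on the following slides.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pizzaType">
    <field name="pizzaTopping">
      <prompt>What type of pizza do you want?</prompt>
      <!-- Hypothetical choices; <option> stands in for a grammar -->
      <option>margherita</option>
      <option>salami</option>
      <option>hawaii</option>
      <filled>
        <if cond="pizzaTopping == 'hawaii'">
          <prompt>Sorry, we are out of pineapple.</prompt>
          <!-- Forget the answer, then revisit the same field -->
          <clear namelist="pizzaTopping"/>
          <goto nextitem="pizzaTopping"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

The `<clear>` matters here: a field that still holds a value counts as done and would not prompt the caller again.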
25- You can also split the forms across different
documents and transition between them
26Not complicated enough? I'll teach you...
- Conditional Statements
- <if>, <else>, <elseif> are the three elements
utilized for conditional statements in VoiceXML.
- The <if> and <elseif> elements take a cond
attribute specifying an ECMAScript boolean
condition; <else> takes none.
Examples of the usage of each tag are shown on
the next slide
27Not complicated enough? I'll teach you...
Example 1.
<if cond="total &gt; 1000">
  <prompt>This is too much to spend.</prompt>
</if>
Example 2.
<if cond="amount &lt; 29.95">
  <goto next="debit"/>
<else/>
  <prompt>You are out of cash.</prompt>
</if>
28Not complicated enough? I'll teach you...
Example 3.
<if cond="flavor == 'vanilla'">
  <prompt>You ordered vanilla.</prompt>
<elseif cond="flavor == 'chocolate'"/>
  <prompt>You ordered chocolate.</prompt>
<else/>
  <prompt>You didn't order vanilla or chocolate.</prompt>
</if>
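The examples leave open where the tested variable comes from. One way to picture it: the conditionals sit inside a field's `<filled>` element and test the value that was just collected. The form ID, field name and thresholds below are invented. Note that `<` has to be written as `&lt;` inside an attribute, since VoiceXML is XML.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="budget">
    <!-- Built-in "number" type: the caller says or keys in a number -->
    <field name="total" type="number">
      <prompt>How much would you like to spend?</prompt>
      <filled>
        <if cond="total &gt; 1000">
          <prompt>This is too much to spend.</prompt>
        <elseif cond="total &lt; 5"/>
          <prompt>That will not buy a pizza.</prompt>
        <else/>
          <prompt>Okay.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Unlike `<if>`, the `<elseif/>` and `<else/>` tags are empty markers inside the `<if>` element and do not wrap their branch.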
29The big scope...
There are also other important tags and values,
for example the variable scopes:
- session - read-only variables pertaining to an
entire user session. These variables are declared
by the platform and cannot be set within VoiceXML
documents.
- application - declared by <var> elements that are
children of the application root document's <vxml>
tag (declared directly under this tag and no
other). They exist while the root document is
loaded and can be accessed at any level within any
document in the application.
- document - declared as children of a supporting
document's <vxml> tag. They are initialized upon
loading the supporting document and may be
accessed only within that document.
- dialog - declared as children of <form> or
<menu>, these variables are accessible only within
that dialog element and are initialized when the
form is visited. If declared inside of executable
content, initialization occurs when the content is
executed. Form/field item variables initialize as
the form item is collected (see Tutorials 1 & 2).
- (anonymous) - each <block>, <filled>, and <catch>
element defines a new anonymous scope in which
variables may be declared.
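Three of these scopes can be seen side by side in one small document. This is only a sketch; the variable names and strings are made up, and the session and application scopes are left out because they need a platform and a root document.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- document scope: child of <vxml> -->
  <var name="shop" expr="'Pizza Place'"/>
  <form id="scopes">
    <!-- dialog scope: child of <form> -->
    <var name="greeting" expr="'Welcome to '"/>
    <block>
      <!-- anonymous scope: exists only inside this <block> -->
      <var name="message" expr="dialog.greeting + document.shop"/>
      <prompt><value expr="message"/></prompt>
    </block>
  </form>
</vxml>
```

The `dialog.` and `document.` prefixes are optional here, but they make it explicit which scope a name is taken from.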
30Chill out...
Are there any questions?
31Sources for the acquisition of info
- VoiceXML Forum website
- W3C.org Website
- WebDevelopersConsortiumForum
32Additional sources
- A number of VoiceXML Forum Members provide access
to developer sites and tool kits that will allow
you to try out VoiceXML for yourself. A few of
these are:
- BeVocal Cafe
- IBM WebSphere Voice Server SDK
- Motorola Mobile Application Developer's Kit
- Nuance Voice Site Staging Center
- Tellme.Studio
- VoiceGenie Developer Workshop
- Nuance VBuilder Desktop GUI Developer's Tool
33
Tank U weri mäni ... and have a niiiice
test... good luck 2 you all !!!