Title: VoiceXML for Developers
1. Art Clarke, Director, Government Solutions, Tellme Networks, Inc.
Presentation material provided by Scott McGlashan, CTO, Pipebeach
2. What is VoiceXML?
- An XML-based markup language exposing speech, Internet, and telephony resources, enabling the creation of conversational dialogs over the phone
- Key design principles
  - Input: speech recognition and DTMF
  - Output: pre-recorded audio and synthesized speech
  - Internet: XML, IP, HTTP, SSL, JavaScript
  - Telephony: call transfer, data passing
3. Familiar Web Architecture
4. Back-end Integration and VoiceXML
5. W3C Voice Browser Working Group
- Founded May 1999
- 60 member companies
- Mission: standards group to prepare and review markup languages to enable Internet-based speech applications
- Publishes requirements and specifications for languages in the W3C Speech Interface Framework
- http://www.w3.org/Voice
6. W3C Speech Interface Framework
- VoiceXML v2.0
- Speech Grammars
- Speech Synthesis
- Call Control
- Semantic Interpretation
7. Road to Standardization
- VoiceXML v1.0 (May 2000)
  - VoiceXML Forum
  - Specification submitted to the W3C
- VoiceXML v2.0
  - W3C Voice Browser Working Group
  - 50 members collaborating
  - Addressed 400 change requests
8. VoiceXML Forum
- Industry group to promote VoiceXML
- 550 member companies
- Submitted VoiceXML 1.0 to W3C in May 2000
- Key areas of Forum responsibility
  - Education
  - Marketing
  - Conformance
- http://www.voicexml.org
9. VoiceXML Adoption
- Activities
  - W3C (50 members), VoiceXML Forum (550 members)
- Companies
  - Tellme Networks, Nuance, NMS, IBM, BeVocal, SpeechWorks, Cisco, Intel, AT&T, Motorola, Nortel, Lucent, Comverse, Verizon, Pipebeach
- Real-world implementations
  - AT&T, Toll Free Directory Assistance (800-555-1212), voice portals, financial brokerages, carriers: AT&T Wireless, BellSouth, Cingular, Vodafone
10. Programming VoiceXML
- Writing a VoiceXML application is programming!
- Programming in VoiceXML is a mix of procedural programming and declarative programming
  - Control constructs are procedural (if-else etc.)
  - Forms are mainly declarative: the VoiceXML platform iterates through a <form> until values for all field items have been collected (FIA, Form Interpretation Algorithm)
11. Document
- A VoiceXML document defines one or more dialogs
- The user is always in one dialog at any time
- Each dialog specifies the next dialog to transition to using a URL
Diagram (doc1.vxml): Dialog 1 transitions to Dialog 2; Dialog 2 transitions to http://xyz.com/doc2.vxml
12. Dialog
- A dialog describes an interaction between a user and the system
- Two kinds of dialogs: form and menu
13. Form
- A form defines an interaction that collects values for a set of field items. Each field item may specify output, input and evaluation

<form>
  <field name="travellers">
    <grammar mode="voice" src="./number.grxml"/>   <!-- input -->
    <prompt>How many are travelling?</prompt>      <!-- output -->
    <filled>                                        <!-- evaluation -->
      <submit next="http://travel.com/order"/>
    </filled>
  </field>
</form>
14. Menu
<menu id="commands">
  What service would you like?
  <choice next="/cars"> Car hire </choice>
  <choice next="/hotels"> Hotel reservations </choice>
  <choice next="/news"> Today's news </choice>
</menu>
- A menu is a simplified form
- A menu presents the user with a choice of options and then transitions to another dialog based on that choice
15. FIA - Form Interpretation Algorithm (Simplified!)
- The FIA has a main loop that repeatedly selects a form item and then visits it
- The first form item (in document order) whose field item variable is undefined is selected
- As a result, the user is prompted for each field item in turn
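The select-and-visit loop described above can be sketched in ECMAScript, the scripting language VoiceXML itself uses. This is only an illustration of the simplified description on this slide, not the algorithm as written in the specification: the function `runForm`, the `fields` array, and the `collect` callback (which stands in for prompting and recognition) are hypothetical names.

```javascript
// Simplified sketch of the FIA main loop (hypothetical names throughout).
// fields:  form items in document order, e.g. [{ name: "dest-city" }, ...]
// collect: stand-in for "play the prompt, listen, return the recognized value"
function runForm(fields, collect) {
  var values = {};   // field item variables, all initially undefined
  var visited = [];  // order in which field items were visited

  while (true) {
    // Select phase: first form item whose variable is still undefined.
    var next = null;
    for (var i = 0; i < fields.length; i++) {
      if (values[fields[i].name] === undefined) {
        next = fields[i];
        break;
      }
    }
    if (next === null) {
      break; // every field item variable is filled: the form is complete
    }

    // Visit phase: prompt the user and try to fill the variable.
    visited.push(next.name);
    var result = collect(next);
    if (result !== undefined) {
      values[next.name] = result;
    }
    // If nothing was recognized the variable stays undefined,
    // so the same item is selected again on the next iteration.
  }
  return { values: values, visited: visited };
}

// Example: the caller stays silent on the first "travellers" prompt.
var answers = {
  "dest-city": ["London"],
  "travellers": [undefined, 3]
};
var outcome = runForm(
  [{ name: "dest-city" }, { name: "travellers" }],
  function (field) { return answers[field.name].shift(); }
);
```

In this run the `travellers` item is visited twice, because the first visit leaves its variable undefined and the select phase picks it again.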
16. FIA Form Example
<form>
  <!-- field item 1; "dest-city" is its field item variable -->
  <field name="dest-city">
    <prompt>Where do you want to go to?</prompt>
    <grammar mode="voice" src="./cities.grxml"/>
  </field>

  <!-- field item 2 -->
  <field name="travellers">
    <prompt>How many are travelling to your destination?</prompt>
    <grammar mode="voice" src="./number.grxml"/>
  </field>

  <!-- other fields -->
</form>
17. Executable Content
- Executable content is a block of procedural logic
- Executable content gives the developer dynamic control over system behavior
- Executable content may appear in
  - <block>
  - <filled>
  - event handlers
- Other behavior is controlled by the VoiceXML platform, such as the Form Interpretation Algorithm
18. Filled Field Items (Evaluation)
<form>
  <field name="dest-city">
    <prompt>Where do you want to go to?</prompt>
    <grammar mode="voice" src="./cities.grxml"/>
  </field>
  <field name="travellers">
    <prompt>How many are travelling to your destination?</prompt>
    <grammar mode="voice" src="./number.grxml"/>
  </field>

  <filled>
    <!-- acknowledge order -->
    Thank you. Your order is now being processed.
  </filled>
</form>
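19. Goto
- Transition to another document: <goto next="document2"/> (from document1 to document2)
- Transition to another form item in the same form: <goto nextitem="formItem2"/> (from formItem1 to formItem2 within form1)
- Transition to another dialog in the same document: <goto next="#form2"/> (from form1 to form2)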
20. Submit
- Typically used to send results from client to server
- Syntax: <submit next="URI" namelist="var1 var2 ..."/>
Diagram: the VoiceXML platform sends an HTTP GET/POST request over the Internet to a web server (CGI, servlet, etc.)
21. Submit, Example
<form>
  <field name="dest-city">
    <prompt>Where do you want to go to?</prompt>
    <grammar mode="voice" src="./cities.grxml"/>
  </field>
  <field name="travellers">
    <prompt>How many are travelling to <value expr="dest-city"/>?</prompt>
    <grammar mode="voice" src="./number.grxml"/>
  </field>
  <filled>
    Thank you. Your order is now being processed.
    <submit next="http://travel.com/order" namelist="dest-city travellers"/>
  </filled>
</form>
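To make the client-to-server step concrete, here is a rough sketch of how the variables named in `namelist` could end up in an HTTP GET request. It assumes the GET method and plain percent-encoding; the helper `buildSubmitUrl` and the sample values are hypothetical, and a real platform handles POST and other details that are left out here.

```javascript
// Hypothetical helper: build the URL a platform might request for
// <submit next="..." namelist="..."/> when the method is HTTP GET.
// variables: current values of the field item variables, keyed by name
function buildSubmitUrl(next, namelist, variables) {
  // namelist is a space-separated list of variable names.
  var names = namelist.split(/\s+/).filter(function (n) { return n !== ""; });
  var pairs = names.map(function (name) {
    // Names and values are percent-encoded so they are safe in a query string.
    return encodeURIComponent(name) + "=" + encodeURIComponent(String(variables[name]));
  });
  return pairs.length > 0 ? next + "?" + pairs.join("&") : next;
}

// Sample values (assumed for illustration only).
var url = buildSubmitUrl(
  "http://travel.com/order",
  "dest-city travellers",
  { "dest-city": "New York", "travellers": 3 }
);
// url is "http://travel.com/order?dest-city=New%20York&travellers=3"
```

The server-side program (CGI, servlet, etc.) then reads `dest-city` and `travellers` like any other request parameters.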
22. Scripts
- Scripts are client-side programs
- Client-side vs. server-side processing
  - Server-side: process/time intensive, accessing large/dynamic information resources, etc.
  - Client-side: validate input, simple computation, state navigation, etc.
- Only ECMAScript can be used for scripting in VoiceXML
- VoiceXML platforms must support ECMAScript
- ECMAScript: http://www.ecma.ch/ecma1/STAND/ECMA-262.HTM
23. Scripts, Example
<script>
  <![CDATA[
    function factorial(n) {
      return (n <= 1) ? 1 : n * factorial(n - 1);
    }
  ]]>
</script>
<form id="form">
  <field name="fact">
    <prompt>Say a number and hear its factorial.</prompt>
    <grammar mode="voice" src="./number.grxml"/>
    <filled>
      <value expr="fact"/> factorial is
      <value expr="factorial(fact)"/>
    </filled>
  </field>
</form>
24. if, else and elseif
<form>
  ...
  <filled>
    <if cond="travellers &gt; 10">
      Sorry, we cannot handle groups larger than 10 persons
      <clear namelist="travellers"/>
    <elseif cond="travellers &gt; 5 &amp;&amp; dest-city == 'London'"/>
      Sorry, we cannot handle groups larger than 5 persons travelling to London
      <clear namelist="dest-city travellers"/>
    <else/>
      <submit next="http://travel.com/order"/>
    </if>
  </filled>
</form>
25. Variables
- Variables can be manipulated and referenced
  - Declare: <field name="user2">
  - Assign: <assign name="user1" expr="'peter'"/>
  - Clear: <clear namelist="user1 user2"/>
  - Reference: How many are travelling to <value expr="dest-city"/>?
- VoiceXML variables are ECMAScript variables
26. Variable Scope
- Session variables are read-only variables provided by the interpreter context
- Scope defined by element containing executable content (<block>, <filled> or event handler)
- Search for variable name
27. Variable Scope, Example
<var name="counter" expr="5"/>                     <!-- document scope -->
<form>                                             <!-- dialog scope -->
  <var name="counter" expr="1"/>                   <!-- hides document.counter -->
  <var name="counter2" expr="document.counter"/>   <!-- 5 -->
  <!-- Other form contents -->
</form>
Variables in enclosing scopes can be referenced by using the scope's logical name as a prefix
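The hiding and prefix behavior in this example can be modelled as a chain of scope objects. The sketch below is an assumption-laden illustration rather than how any particular platform is built: the `Scope` constructor and its `lookup` and `lookupQualified` methods are hypothetical, and only the document and dialog scopes from the slide are modelled.

```javascript
// Hypothetical model of VoiceXML variable scopes as a chain of objects.
// Each scope knows its logical name and its enclosing (parent) scope.
function Scope(name, parent) {
  this.name = name;
  this.parent = parent || null;
  this.vars = {};
}

// Unqualified lookup: search the current scope, then each enclosing scope.
Scope.prototype.lookup = function (varName) {
  for (var s = this; s !== null; s = s.parent) {
    if (Object.prototype.hasOwnProperty.call(s.vars, varName)) {
      return s.vars[varName];
    }
  }
  return undefined;
};

// Qualified lookup: "document.counter" names the scope explicitly.
Scope.prototype.lookupQualified = function (scopeName, varName) {
  for (var s = this; s !== null; s = s.parent) {
    if (s.name === scopeName) {
      return s.vars[varName];
    }
  }
  return undefined;
};

var documentScope = new Scope("document", null);
var dialogScope = new Scope("dialog", documentScope);

documentScope.vars.counter = 5;  // <var name="counter" expr="5"/> at document level
dialogScope.vars.counter = 1;    // <var name="counter" expr="1"/> inside the form
dialogScope.vars.counter2 = dialogScope.lookupQualified("document", "counter");
```

Inside the form, an unqualified `counter` resolves to the dialog-level value 1, while the `document.` prefix still reaches the hidden value 5.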
28. Events
- Events are used to signal unexpected situations
- Events are thrown when
  - There is a syntax or semantic error in the document
  - The user does not provide audio input or an intelligible response, or hangs up
  - A <throw> element is encountered
- Events are caught by a catch event handler
  - <catch event="com.acme.mailreader">...</catch>
  - <catch event="nomatch noinput">...</catch>
- Shortcut: <nomatch> is equivalent to <catch event="nomatch">
- Other shortcuts: <noinput>, <error>
29. Events, Example
<field name="dest-city">
  <prompt>Where do you want to go to?</prompt>
  <grammar mode="voice" src="./cities.grxml"/>

  <nomatch>
    Please say the city you want to fly to.
  </nomatch>
</field>
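As a rough picture of how a thrown event finds its handler, the sketch below matches an event name against handlers that each list one or more names separated by spaces, as in <catch event="nomatch noinput">. It is deliberately simplified and makes assumptions: only exact names are compared, handlers are tried in the order given, and the `findHandler` function, the `say` property and the mail reader text are hypothetical. Real platforms apply further rules, such as handler scope, that are not modelled here.

```javascript
// Hypothetical dispatcher: pick the handler for a thrown event.
// Each handler lists one or more event names separated by spaces,
// mirroring <catch event="nomatch noinput">.
function findHandler(handlers, eventName) {
  for (var i = 0; i < handlers.length; i++) {
    var names = handlers[i].event.split(/\s+/);
    if (names.indexOf(eventName) !== -1) {
      return handlers[i];
    }
  }
  return null; // no handler declared for this event
}

var handlers = [
  { event: "com.acme.mailreader", say: "Mail reader event." },
  { event: "nomatch noinput", say: "Please say the city you want to fly to." }
];

var onNomatch = findHandler(handlers, "nomatch");
var onNoinput = findHandler(handlers, "noinput");
var onError = findHandler(handlers, "error");
```

Both `nomatch` and `noinput` resolve to the same handler here, and `error` finds none because no handler in this list names it.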
30. Useful Links
- W3C (World Wide Web Consortium)
  - http://www.w3.org
- W3C Voice Browser Working Group
  - http://www.w3.org/voice
- VoiceXML v2.0 Specification
  - http://www.w3.org/TR/2001/WD-voicexml20-20011023/
- VoiceXML Forum
  - http://www.voicexmlforum.org
32. Vision for the Future
- Ubiquitous access to web content through voice
- Enterprise web infrastructure serving all this content in XML (Studio .NET, mod_perl, PHP3, JSP)
- More natural user interfaces that learn from the caller
- New devices and new telephony networks enabling the next generation of applications and services
33. Advancements in XML
- Document Object Model
- Methods
- Properties
- Events
- Separation of data and presentation
- XSLT
- XForms
- Loading XML resources
- Modularization
- Schemas
- Namespaces
- XLink, RDF
34. Advances in Voice User Interfaces
- New recognition technologies
- Statistical Language Models
- Speaker Verification
- Richer semantic processing
- Better ability to intuit caller intent
- Understand where barge-in occurs
- Always listening
- Easier concatenated speech
- Iteration constructs to loop over a list of audio files
- Cleaner separation of prompt resources for internationalization
35. Advances in Core Infrastructure: From the Network to the Device
- VoIP networks with SIP
- In-network call control at different layers (CCXML)
- New multi-modal devices: 3G, telematics
36. Challenges
- Seamless transition to the future
- Reconciling the requirements of new devices and of all XML languages with the requirement for a simple but powerful dialog language
37. Good News!
- VoiceXML 2.0 is nearing Last Call Working Draft!
- Customers are building and launching applications for millions of callers!
- Agreement on general direction of activity in the Voice Browser Working Group!
- Lots of industry activity toward this next generation!
- Strong interest in converging all the activities!