Speech Technologies and VoiceXML - PowerPoint PPT Presentation

About This Presentation

Title:

Speech Technologies and VoiceXML

Slides: 39

Provided by: try3

Transcript and Presenter's Notes

Title: Speech Technologies and VoiceXML

1
Speech Technologies and VoiceXML

try
Department of Computer Science
National Cheng-Chi University

2
Reference

1. Bob Edgar (2001), The VoiceXML Handbook, NY: CMP Books.
2. Dave Raggett (2001), Getting Started with VoiceXML 2.0, W3C.
3. Sun Microsystems (1998), Java Speech Grammar Format Specification v1.0, Sun Microsystems.
4. Chetan Sharma and Jeff Kunins (2002), VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, Wiley.
5. Brian Eberman, Jerry Carter, Darren Meyer, David Goddeau (2002), Building VoiceXML Browsers with OpenVXI, NY: ACM Press.

3
Reference

6. Microsoft (2002), Speech Technology Overview, http://www.microsoft.com/speech/evaluation/techover/
7. VoiceGenie Technologies Inc. (2001), White Paper: Speaking Freely About the VoiceGenie VoiceXML Gateway and the VoiceXML Interpreter, VoiceGenie Technologies Inc.
8. W3C (2002), VoiceXML Specification v2.0, W3C.
9. Chun-Feng Liao (2002), Basics of Speech Recognition, NCCU Computer Center.

4
Presentation Agenda

Background of voice technologies
ASR/TTS
Voice browsing with VoiceXML
VoiceXML architecture
Implementations of VoiceXML platforms
VoiceXML document structure
Bringing voice technologies into virtual environments

5
Voice Technologies

In the mid- to late 1990s, personal computers started to become powerful enough to support automatic speech recognition (ASR).
The two key underlying technologies behind these advances are speech recognition (SR) and text-to-speech synthesis (TTS).

6
Classification of Voice Applications

Basic interactive voice response (IVR)
Computer: For stock quotes, press 1. For trading, press 2.
Human: (presses DTMF 1)
Basic speech ASR
C: Say the stock name for a price quote.
H: Lucent Technologies

7
Classification of Voice Applications

Advanced speech ASR
C: Stock Services, how may I help you?
H: Uh, what's Lucent trading at?
Near-natural language ASR
C: How may I help you?
H: Um, yeah, I'd like to get the current price of Lucent Technologies.
C: Lucent is up two at sixty-eight and a half.
H: OK. I want to buy one hundred shares at market price.
C: ...

8
Speech Recognition

Capturing speech (analog) signals.
Digitizing the sound waves and converting them to basic language units, or phonemes.
Constructing words from phonemes, and contextually analyzing the words to ensure correct spelling for words that sound alike (such as "write" and "right").
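The digitizing step can be illustrated with a minimal Python sketch. It samples a continuous signal at fixed intervals and quantizes each sample to a signed integer. The 8 kHz sample rate, the 8-bit depth, and the 440 Hz test tone are assumptions chosen for illustration, not values taken from the slides.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    levels = 2 ** (bits - 1) - 1                 # 127 for 8-bit signed samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time of the n-th sample, in seconds
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to the range [-1, 1]
        samples.append(round(amplitude * levels))
    return samples

def tone(t):
    """Hypothetical stand-in for the analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone, duration_s=0.01)
```

A real recognizer would go on to extract acoustic features from these samples before any phoneme is identified; that stage is not shown here.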

9
Speech Recognition Process Flow
Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)
10
Speech Recognition Process Flow

Step 1: User Input
The system captures the user's voice as an analog acoustic signal.
Step 2: Digitization
The analog acoustic signal is digitized.
Step 3: Phonetic Breakdown
The signal is broken into phonemes.

11
Speech Recognition Process Flow

Step 4: Statistical Modeling
Phonemes are mapped to their phonetic representation using a statistical model (e.g., an HMM).
Step 5: Matching
Using the grammar, the phonetic representation, and the dictionary, the system returns an n-best list (i.e., candidate words, each with a confidence score).
Grammar: the set of words or phrases that constrains the range of input or output in the voice application.
Dictionary: the table that maps phonetic representations to words (e.g., "thu", "thee" -> "the").
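The matching step can be sketched in Python as follows. The dictionary entries, the grammar, the phonetic spellings, and the scores are all hypothetical. Keeping the best score per word is a simplification, not how a production recognizer combines evidence.

```python
# Hypothetical pronunciation dictionary: phonetic representation -> word.
DICTIONARY = {
    "loo-sent": "Lucent",
    "loo-sint": "Lucent",
    "loo-sid": "lucid",
    "eye-bee-em": "IBM",
}

# Hypothetical grammar: the only words this application accepts.
GRAMMAR = {"Lucent", "IBM", "Microsoft"}

def n_best(hypotheses, dictionary, grammar, n=3):
    """Return up to n (word, confidence) pairs, best first.

    hypotheses is a list of (phonetic representation, score) pairs,
    standing in for the output of the statistical model.
    """
    best = {}
    for phonetic, score in hypotheses:
        word = dictionary.get(phonetic)
        if word is None or word not in grammar:
            continue                      # unknown sound or word outside the grammar
        best[word] = max(score, best.get(word, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

hypotheses = [("loo-sint", 0.62), ("loo-sid", 0.71),
              ("loo-sent", 0.55), ("eye-bee-em", 0.10)]
result = n_best(hypotheses, DICTIONARY, GRAMMAR)
```

Note that "lucid" has the highest raw score but is dropped because the grammar does not allow it. This is how a grammar narrows the recognizer's choices.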

12
Speech Synthesis

Speech synthesis, or text-to-speech, is the process of converting text into spoken language.
Breaking down the words into phonemes.
Analyzing the text for special handling, such as numbers and currency amounts.
Generating the digital audio for playback.
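The special handling of numbers and currency amounts is often called text normalization. Below is a minimal Python sketch, assuming English output, integers from 0 to 999, and dollar amounts with optional two-digit cents. The function names are hypothetical, and tokens in any other form are left unchanged.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only supports 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")

def currency_to_words(dollars, cents):
    """Spell out a dollar amount; cents may be 0."""
    words = number_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")
    if cents:
        words += " and " + number_to_words(cents) + (" cent" if cents == 1 else " cents")
    return words

def normalize(text):
    """Expand numbers and currency amounts token by token."""
    out = []
    for token in text.split():
        core = token.rstrip(".,;:!?")       # keep trailing punctuation aside
        tail = token[len(core):]
        money = re.fullmatch(r"\$(\d{1,3})(?:\.(\d{2}))?", core)
        if money:
            cents = int(money.group(2)) if money.group(2) else 0
            out.append(currency_to_words(int(money.group(1)), cents) + tail)
        elif re.fullmatch(r"\d{1,3}", core):
            out.append(number_to_words(int(core)) + tail)
        else:
            out.append(token)               # anything else is passed through
    return " ".join(out)

spoken = normalize("Buy 100 shares at $68.50.")
```

The normalized string is what would then be broken into phonemes and rendered as audio.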

13
Speech Synthesis
Source: Microsoft Speech.NET Home (http://www.microsoft.com/speech/)
14
Pervasive Computing Model

E-business has changed from a client-server model to a web-centric model.
Once connected to the Internet, one can get any information one wants, but people want more convenient ways to connect to the Internet.
Lou Gerstner, CEO of IBM: The pervasive computing model is a billion people interacting with a million e-businesses with a trillion devices interconnected.

15
(No Transcript)
16
Voice Browsing

VoiceXML instead of HTML
A voice browser instead of an ordinary web
browser
Phone instead of PC.

17
Show: A Scenario of Using VoiceXML
18
VoiceXML Overview

A language for specifying voice dialogs.
Voice dialogs use audio prompts and text-to-speech (TTS) for output, and touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
The main input/output device (initially) is the phone.
Leverages the Internet for application development and delivery.
A standard language enables portability (it unifies dialog control languages).
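To make the dialog model concrete, the Python sketch below parses a small VoiceXML 2.0 menu and resolves a caller's input, either a DTMF key or a spoken phrase, to the next dialog. The document text and the `select` helper are hypothetical. A real voice browser would also play the prompt through TTS and obtain the input from a telephony and ASR stack.

```python
import xml.etree.ElementTree as ET

VXML_NS = {"v": "http://www.w3.org/2001/vxml"}

# Hypothetical VoiceXML document: one menu that accepts DTMF or speech.
DOCUMENT = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>For stock quotes, press 1 or say quotes. For trading, press 2 or say trading.</prompt>
    <choice dtmf="1" next="#quotes">quotes</choice>
    <choice dtmf="2" next="#trading">trading</choice>
  </menu>
  <form id="quotes"><block>Say the stock name for a price quote.</block></form>
  <form id="trading"><block>Trading is not available in this sketch.</block></form>
</vxml>"""

def select(menu, user_input):
    """Return the 'next' target of the choice matched by a key press or a phrase."""
    for choice in menu.findall("v:choice", VXML_NS):
        spoken = (choice.text or "").strip().lower()
        if user_input == choice.get("dtmf") or user_input.lower() == spoken:
            return choice.get("next")
    return None                             # no match: a browser would reprompt

root = ET.fromstring(DOCUMENT)
menu = root.find("v:menu", VXML_NS)
prompt = menu.find("v:prompt", VXML_NS).text   # text to be spoken to the caller
```

For example, `select(menu, "1")` and `select(menu, "quotes")` both lead to the `#quotes` form, so the same dialog serves touch-tone and speech input.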

19
History of VoiceXML
Source: VoiceXML Forum (http://www.voicexml.org)
20
Making use of mature Internet Technologies

Leverage existing web application development
tools.
Leverage existing web infrastructure for
application delivery.
Clean separation of service logic from user
interaction.

21
VoiceXML Platform Architecture
22
VoiceXML Platform Architecture-1

Telephone and telephone network: connects the caller's telephone to the telephony server.
VoiceXML gateway
Voice browser
Audio input: speech recognition (ASR), touch-tone (DTMF), audio recording.
Audio output: audio playback, speech synthesis (TTS).
Interface, call controls

23
VoiceXML Platform Architecture-2

VoiceXML documents
Dialog and flow control
Client-side scripting (ECMAScript)
Speech recognition grammar
Speech synthesis pronunciation control
Document servers (web servers)
Serve static VoiceXML documents or audio files.
Application servers
Generate VoiceXML documents dynamically.
Run server-side application logic.
Connect to a database or a database interface.
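The application-server role can be sketched in Python as follows. The code builds a VoiceXML 2.0 document from server-side data. The `PRICES` table stands in for a database, and `quote_document` is a hypothetical helper. The demo later in the deck uses ASP.NET, so this only illustrates the idea and does not reproduce that demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical data source standing in for the database.
PRICES = {"Lucent": 68.5}

def quote_document(symbol, price):
    """Build a VoiceXML document that speaks one stock quote."""
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "quote"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = f"{symbol} is trading at {price:.2f}."
    return ET.tostring(vxml, encoding="unicode")

# What an application server would return to the voice browser.
document = quote_document("Lucent", PRICES["Lucent"])
```

Because the result is ordinary XML served over HTTP, existing web servers and tools can deliver it unchanged.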

24
Voice Gateway
25
VoiceXML Gateway (detail)

26
Implementations of VoiceXML Gateways

In Taiwan
Yes Mobile
Chunghwa Telecom Laboratories
eWings Technologies, Inc
Free
IBM Voice Server SDK
Open Source
CMU OpenVXI

27
DEMO: How to Write and Run VoiceXML Applications
28
DEMO: Generate VoiceXML Documents Dynamically Using ASP.NET
29
VoiceXML Document Structure
30
A Simple VoiceXML Document
31
DEMO: VoiceXML/HTML Comparison
32
Bringing Voice Technologies to 3D Virtual Environments
33
Related Research

Raymond L. Smith III and Stephen D. Roberts
Use voice input commands to operate simulation animation.
The efficiency issues of ASR/TTS are taken into account.
Satoru, Osamu, Katunobu, Takashi, Tomoyoshi, Hideki, Shotaro, Takio and Katsuhiko
Create a 3D virtual user that can talk with the user via speaker and microphone.
The virtual user can learn words and recognize human faces.

34
We can do more...

Speak to many users who are moving in a virtual environment.
Build the system in a distributed environment (i.e., the web).
Make use of XML technology (VoiceXML/SALT).

35
Problems to Solve

Voice/animation synchronization.
Protocol integration.
ASR/TTS integration and its performance issues.
Virtual user autonomy.
Voice propagation range issues.

36
System Design Prototype
37
Summary

Speech is the most natural way for humans to communicate, so it will become an important modality in HCI.
VoiceXML has revolutionized the development and deployment of speech recognition telephony applications.
Adding speech facilities to 3D virtual environments will make the UI friendlier and enable multimodal input/output.
My research on this topic will focus on voice-animation synchronization and on enabling SR/TTS in distributed 3D virtual environments.

38
Q & A
