Title: The Universal Speech Interface (USI) PDG Progress Report
1The Universal Speech Interface (USI) PDG
Progress Report
- Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders
- Alex Rudnicky, Roni Rosenfeld
- School of Computer Science
- Carnegie Mellon University
- 4 June 2003
2Outline
- USI Project Summary
- USI Device Control
- USI User Studies
- Tech Transfer Initiative
- USI Application Generator
3Program Goals and Plan
- Overall program goal
- Design a universal (i.e. device-independent) interface for speech-based interaction with wearable and home devices
- Program plan milestones
- Q1: analysis, interaction principles
- Q2: build device-simulation environment
- Q3: build first device prototype
- Q4: initial user studies, development tools
4Program Deliverables
- A novel universal design for speech-based interaction with wearable and home devices
- At least one demonstration system exemplifying the new interface
- A set of tools for rapid prototyping of compliant applications
5The Universal Speech Interface (USI) In a Nutshell
- Unifying approach to human-machine speech communication
- Unified look and feel across all applications
- analogous to the Xerox/Macintosh/Windows GUI look-and-feel
- Stylized, semi-natural interaction
- analogous to the Graffiti alphabet for the Palm PDA
6Existing Speech Paradigm 1: Command-and-control Systems
- Specialized language, optimized for a given application
- each application has its own interface
- Intensive training of each user
- Daily use helps retain knowledge
7Existing Speech Paradigm 2: Unconstrained Dialog Systems
- Off-the-street users, no training required
- System models existing human behavior
- But this comes at a cost
- each application requires a great deal of data, labor, human expertise
- Speech Recognition technology is pushed to the limit
- user does not easily grasp the application's functional limits
- Out-Of-Vocabulary words (OOV)
- Out-Of-Domain concepts, requests
8Is a Third Paradigm Needed?
- In practice, people are likely to use
- a handful of apps daily
- scheduler, contact manager, email,...
- many apps occasionally
- weather, restaurants, ...
- To exploit this, we need
- flexible, powerful interface for familiar applications.
- immediate engagement with occasional or new applications.
9Our Approach
- Identify application-independent universals
- user-side
- machine-side
- Find suitable, general solutions
- Human and machine meeting halfway
- Design a stylized, universal look and feel
- Teach it in 5 minutes
10Universal Semantic primitives
- Help primitives
- what can the machine do? how do I do X? what can I say?
- Speech channel primitives
- detect and correct ASR errors; finished talking?
- Interaction primitives
- turn taking; question answering; session management; undo
- Application primitives
- environment variables: query, set
- objects (e.g. lists): describe, navigate, create, modify, delete
11USI Systems Developed
- Information Access
- MovieLine
- FlightLine
- ApartmentLine
- Device Control
- Stereo system
- X-10 control (e.g., lights)
- Alarm Clock applet
- Digital Video Camera
- Windows Media Player
12USI Demonstration
- MovieLine
- Experimental subject
13USI Device Control
14Device Interaction Analysis
- Analysis was done on multiple devices
- alarm clock / radio
- VCR
- cell phone
- MP3 player
- memo pad / email / vmail
- copier/fax
15USI/Device Design Issues
- Confirmation strategy
- Error handling strategy
- Exploration
- Navigation
- Disambiguation / context mgmt
- Orientation
- Querying state variables
16USI/Device Design Issues
- Confirmation strategy: restate--execute
- Error handling strategy: ignore
- Exploration: OPTIONS
- Navigation: use concept of focus
- Disambiguation / context mgmt: implicit
- Orientation: STATUS
- Querying state variables: WHAT IS THE...?
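The design choices on this slide can be illustrated with a toy device. This is only a sketch: the `AlarmClock` class, its state variable names, and the response wording are hypothetical, and treating "ignore" as "return nothing for unrecognized input" is an assumption about what the slide means.

```python
class AlarmClock:
    """Toy device with a few named state variables (all names are hypothetical)."""

    def __init__(self):
        self.state = {"alarm time": "7:00 am", "alarm": "off", "volume": "5"}

    def handle(self, utterance: str) -> str:
        words = utterance.strip().lower().rstrip("?").strip()
        if words == "options":
            # Exploration: list what can be queried or set.
            return "You can say: " + ", ".join(sorted(self.state))
        if words == "status":
            # Orientation: report every state variable.
            return ", ".join(f"{k} is {v}" for k, v in sorted(self.state.items()))
        if words.startswith("what is the "):
            # Querying a single state variable.
            name = words[len("what is the "):]
            return f"{name} is {self.state[name]}" if name in self.state else ""
        # Try longer variable names first so "alarm time" is not read as "alarm".
        for name in sorted(self.state, key=len, reverse=True):
            if words.startswith(name + " "):
                value = words[len(name) + 1:]
                self.state[name] = value
                # Confirmation: restate the understood command after applying it.
                return f"{name} {value}"
        # Error handling (assumed meaning of "ignore"): say nothing, change nothing.
        return ""
```

For example, `AlarmClock().handle("WHAT IS THE VOLUME?")` returns `"volume is 5"`, while an out-of-grammar utterance returns an empty string and leaves the state untouched.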
17Hooking up with the PUC project
- Fits within the PUC project's vision of automatically generated interfaces with different modalities and form factors
- But, can also be used as a standalone speech interface
- Compatibility with visual design is desirable, but not always natural
- nameless states (speech interface must have name for everything!)
- speech interface can have shortcuts (MODE CD vs. CD)
18Meshing with the PUC project
- Device capabilities specified by XML doc
- States vs. Action dichotomy of the visual interface does not always conform to speech interface intuition.
- For now, creating our own interface specification document
- Ultimately, will augment XML DTD, so both interfaces can co-exist
19USI Device control (a.k.a. James the Butler)
Hardware hacking courtesy of the PUC project
20USI Demonstration
- Device Control
- Alarm Clock Example
21User Studies
22User study
- Compared Speech Graffiti (SG) and natural language (NL) MovieLines
- How does Speech Graffiti compare to a natural language interface?
- Subjective user satisfaction
- Task completion rates
- Word error rates
- How well do users "get" Speech Graffiti?
- How often do they speak within the grammar?
- In what ways do they deviate from the grammar?
23Subjective user satisfaction
- 17 of 23 preferred Speech Graffiti (SG)
- SG user satisfaction ratings higher than NL in all categories
- SG ratings positive except in annoyance and habitability
24Computer experience and training
- Computer Science / Engineering backgrounds and/or programming experience
- Higher user satisfaction ratings
- Better task completion rates
- Training in-domain vs. out-of-domain
- No differences in user satisfaction or task
completion rates
25Task completion
- Overall
- 67.9% of SG tasks
- 67.4% of NL tasks
- Individual means
- 5.43 of 8 SG tasks
- 5.30 of 8 NL tasks
26Time-to-completion
- Completed tasks
- 67.9 seconds SG
- 73.4 seconds NL
- Incomplete tasks
27Turns-to-completion
- Completed tasks
- 8.2 turns SG
- 3.9 turns NL
- Incomplete tasks
28Word error rates
- Very high for both systems
- On "cleaned" set (on-task, non-noisy utts)
- Concept error is lower for USI
- SG 29.2 from WER
- NL 0.8 from WER
- Low error rate is key to acceptance
- 6 who preferred NL-ML had highest SG WER
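The WER figures on this slide are not recomputed here. As a reminder of how the metric is conventionally defined (word-level edit distance divided by reference length), a minimal sketch follows; the function name and the whitespace tokenization are assumptions, not the study's actual scoring setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With a four-word reference, one substituted word and one dropped word give a WER of 0.5.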
29WER and user satisfaction
30How often do users speak within the Speech Graffiti grammar?
- Actually, pretty often!
- and
- grammaticality leads to user satisfaction
31How do users deviate from the grammar?
32Future Interface Design Work
- Redesign Help facility
- SG works best for those who "get it"
- Current system provides no assistance to "clueless user"
- Error analysis
- Compare failure cases in SG and NL interfaces
- Compare user recovery attempts in SG and NL
- Address issues of generalizability
- Promoting transparency of slot set and response sets
- Accessing information sets rather than single items
- Adjust grammar components
33Future Architecture Work
- Integrate current USI environments
- Information Access
- Device Control
- Improve interface between PUC and USI components
- Identify USI-specific techniques to achieve lower WER
- Improved documentation and distribution packaging
34Tech Transfer Initiative
35Tech Transfer Initiative
- Tools for creating new USI apps
- 3 days to create a new application
- prior exposure to speech technology highly beneficial
- decided to further reduce the barrier
- → create an application generator
36From 3 Days to a Few Hours
- A USI Application Generator
- New USI applications w/out programming!
- XML document fully specifies the application
- slot names
- accepted inputs
- data types
- slot properties
- ...
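To make the idea of an XML-specified application concrete, here is a small sketch. The element and attribute names (`application`, `slot`, `input`, `type`, `required`) and the sample values are invented for illustration; the actual USI specification format is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification; the real USI schema may look quite different.
SPEC = """
<application name="MovieLine">
  <slot name="theater" type="string" required="false">
    <input>downtown cinema</input>
    <input>mall cinema</input>
  </slot>
  <slot name="show time" type="time" required="false"/>
</application>
"""

def load_slots(xml_text):
    """Return the application name and a dict of slot descriptions."""
    root = ET.fromstring(xml_text.strip())
    slots = {}
    for slot in root.findall("slot"):
        slots[slot.get("name")] = {
            "type": slot.get("type"),                   # data type
            "required": slot.get("required") == "true",  # a slot property
            "inputs": [e.text.strip() for e in slot.findall("input")],  # accepted inputs
        }
    return root.get("name"), slots
```

Calling `load_slots(SPEC)` yields the name `"MovieLine"` and one entry per slot, which is the kind of information a generator would need in order to build the application without further programming.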
37From a Few Hours to 15 minutes?
- Created a Web interface for generating the XML document
- Form filling, pulldown menus
- Strong effort to further simplify the process, minimize complexity of form
- many defaults
- for less common choices, edit the XML doc.
- More importantly, no computer savvy needed
38Web Application Generator
- Repository and tool for creating USI database applications
- Abundant online help to guide users through process
- Accessible to anyone with an Internet connection
39Web Application Generator
- Two step process
- General specification
- Slot-by-slot specification
- choose datatype from built-in list, or create own
- Fully featured system with save, copy, delete functionality
- Hides intricacies of XML document writing
- Advanced users have ability to further alter the final XML document
40 [Screenshot: General Specification screen with help box displayed.]
41Web Application Generator
- Built-in generic voice; can record own voice
- DB backend
- Postgres
- Oracle
- ODBC (including ASCII files)
- Ultimately web tables
- Platform
- originally mixed Unix/Windows, telephone based
- converted to pure Windows, telephone or laptop
42Transferring USI to PDG members
- We do house calls!
- Carnegie Mellon will install USI developer environment for each interested member and will train member staff in the use of the developer environment
- Provide a short tutorial on USI principles and interface design
43Thank you!
Pittsburgh Digital Greenhouse